Common miRNA Markers in Adenocarcinomas
Posted on Nov 06, 2025, 08:30 PM EDT | By Double-Strand LLC Research Team
Adenocarcinoma-Associated miRNA Biomarkers
The latest human genome annotation from the GENCODE Consortium (Release 49, September 2025) reports approximately 79,000 genes, of which about one quarter encode proteins. During development and throughout adult life, each cell type expresses a specific subset of this total gene repertoire. Conceptually, a cell type could be represented as a three-column dataset: the first column lists the gene identifiers, the second represents each gene’s average transcript abundance, and the third its expression variance under normal physiological conditions. The genome thus provides the regulatory information constraining which genes can be transcriptionally active in any given cellular context.
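The three-column representation described above can be sketched in a few lines of Python. This is only an illustration: the gene names and abundance values are toy placeholders, and `summarize_cell_type` is a hypothetical helper, not part of any published pipeline.

```python
from statistics import mean, variance


def summarize_cell_type(expression_by_gene):
    """Return rows of (gene_id, mean abundance, sample variance) for one cell type."""
    rows = []
    for gene_id, values in sorted(expression_by_gene.items()):
        rows.append((gene_id, mean(values), variance(values)))
    return rows


# Toy abundances (arbitrary units) across three hypothetical normal samples.
toy_expression = {
    "GENE_A": [10.0, 12.0, 14.0],  # expressed, with some sample-to-sample spread
    "GENE_B": [0.0, 0.0, 0.0],     # silent in this cell type
}

profile = summarize_cell_type(toy_expression)
for gene_id, avg, var in profile:
    print(gene_id, avg, var)
```

A gene that is silent in the cell type, such as the placeholder `GENE_B`, shows up as a row with zero mean and zero variance.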
Somatic genetic alterations—such as those decreasing the activity of tumor suppressor genes or increasing oncogene expression—can disrupt these regulatory constraints. Under permissive conditions, cells harboring such alterations may acquire neoplastic properties, leading to uncontrolled proliferation. These transformed cells become transcriptionally reprogrammed, expressing a gene set distinct from that of their surrounding normal counterparts.
It has been observed that adenocarcinomas from different organs often share a common transcriptional signature—a set of genes that are silent or expressed at much lower levels in the corresponding normal tissues. This convergence suggests that tumor cells, irrespective of their tissue of origin (e.g., breast, lung, colon, or stomach), activate core molecular programs required for sustained growth and survival. Examples of such commonly overexpressed genes include BIRC5 (Survivin), encoding an anti-apoptotic protein highly expressed in tumors but largely absent from most differentiated tissues, as well as MKI67, CDK1, and TPX2, whose transcripts—though also found in normal proliferative compartments such as intestinal stem cells—are markedly abundant across adenocarcinomas.
Candidate Pan-Adenocarcinoma miRNAs
Among adenocarcinomas, certain non–protein-coding genes, including specific microRNAs, may remain transcriptionally silent in their corresponding normal tissues yet become upregulated in tumor cells.
To explore such tumor-specific transcriptional activation, the National Cancer Institute’s Genomic Data Commons (GDC) was queried for curated datasets from The Cancer Genome Atlas (TCGA), including TCGA-LUAD (lung adenocarcinoma), TCGA-STAD (stomach adenocarcinoma), TCGA-PRAD (prostate adenocarcinoma), TCGA-BRCA (breast invasive carcinoma), and TCGA-UCEC (uterine corpus endometrial carcinoma).
These datasets collectively represent a broad range of epithelial adenocarcinomas and provide a comprehensive resource for identifying microRNAs that are silent in normal cognate tissues but aberrantly expressed in tumor cells.
The R Bioconductor package GenomicDataCommons was used to query the GDC for the aforementioned TCGA projects and their corresponding miRNA-Seq datasets.
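As a rough illustration of what such a query looks like, the sketch below builds a filter payload for the GDC REST API `files` endpoint in Python. It is not the R GenomicDataCommons code used for the analysis. The field names and values (`cases.project.project_id`, `data_type`, `experimental_strategy`) reflect our understanding of the public GDC API and should be checked against the current GDC documentation. `build_mirna_file_query` is a hypothetical helper, and no request is actually sent.

```python
import json

GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"

PROJECTS = ["TCGA-LUAD", "TCGA-STAD", "TCGA-PRAD", "TCGA-BRCA", "TCGA-UCEC"]


def build_mirna_file_query(projects, size=100):
    """Build a JSON payload selecting miRNA-Seq quantification files for the given projects."""

    def is_in(field, values):
        # One GDC-style filter clause: field must match one of the values.
        return {"op": "in", "content": {"field": field, "value": values}}

    filters = {
        "op": "and",
        "content": [
            is_in("cases.project.project_id", list(projects)),
            is_in("data_type", ["miRNA Expression Quantification"]),  # assumed value
            is_in("experimental_strategy", ["miRNA-Seq"]),            # assumed value
        ],
    }
    return {
        "filters": filters,
        "fields": "file_id,file_name,cases.project.project_id,cases.samples.sample_type",
        "format": "JSON",
        "size": size,
    }


payload = build_mirna_file_query(PROJECTS)
print(json.dumps(payload, indent=2))
```

The resulting dictionary could be sent as the JSON body of a POST request to `GDC_FILES_ENDPOINT`; the returned file identifiers would then be used to assemble a download manifest.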
Subsequently, the GDC Data Transfer Tool (gdc-client) was employed to download the mirbase21.mirnas.quantification files onto a Google Cloud Platform (GCP) instance running Debian Linux.
This setup facilitated the automated retrieval and organization of miRNA quantification data across multiple adenocarcinoma cohorts (TCGA-LUAD, TCGA-STAD, TCGA-PRAD, TCGA-BRCA, and TCGA-UCEC) for downstream analysis.
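Once downloaded, each quantification file is a small tab-separated table. The sketch below shows one way to read the RPM column in Python. It assumes the column names `miRNA_ID`, `read_count`, `reads_per_million_miRNA_mapped`, and `cross-mapped`, which should be verified against the actual files. The miRNA identifiers and values in the example are placeholders, and `read_rpm` is a hypothetical helper.

```python
import csv
import io


def read_rpm(handle):
    """Map each miRNA identifier to its RPM value from one quantification table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["miRNA_ID"]: float(row["reads_per_million_miRNA_mapped"])
        for row in reader
    }


# Placeholder content standing in for a single downloaded file.
example_file = (
    "miRNA_ID\tread_count\treads_per_million_miRNA_mapped\tcross-mapped\n"
    "hsa-mir-example-1\t250\t125.0\tN\n"
    "hsa-mir-example-2\t0\t0.0\tN\n"
)

rpm_by_mirna = read_rpm(io.StringIO(example_file))
print(rpm_by_mirna)
```

Applying the same function to every file in a cohort and stacking the results by sample gives the samples-by-miRNA matrix used downstream.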
Finally, the normalized read counts (expressed as reads per million miRNA-mapped reads, RPM) for 1,870 mature miRNAs were processed through a workflow integrating feature selection and supervised learning. A preliminary feature selection step was combined with the Bioconductor DESeq2 package’s DESeq() function to identify miRNAs differentially expressed between tumor and normal samples.
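To make the normalization and the idea of a preliminary filter concrete, here is a minimal Python sketch. It computes RPM from raw counts and keeps features whose mean RPM is higher in tumor than in normal samples by a log2 fold-change threshold. This is a simplified stand-in and not DESeq2, which fits a negative binomial model to counts. The threshold, the pseudocount, and the toy feature names and values are all assumptions chosen for illustration.

```python
import math


def to_rpm(counts):
    """Scale raw read counts so that they sum to one million."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: value * 1_000_000 / total for name, value in counts.items()}


def log2_fold_change(tumor_values, normal_values, pseudocount=1.0):
    """log2 ratio of mean tumor to mean normal abundance, with a pseudocount."""
    tumor_mean = sum(tumor_values) / len(tumor_values)
    normal_mean = sum(normal_values) / len(normal_values)
    return math.log2((tumor_mean + pseudocount) / (normal_mean + pseudocount))


def prefilter(tumor, normal, min_lfc=1.0):
    """Keep features present in both groups whose fold change meets the threshold."""
    return sorted(
        name
        for name in tumor
        if name in normal
        and log2_fold_change(tumor[name], normal[name]) >= min_lfc
    )


# Toy values only: one feature silent in normal samples, one unchanged.
tumor_rpm = {"toy_mir_up": [30.0, 32.0], "toy_mir_flat": [10.0, 10.0]}
normal_rpm = {"toy_mir_up": [0.0, 0.0], "toy_mir_flat": [10.0, 10.0]}

selected = prefilter(tumor_rpm, normal_rpm)
example_rpm = to_rpm({"toy_mir_up": 300, "toy_mir_flat": 700})
print(selected, example_rpm)
```

The pseudocount keeps the ratio defined for features that are silent in normal tissue, which is exactly the class of miRNAs of interest here.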
The selected features were then used to fit a model on the binary dependent variable representing sample type (Primary Tumor vs Solid Tissue Normal) using a random forest classifier. Model training and performance assessment were conducted under 10-fold cross-validation, implemented using the R tidymodels framework to ensure robust and reproducible evaluation.
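The resampling scheme can be illustrated independently of any modeling library. The Python sketch below splits sample indices into 10 folds while keeping the tumor-to-normal ratio roughly constant in each fold. Stratification by sample type is our assumption here and is not stated above. The class sizes are toy numbers, and the random forest fit itself is omitted.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds, spreading each class evenly across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def train_test_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs, holding out one fold at a time."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(
            index
            for fold_number, fold in enumerate(folds)
            if fold_number != held_out
            for index in fold
        )
        yield train, test


# Toy cohort: 40 tumor samples and 10 normal samples.
labels = ["Primary Tumor"] * 40 + ["Solid Tissue Normal"] * 10
splits = list(train_test_splits(labels))
print(len(splits), [len(test) for _, test in splits])
```

In each iteration a classifier would be fit on the `train` indices and scored on the `test` indices, and the ten scores averaged to estimate out-of-sample performance.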
The results of this analysis indicate that hsa-miR-96, hsa-miR-183, hsa-miR-182, and hsa-miR-93 represent valid candidates for use as pan-adenocarcinoma miRNA markers, exhibiting consistent upregulation across multiple adenocarcinoma types.