Sen, Liang; Sen, Yang; Dayang, Liang; Jiechao, Ma; Yuan, Tian; Jing, Zhao; Xu, Zhang; Ying, Xu; Yan, Wang
A novel matched-pairs feature selection method considering with tumor purity for differential gene expression analyses. (English) Zbl 1425.92081
Math. Biosci. 311, 39-48 (2019).
Summary: Tissue-based gene expression data analyses, while most powerful, are significantly more challenging than cell-based gene expression data analyses, even for the simplest differential gene expression analyses. Whether a gene is judged to be differentially expressed in tumor vs. non-tumorous control tissues depends not only on the two expression values but also on the percentage of the tissue cells that are tumor cells, i.e., the tumor purity. We developed a novel matched-pairs feature selection method that is simple, effective, and accurate, and that takes tumor purity fully into account when deciding whether a gene is differentially expressed in tumor vs. control experiments. To evaluate the validity and performance of the method, we compared it with four published methods on both simulated datasets and actual cancer tissue datasets and found that it achieved higher sensitivity and specificity than the other methods. Our method is a matched-pairs feature selection method for gene expression analysis under a matched case-control design that takes tumor purity information into account, and it can serve as a foundation for developing methods for other gene expression analysis needs.
Cited in 2 Documents
MSC: 92C40 Biochemistry, molecular biology; 62P10 Applications of statistics to biology and medical sciences; meta analysis
Keywords: feature selection; tumor purity; test statistic; gene expression analyses; matched case-control design
Software: Bioconductor; contamDE; PurityEst; QUBIC; InfiniumPurify; MethylPurify; UNDO; Accurity; THetA
Cite: L. Sen et al., Math. Biosci.
311, 39-48 (2019; Zbl 1425.92081)
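The summary's central point, that a tumor-vs-control comparison depends on tumor purity as well as on the two expression values, can be illustrated with a small sketch. It is not the authors' test statistic, which this record does not give. It assumes a simple linear mixing model in which the observed tumor-tissue expression is `p * pure_tumor + (1 - p) * normal` for purity `p`, so that each matched-pair difference shrinks by the factor `p`. The function names and the toy numbers are hypothetical.

```python
import math
import statistics


def purity_adjusted_differences(tumor, normal, purity):
    """Rescale each matched-pair difference by that sample's tumor purity.

    Assumption (not taken from the paper): observed tumor-tissue expression
    follows observed = p * pure_tumor + (1 - p) * normal, which implies
    observed - normal = p * (pure_tumor - normal).
    """
    if not (len(tumor) == len(normal) == len(purity)):
        raise ValueError("tumor, normal and purity must have equal length")
    diffs = []
    for y, n, p in zip(tumor, normal, purity):
        if not 0.0 < p <= 1.0:
            raise ValueError("purity must lie in (0, 1]")
        diffs.append((y - n) / p)
    return diffs


def paired_t_statistic(diffs):
    """Ordinary paired t-statistic: mean(d) / (stdev(d) / sqrt(k))."""
    k = len(diffs)
    if k < 2:
        raise ValueError("need at least two pairs")
    sd = statistics.stdev(diffs)
    if sd == 0.0:
        raise ValueError("differences have zero variance")
    return statistics.mean(diffs) / (sd / math.sqrt(k))


if __name__ == "__main__":
    # Toy data for one gene across four matched pairs (hypothetical values).
    normal = [10.0, 12.0, 9.0, 11.0]
    tumor = [12.0, 16.0, 10.2, 15.0]   # observed, diluted by normal cells
    purity = [0.5, 0.8, 0.4, 1.0]

    raw = [y - n for y, n in zip(tumor, normal)]
    adjusted = purity_adjusted_differences(tumor, normal, purity)
    print("raw t:     ", paired_t_statistic(raw))
    print("adjusted t:", paired_t_statistic(adjusted))
```

In this toy example the raw differences vary mostly because purity varies between samples, so dividing by purity makes the per-pair differences more consistent. Real purity estimates are noisy, and dividing by a small estimated purity amplifies that noise, which is one reason dedicated methods are needed.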