Simulated annealing based automatic fuzzy clustering combined with ANN classification for analyzing microarray data. (English) Zbl 1183.68488

Summary: Microarray technology has made it possible to monitor the expression levels of many genes simultaneously across a number of experimental conditions. Fuzzy clustering is an important tool for analyzing microarray gene expression data. In this article, a real-coded Simulated Annealing (VSA) based fuzzy clustering method with variable-length configuration is developed and combined with a popular Artificial Neural Network (ANN) based classifier. The idea is to refine the clustering produced by VSA using the ANN classifier to obtain improved clustering performance. The proposed technique is used to cluster three publicly available real-life microarray data sets. Its superior performance is demonstrated by comparison with some widely used existing clustering algorithms. A statistical significance test is also conducted to establish that the improvement is statistically significant. Finally, the biological relevance of the clustering solutions is established.
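The summary gives no algorithmic detail, so the following is only a minimal sketch of what a variable-length, real-coded simulated annealing search over fuzzy cluster centers might look like. It is not the authors' implementation. Everything specific here is an assumption made for illustration: the Xie-Beni index as the energy (with fuzzifier m = 2 it matches the original form of that index), the three perturbation moves, the geometric cooling schedule, and all names and default parameter values. The ANN refinement stage is not covered.

```python
import math
import random

def memberships(points, centers, m=2.0):
    """Standard fuzzy c-means memberships: u[k][i] is the degree of point i in cluster k."""
    u = [[0.0] * len(points) for _ in centers]
    for i, x in enumerate(points):
        d = [max(math.dist(x, c), 1e-12) for c in centers]  # clamp to avoid division by zero
        for k in range(len(centers)):
            u[k][i] = 1.0 / sum((d[k] / dj) ** (2.0 / (m - 1.0)) for dj in d)
    return u

def xie_beni(points, centers, m=2.0):
    """Xie-Beni validity index: fuzzy compactness over minimum separation (lower is better)."""
    u = memberships(points, centers, m)
    compact = sum(u[k][i] ** m * math.dist(x, c) ** 2
                  for k, c in enumerate(centers) for i, x in enumerate(points))
    sep = min(math.dist(a, b) ** 2
              for j, a in enumerate(centers) for b in centers[j + 1:])
    return compact / (len(points) * max(sep, 1e-12))

def perturb(centers, points, rng, k_min=2, k_max=6, step=0.5):
    """Hypothetical move set: shift one coordinate, add a center, or delete a center."""
    new = [list(c) for c in centers]  # copy so the current configuration is untouched
    move = rng.choice(["shift", "add", "delete"])
    if move == "add" and len(new) < k_max:
        new.append(list(rng.choice(points)))
    elif move == "delete" and len(new) > k_min:
        new.pop(rng.randrange(len(new)))
    else:
        c = rng.choice(new)
        c[rng.randrange(len(c))] += rng.gauss(0.0, step)
    return new

def anneal(points, k_init=3, t_max=1.0, t_min=1e-3, alpha=0.9, iters=50, seed=0):
    """Simulated annealing with the Metropolis acceptance rule and geometric cooling."""
    rng = random.Random(seed)
    current = [list(p) for p in rng.sample(points, k_init)]
    e_cur = xie_beni(points, current)
    best, e_best = current, e_cur
    t = t_max
    while t > t_min:
        for _ in range(iters):
            cand = perturb(current, points, rng)
            e_new = xie_beni(points, cand)
            # Always accept improvements; accept worse states with probability exp(-dE / t).
            if e_new <= e_cur or rng.random() < math.exp(-(e_new - e_cur) / t):
                current, e_cur = cand, e_new
                if e_cur < e_best:
                    best, e_best = current, e_cur
        t *= alpha
    return best, e_best

# Toy data (not microarray data): two well-separated 2-D blobs.
demo_rng = random.Random(1)
points = ([(demo_rng.gauss(0, 0.3), demo_rng.gauss(0, 0.3)) for _ in range(15)]
          + [(demo_rng.gauss(5, 0.3), demo_rng.gauss(5, 0.3)) for _ in range(15)])
best, e_best = anneal(points)
print(len(best), round(e_best, 4))
```

Because the configuration is a list of centers whose length can change, the search explores the number of clusters and their positions together, which is the general point of a variable-length encoding.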

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62A86 Fuzzy analysis in statistics
92D10 Genetics and epigenetics

Software:

Silhouettes

References:

[1] Sharan, R.; Maron-Katz, A.; Shamir, R., CLICK and EXPANDER: a system for clustering and visualizing gene expression data, Bioinformatics, 19, 1787-1799 (2003)
[2] Alizadeh, A. A.; Eisen, M. B.; Davis, R.; Ma, C.; Lossos, I.; Rosenwald, A., Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling, Nature, 403, 503-511 (2000)
[3] Chu, S.; DeRisi, J.; Eisen, M.; Mulholland, J.; Botstein, D.; Brown, P. O., The transcriptional program of sporulation in budding yeast, Science, 282, 699-705 (1998)
[4] Eisen, M. B.; Spellman, P. T.; Brown, P. O.; Botstein, D., Cluster analysis and display of genome-wide expression patterns, Proceedings of the National Academy of Sciences of the United States of America, 14863-14868 (1998)
[5] Bandyopadhyay, S.; Maulik, U.; Wang, J. T., Analysis of biological data: a soft computing approach (2007), World Scientific: World Scientific Singapore
[6] Jain, A. K.; Dubes, R. C., Algorithms for clustering data (1988), Prentice-Hall: Prentice-Hall Englewood Cliffs, NJ · Zbl 0665.62061
[7] Tou, J. T.; Gonzalez, R. C., Pattern recognition principles (1974), Addison-Wesley: Addison-Wesley Reading · Zbl 0299.68058
[8] Hartigan, J. A., Clustering algorithms (1975), Wiley: Wiley New York · Zbl 0321.62069
[9] Cho, R. J.; Campbell, M. J.; Winzeler, E. A.; Steinmetz, L.; Conway, A.; Wodicka, L., A genome-wide transcriptional analysis of mitotic cell cycle, Molecular Cell, 2, 65-73 (1998)
[10] Herwig, R.; Poustka, A.; Meuller, C.; Lehrach, H.; O’Brien, J., Large-scale clustering of cDNA fingerprinting data, Genome Research, 9, 11, 1093-1105 (1999)
[11] Dembele, D.; Kastner, P., Fuzzy c-means method for clustering microarray data, Bioinformatics, 19, 8, 973-980 (2003)
[12] Tamayo, P.; Slonim, D.; Mesirov, J.; Zhu, Q.; Kitareewan, S.; Dmitrovsky, E., et al., Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation, Proceedings of the National Academy of Sciences of the United States of America, 96, 2907-2912 (1999)
[13] Hartuv, E.; Shamir, R., A clustering algorithm based on graph connectivity, Information Processing Letters, 76, 4-6, 175-181 (2000) · Zbl 0996.68525
[14] Alon, U.; Barkai, N.; Notterman, D. A., et al., Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, Proceedings of the National Academy of Sciences of the United States of America, 96, 6745-6750 (1999)
[15] Lukashin, A. V.; Fuchs, R., Analysis of temporal gene expression profiles: clustering by simulated annealing and determining the optimal number of clusters, Bioinformatics, 17, 5, 405-414 (2001)
[16] Maulik, U.; Bandyopadhyay, S., Genetic algorithm based clustering technique, Pattern Recognition, 33, 1455-1465 (2000)
[17] Mukhopadhyay, A.; Maulik, U.; Bandyopadhyay, S., Multiobjective evolutionary approach to fuzzy clustering of microarray data, 303-326 [chapter 13] (2007), World Scientific: World Scientific Singapore
[18] Bandyopadhyay, S.; Mukhopadhyay, A.; Maulik, U., An improved algorithm for clustering gene expression data, Bioinformatics, 23, 21, 2859-2865 (2007)
[19] Bishop, C., Neural networks for pattern recognition (1996), Oxford University Press: Oxford University Press Oxford · Zbl 0868.68096
[20] MacKay, D. J.C., The evidence framework applied to classification networks, Neural Computation, 4, 5, 720-736 (1992)
[21] Bezdek, J. C., Pattern recognition with fuzzy objective function algorithms (1981), Plenum Press: Plenum Press New York · Zbl 0503.68069
[22] Mewes, H. W.; Albermann, K.; Heumann, K.; Liebl, S.; Pfeiffer, F., MIPS, homology data and yeast genome information: a database for protein sequences, Nucleic Acids Research, 25, 28-30 (1997)
[23] Xie, X. L.; Beni, G., A validity measure for fuzzy clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence, 13, 841-847 (1991)
[24] Bandyopadhyay, S.; Maulik, U.; Mukhopadhyay, A., Multiobjective genetic clustering for pixel classification in remote sensing imagery, IEEE Transactions on Geoscience and Remote Sensing, 45, 5, 1506-1511 (2007)
[25] Domany, E., Cluster analysis of gene expression data, Journal of Statistical Physics, 110, 3-6, 1117-1139 (2003) · Zbl 1026.62120
[26] Shannon, W.; Culverhouse, R.; Duncan, J., Analyzing microarray data using cluster analysis, Pharmacogenomics, 4, 1, 41-51 (2003)
[27] Kim, S. Y.; Lee, J. W.; Bae, J. S., Effect of data normalization on fuzzy clustering of DNA microarray data, BMC Bioinformatics, 7, 134 (2006)
[28] Kirkpatrick, S.; Gelatt, C.; Vecchi, M., Optimization by simulated annealing, Science, 220, 671-680 (1983) · Zbl 1225.90162
[29] van Laarhoven, P. J.M.; Aarts, E. H.L., Simulated annealing: theory and applications (1987), Kluwer Academic Publisher: Kluwer Academic Publisher Dordrecht · Zbl 0643.65028
[30] Geman, S.; Geman, D., Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 6, 721-741 (1984) · Zbl 0573.62030
[31] Caves, R.; Quegan, S.; White, R., Quantitative comparison of the performance of SAR segmentation algorithms, IEEE Transactions on Image Processing, 7, 11, 1534-1546 (1998)
[32] Maulik, U.; Bandyopadhyay, S.; Trinder, J., SAFE: an efficient feature extraction technique, Journal of Knowledge and Information Systems, 3, 374-387 (2001) · Zbl 0989.68125
[33] Bandyopadhyay, S.; Maulik, U.; Pakhira, M. K., Clustering using simulated annealing with probabilistic redistribution, International Journal of Pattern Recognition and Artificial Intelligence, 15, 2, 269-285 (2001)
[34] Andersen, L. N.; Larsen, J.; Hansen, L. K.; Hintz-Madsen, M., Adaptive regularization of neural classifiers, Proceedings of the IEEE workshop on neural networks for signal processing VII, New York, USA, 24-33 (1997)
[35] Sigurdsson, S.; Larsen, J.; Hansen, L., Outlier estimation and detection: application to skin lesion classification, Proceedings of the international conference on acoustics, speech and signal processing (2002)
[36] Iyer, V. R.; Eisen, M. B.; Ross, D. T.; Schuler, G.; Moore, T.; Lee, J., The transcriptional program in the response of the human fibroblasts to serum, Science, 283, 83-87 (1999)
[37] Wen, X.; Fuhrman, S.; Michaels, G. S.; Carr, D. B.; Smith, S.; Barker, J. L., et al., Large-scale temporal gene expression mapping of central nervous system development, Proceedings of the National Academy of Sciences of the United States of America, 95, 334-339 (1998)
[38] Rousseeuw, P., Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, Journal of Computational and Applied Mathematics, 20, 53-65 (1987) · Zbl 0636.62059
[39] Groll, L.; Jakel, J., A new convergence proof of fuzzy c-means, IEEE Transactions on Fuzzy Systems, 13, 5, 717-720 (2005)
[40] Hollander, M.; Wolfe, D. A., Nonparametric statistical methods, 2nd ed. (1999) · Zbl 0997.62511
[41] Tavazoie, S.; Hughes, J.; Campbell, M.; Cho, R.; Church, G., Systematic determination of genetic network architecture, Nature Genetics, 22, 281-285 (1999)