×

Summarizing a set of time series by averaging: from Steiner sequence to compact multiple alignment. (English) Zbl 1232.68105

Summary: Summarizing a set of sequences is an old topic that has been revived in the last decade, due to the increasing availability of sequential datasets. The definition of a consensus object is on the center of data analysis issues, since it crystallizes the underlying organization of the data.
Dynamic time warping (DTW) is currently the most relevant similarity measure between sequences for a large panel of applications, since it makes it possible to capture temporal distortions. In this context, averaging a set of sequences is not a trivial task, since the average sequence has to be consistent with this similarity measure.
The Steiner theory and several works in computational biology have pointed out the connection between multiple alignments and average sequences. Taking inspiration from these works, we introduce the notion of compact multiple alignment, which allows us to link these theories to the problem of summarizing under time warping. Having defined the link between the multiple alignment and the average sequence, the second part of this article focuses on the scan of the space of compact multiple alignments in order to provide an average sequence of a set of sequences. We propose to use a genetic algorithm based on a specific representation of the genotype inspired by genes. This representation of the genotype makes it possible to consistently paint the fitness landscape.
Experiments carried out on standard datasets show that the proposed approach outperforms existing methods.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68T20 Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.)
68W32 Algorithms on strings
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Gusfield, D., Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology (1997), Cambridge University Press · Zbl 0934.68103
[2] H. Sakoe, S. Chiba, A dynamic programming approach to continuous speech recognition, in: Proceedings of the Seventh International Congress on Acoustics, vol. 3, 1971, pp. 65-69.; H. Sakoe, S. Chiba, A dynamic programming approach to continuous speech recognition, in: Proceedings of the Seventh International Congress on Acoustics, vol. 3, 1971, pp. 65-69.
[3] Shanker, A. P.; Rajagopalan, A., Off-line signature verification using DTW, Pattern Recognition Letters, 28, 12, 1407-1414 (2007)
[4] Sankoff, D.; Kruskal, J., The symmetric time-warping problem: from continuous to discrete, (Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison (1983), Addison Wesley Publishing Company), 125-161
[5] Aach, J.; Church, G. M., Aligning gene expression time series with time warping algorithms, Bioinformatics, 17, 6, 495-508 (2001)
[6] Bar-Joseph, Z.; Gerber, G.; Gifford, D. K.; Jaakkola, T. S.; Simon, I., A new approach to analyzing gene expression time series data, (RECOMB: Proceedings of the Sixth Annual International conference on Computational Biology (2002), ACM: ACM New York, NY, USA), 39-48
[7] D.M. Gavrila, L.S. Davis, Towards 3-D model-based tracking and recognition of human movement: a multi-view approach, in: IEEE International Workshop on Automatic Face- and Gesture-Recognition, 1995, pp. 272-277.; D.M. Gavrila, L.S. Davis, Towards 3-D model-based tracking and recognition of human movement: a multi-view approach, in: IEEE International Workshop on Automatic Face- and Gesture-Recognition, 1995, pp. 272-277.
[8] T. Rath, R. Manmatha, Word image matching using dynamic time warping, in: IEEE Conference on Computer Vision and Pattern Recognition, Vol. 2, 2003, pp. 521-527.; T. Rath, R. Manmatha, Word image matching using dynamic time warping, in: IEEE Conference on Computer Vision and Pattern Recognition, Vol. 2, 2003, pp. 521-527.
[9] V. Niennattrakul, C.A. Ratanamahatana, Shape averaging under time warping, in: International Conference on Electrical Engineering/Electronics, Computer, Telecommunications, and Information Technology, 2009.; V. Niennattrakul, C.A. Ratanamahatana, Shape averaging under time warping, in: International Conference on Electrical Engineering/Electronics, Computer, Telecommunications, and Information Technology, 2009.
[10] Gilbert, E. N.; Pollak, H. O., Steiner minimal trees, SIAM Journal on Applied Mathematics, 16, 1, 1-29 (1968) · Zbl 0159.22001
[11] Dimitriadou, E.; Weingessel, A.; Hornik, K., A combination scheme for fuzzy clustering, International Journal of Pattern Recognition and Artificial Intelligence, 16, 7, 901-912 (2002)
[12] Niennattrakul, V.; Ratanamahatana, C. A., Inaccuracies of shape averaging method using dynamic time warping for time series data, (Berlin, S., Computational Science-ICCS. Computational Science-ICCS, LNCS, vol. 4487 (2007))
[13] T. Liao, B. Bolt, J. Forester, E. Hailman, C. Hansen, R. Kaste, J. O’May, Understanding and projecting the battle state, in: 23rd Army Science Conference, 2002.; T. Liao, B. Bolt, J. Forester, E. Hailman, C. Hansen, R. Kaste, J. O’May, Understanding and projecting the battle state, in: 23rd Army Science Conference, 2002.
[14] Liao, T. W.; Ting, C.-F.; Chang, P.-C., An adaptive genetic clustering method for exploratory mining of feature vector and time series data, International Journal of Production Research, 44, 2731-2748 (2006) · Zbl 1128.62373
[15] Gupta, L.; Molfese, D.; Tammana, R.; Simos, P., Nonlinear alignment and averaging for estimating the evoked potential, IEEE Transactions on Biomedical Engineering, 43, 4, 348-356 (1996)
[16] Ongwattanakul, S.; Srisai, D., Contrast enhanced dynamic time warping distance for time series shape averaging classification, (International Conference on Interaction Sciences (2009), ACM)
[17] Petitjean, F.; Ketterlin, A.; Gançarski, P., A global averaging method for dynamic time warping, with applications to clustering, Pattern Recognition, 44, 3, 678-693 (2011) · Zbl 1209.68477
[18] E. Keogh, X. Xi, L. Wei, C.A. Ratanamahatana, The UCR Time Series Classification/Clustering Homepage, http://www.cs.ucr.edu/ eamonn/time_series_data/; E. Keogh, X. Xi, L. Wei, C.A. Ratanamahatana, The UCR Time Series Classification/Clustering Homepage, http://www.cs.ucr.edu/ eamonn/time_series_data/
[19] Sakoe, H.; Chiba, S., Dynamic programming algorithm optimization for spoken word recognition, IEEE Transactions on Acoustics, Speech and Signal Processing, 26, 1, 43-49 (1978) · Zbl 0371.68035
[20] Needleman, S. B.; Wunsch, C. D., A general method applicable to the search for similarities in the amino acid sequence of two proteins, Journal of Molecular Biology, 48, 3, 443-453 (1970)
[21] Wang, L.; Jiang, T., On the complexity of multiple sequence alignment, Journal of Computational Biology, 1, 4, 337-348 (1994)
[22] Notredame, C.; Higgins, D. G., SAGA: Sequence Alignment by Genetic Algorithm, Nucleic Acids Research, 24, 8, 1515-1524 (1996)
[23] Edgar, R. C., MUSCLE: a multiple sequence alignment method with reduced time and space complexity, BMC Bioinformatics, 5, 1, 1792-1797 (2004)
[24] Pei, J.; Sadreyev, R.; Grishin, N. V., PCMA: fast and accurate multiple sequence alignment based on profile consistency, Bioinformatics, 19, 3, 427-428 (2003)
[25] Lassmann, T.; Sonnhammer, E. L.L., Kalign — an accurate and fast multiple sequence alignment algorithm, BMC Bioinformatics, 6, 1, 298-306 (2005)
[26] Notredame, C.; Higgins, D. G.; Heringa, J., T-coffee: a novel method for fast and accurate multiple sequence alignment, Journal of Molecular Biology, 302, 1, 205-217 (2000)
[27] Pei, J.; Grishin, N. V., PROMALS: towards accurate multiple sequence alignments of distantly related proteins, Bioinformatics, 23, 7, 802-808 (2007)
[28] Kapsalis, A.; Rayward-Smith, V. J.; Smith, G. D., Solving the graphical Steiner tree problem using genetic algorithms, The Journal of the Operational Research Society, 44, 4, 397-406 (1993) · Zbl 0774.90083
[29] Fu, A. W.-C.; Keogh, E. J.; Lau, L. Y.H.; Ratanamahatana, C. A.; Wong, R. C.-W., Scaling and time warping in time series querying, VLDB Journal, 17, 4, 899-921 (2008)
[30] Keogh, E.; Ratanamahatana, C. A., Exact indexing of dynamic time warping, Knowledge and Information Systems, 7, 3, 358-386 (2005)
[31] Holland, J. H., Adaptation in Natural and Artificial Systems (1975), MIT Press: MIT Press Cambridge, MA, USA
[32] Goldberg, D. E., Genetic Algorithms in Search, Optimization and Machine Learning (1989), Addison-Wesley Longman Publishing Co., Inc.: Addison-Wesley Longman Publishing Co., Inc. Boston, MA, USA · Zbl 0721.68056
[33] Darwin, C., On the Origin of Species by Means of Natural Selection (1859), John Murray: John Murray London
[34] Lamarck, J.-B., Philosophie Zoologique (1809), Dentu: Dentu Paris
[35] Grefenstette, J. J., Lamarckian learning in multi-agent environments, (Proceedings of the Fourth International Conference on Genetic Algorithms (1991), Morgan Kaufmann), 303-310
[36] Paredis, J., Coevolutionary life-time learning, (Proceedings of the 4th International Conference on Parallel Problem Solving from Nature. Proceedings of the 4th International Conference on Parallel Problem Solving from Nature, PPSN, vol. IV (1996), Springer-Verlag: Springer-Verlag London, UK), 72-80
[37] Ross, B. J., A Lamarckian Evolution Strategy for Genetic Algorithms, (Chambers, L., Practical Handbook of Genetic Algorithms: Complex Coding Systems, vol. 3 (1999), CRC Press), 1-16
[38] D.E. Goldberg, S. Voessner, Optimizing global-local search hybrids, in: Genetic and Evolutionary Computation Conference, 1999, pp. 212-219.; D.E. Goldberg, S. Voessner, Optimizing global-local search hybrids, in: Genetic and Evolutionary Computation Conference, 1999, pp. 212-219.
[39] Baldwin, J., A new factor in evolution, American Naturalist, 30, 441-451 (1896)
[40] Whitley, L. D.; Gordon, V. S.; Mathias, K. E., Lamarckian evolution, the Baldwin effect and function optimization, (Parallel Problem Solving from Nature. Parallel Problem Solving from Nature, PPSN, vol. III (1994), Springer-Verlag: Springer-Verlag London, UK), 6-15
[41] K. Ku, M. Mak, Exploring the effects of lamarckian and Baldwinian learning in evolving recurrent neural networks, in: IEEE International Conference on Evolutionary Computation, 1997, pp. 617-621.; K. Ku, M. Mak, Exploring the effects of lamarckian and Baldwinian learning in evolving recurrent neural networks, in: IEEE International Conference on Evolutionary Computation, 1997, pp. 617-621.
[42] P. Turney, Myths and legends of the Baldwin effect, in: Workshop on Evolutionary Computation and Machine Learning at the 13th International Conference on Machine Learning, 1996, pp. 135-142.; P. Turney, Myths and legends of the Baldwin effect, in: Workshop on Evolutionary Computation and Machine Learning at the 13th International Conference on Machine Learning, 1996, pp. 135-142.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.