Time series classification by class-specific Mahalanobis distance measures. (English) Zbl 1254.62096

Summary: To classify time series by nearest neighbors, we need to specify or learn one or several distance measures. We consider variations of the Mahalanobis distance measures, which rely on the inverse covariance matrix of the data. Unfortunately, for time series data, the covariance matrix often has low rank. To alleviate this problem, we can use a pseudoinverse, apply covariance shrinking, or limit the matrix to its diagonal. We review these alternatives and benchmark them against competitive methods such as the related Large Margin Nearest Neighbor Classification (LMNN) and the Dynamic Time Warping (DTW) distance. As we expected, we find that DTW is superior, but the Mahalanobis distance measures are one to two orders of magnitude faster. To get the best results with Mahalanobis distance measures, we recommend learning one distance measure per class using either covariance shrinking or the diagonal approach.
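
The three remedies for a low-rank covariance matrix named in the summary, and the idea of one distance measure per class, can be illustrated with a minimal NumPy sketch. This is not the authors' Matlab code. The function names, the `method` and `lam` parameters, the fixed shrinkage intensity, the diagonal shrinkage target and the synthetic data are all assumptions made for illustration. The sketch also assumes that each training series is compared to the query using the matrix learned from that training series' own class.

```python
import numpy as np

def inverse_covariance(X, method="shrink", lam=0.1):
    """Return a matrix M for d(x, y)^2 = (x - y)^T M (x - y).

    X has shape (n_series, length). `method` and `lam` are hypothetical
    names chosen for this sketch, not taken from the paper.
    """
    S = np.cov(X, rowvar=False)  # sample covariance; low rank when n_series < length
    if method == "pinv":
        return np.linalg.pinv(S)  # Moore-Penrose pseudoinverse
    d = np.maximum(np.diag(S), 1e-12)  # variances, floored to avoid division by zero
    if method == "diag":
        return np.diag(1.0 / d)  # keep only the diagonal of the covariance
    if method == "shrink":
        # Assumed form: blend S with its diagonal using a fixed intensity lam.
        return np.linalg.inv((1.0 - lam) * S + lam * np.diag(d))
    raise ValueError(f"unknown method: {method}")

def fit_class_metrics(X, y, method="shrink"):
    """Learn one matrix per class from that class's training series."""
    return {c: inverse_covariance(X[y == c], method) for c in np.unique(y)}

def predict_1nn(X_train, y_train, metrics, x):
    """1-NN where each training series uses the matrix of its own class."""
    diffs = X_train - x
    d2 = np.array([diff @ metrics[c] @ diff for diff, c in zip(diffs, y_train)])
    return y_train[np.argmin(d2)]

# Synthetic data: 8 series per class, length 20, so each covariance is low rank.
rng = np.random.default_rng(0)
length = 20
X_train = np.vstack([rng.normal(0.0, 1.0, (8, length)),
                     rng.normal(3.0, 1.0, (8, length))])
y_train = np.array([0] * 8 + [1] * 8)

metrics = fit_class_metrics(X_train, y_train, method="shrink")
query = rng.normal(3.0, 1.0, length)
label = predict_1nn(X_train, y_train, metrics, query)
print("predicted class:", label)
```

Once the per-class matrices are computed, each comparison costs one matrix-vector product (or a weighted sum for the diagonal variant), which is consistent with the speed advantage over DTW reported in the summary.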


62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62H30 Classification and discrimination; cluster analysis (statistical aspects)
65C60 Computational problems in statistics (MSC2010)


GitHub; Matlab; LMNN; FastDTW


[1] Breiman L (1998) Classification and regression trees. Chapman & Hall/CRC, London
[2] Chai J, Liu H, Chen B, Bao Z (2010) Large margin nearest local mean classifier. Signal Process 90(1): 236–248 · Zbl 1177.68159
[3] Chouakria A, Nagabhushan P (2007) Adaptive dissimilarity index for measuring time series proximity. Adv Data Anal Classif 1: 5–21 · Zbl 1131.62078
[4] Csatári B, Prekopcsák Z (2010) Class-based attribute weighting for time series classification. In: POSTER 2010: Proceedings of the 14th International Student Conference on Electrical Engineering
[5] Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh E (2008) Querying and mining of time series data: experimental comparison of representations and distance measures. In: VLDB ’08, pp 1542–1552
[6] Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Hum Genet 7(2): 179–188
[7] Gaudin R, Nicoloyannis N (2006) An adaptable time warping distance for time series learning. In: ICMLA ’06, pp 213–218
[8] Hastie T, Tibshirani R (1996) Discriminant adaptive nearest neighbor classification. IEEE T Pattern Anal 18(6): 607–616 · Zbl 05111708
[9] Ishikawa Y, Subramanya R, Faloutsos C (1998) Mindreader: Querying databases through multiple examples. In: VLDB ’98, pp 218–227
[10] Itakura F (1975) Minimum prediction residual principle applied to speech recognition. IEEE T Acoust Speech 23(1): 67–72
[11] Jahromi MZ, Parvinnia E, John R (2009) A method of learning weighted similarity function to improve the performance of nearest neighbor. Inform Sci 179(17): 2964–2973 · Zbl 1194.68190
[12] Jeong YS, Jeong MK, Omitaomu OA (2011) Weighted dynamic time warping for time series classification. Pattern Recogn 44(9): 2231–2240 · Zbl 05937888
[13] Keogh E, Xi X, Wei L, Ratanamahatana CA (2006) The UCR time series classification/clustering homepage. http://www.cs.ucr.edu/~eamonn/time_series_data/ (last checked on 14/05/2012)
[14] Legrand B, Chang C, Ong S, Neo SY, Palanisamy N (2008) Chromosome classification using dynamic time warping. Pattern Recogn Lett 29(3): 215–222
[15] Lemire D (2009) Faster retrieval with a two-pass dynamic-time-warping lower bound. Pattern Recogn 42: 2169–2180 · Zbl 1169.68594
[16] Mahalanobis PC (1936) On the generalised distance in statistics. Proc Natl Acad Sci India 2(1): 49–55 · Zbl 0015.03302
[17] Matton M, Compernolle DV, Cools R (2010) Minimum classification error training in example based speech and pattern recognition using sparse weight matrices. J Comput Appl Math 234(4): 1303–1311 · Zbl 1209.68471
[18] Ouyang Y, Zhang F (2010) Histogram distance for similarity search in large time series database. In: IDEAL ’10, pp 170–177
[19] Paredes R, Vidal E (2000) A class-dependent weighted dissimilarity measure for nearest neighbor classification problems. Pattern Recogn Lett 21(12): 1027–1036 · Zbl 0967.68143
[20] Paredes R, Vidal E (2006) Learning prototypes and distances: a prototype reduction technique based on nearest neighbor error minimization. Pattern Recogn 39(2): 180–188 · Zbl 1080.68647
[21] Pham DT, Chan AB (1998) Control chart pattern recognition using a new type of self-organizing neural network. Proc Inst Mech Eng I J Syst Control Eng 212(2): 115–127
[22] Prekopcsák Z (2011) Matlab code for the experiments. http://github.com/Preko/Time-series-classification (last checked on 14/05/2012)
[23] Ratanamahatana CA, Keogh E (2005) Three myths about Dynamic Time Warping data mining. In: SDM ’05
[24] Saito N (1994) Local feature extraction and its applications using a library of bases. PhD thesis, Yale University, New Haven
[25] Sakoe H, Chiba S (1978a) Dynamic programming algorithm optimization for spoken word recognition. IEEE T Acoust Speech 26(1): 43–49 · Zbl 0371.68035
[26] Sakoe H, Chiba S (1978) Dynamic programming algorithm optimization for spoken word recognition. IEEE T Acoust Speech 26(1): 43–49 · Zbl 0371.68035
[27] Salvador S, Chan P (2007) FastDTW: Toward accurate dynamic time warping in linear time and space. Intell Data Anal 11(5): 561–580
[28] Schäfer J, Strimmer K (2005) A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat Appl Genet Molec Biol 4(1): 32
[29] Short R, Fukunaga K (1980) A new nearest neighbor distance measure. In: ICPR ’80, pp 81–86
[30] Shumway RH (1982) Discriminant analysis for time series. In: Krishnaiah P, Kanal L (eds) Classification pattern recognition and reduction of dimensionality, Handbook of Statistics, vol 2. Elsevier, Amsterdam, pp 1–46
[31] Stein C (1956) Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In: Proceedings of the Third Berkeley Symposium on Mathematical and Statistical Probability, pp 197–206 · Zbl 0073.35602
[32] Sternickel K (2002) Automatic pattern recognition in ECG time series. Comput Meth Prog Bio 68(2): 109–115 · Zbl 05462722
[33] Vandenberghe L, Boyd S (1996) Semidefinite programming. SIAM Rev, pp 49–95
[34] Weihs C, Ligges U, Mörchen F, Müllensiefen D (2007) Classification in music research. Adv Data Anal Classif 1: 255–291 · Zbl 1183.62109
[35] Weinberger K, Saul L (2008) Large margin nearest neighbor: Matlab code. http://www.cse.wustl.edu/~kilian/Downloads/LMNN.html (last checked on 14/05/2012)
[36] Weinberger K, Saul L (2009) Distance metric learning for large margin nearest neighbor classification. JMLR 10: 207–244 · Zbl 1235.68204
[37] Wettschereck D, Aha DW, Mohri T (1997) A review and empirical evaluation of feature weighting methods for a class of lazy learning algorithms. Artif Intell Rev 11(1–5): 273–314 · Zbl 05469233
[38] Yang L, Jin R (2006) Distance metric learning: a comprehensive survey. Tech. rep., Michigan State University, USA. http://www.cs.cmu.edu/~liuy/frame_survey_v2.pdf (last checked on 14/05/2012)
[39] Yu D, Yu X, Hu Q, Liu J, Wu A (2011) Dynamic time warping constraint learning for large margin nearest neighbor classification. Inform Sci 181(13): 2787–2796 · Zbl 05911337
[40] Zhan DC, Li M, Li YF, Zhou ZH (2009) Learning instance specific distances using metric propagation. In: ICML’09, pp 1225–1232
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.