
Indefinite proximity learning: a review. (English) Zbl 1472.68155

Summary: Efficient learning for a data analysis task strongly depends on the data representation. Most methods rely on (symmetric) similarity or dissimilarity representations by means of metric inner products or distances, providing easy access to powerful mathematical formalisms like kernel or branch-and-bound approaches. Similarities and dissimilarities are, however, often naturally obtained from nonmetric proximity measures that cannot easily be handled by classical learning algorithms. Major efforts have been undertaken to provide approaches that either work directly on such data or make standard methods applicable to it. We provide a comprehensive survey of the field of learning with nonmetric proximities. First, we introduce the formalism used in nonmetric spaces and motivate specific treatments for nonmetric proximity data. Second, we systematize the various approaches; for each category, we give a comparative discussion of the individual algorithms and address complexity issues and generalization properties. In a summarizing section, we report a larger experimental study of the majority of the algorithms on standard data sets. We also address large-scale proximity learning, which is often overlooked in this context but of major importance for making these methods relevant in practice. The algorithms we discuss are in general applicable to proximity-based clustering, one-class classification, classification, regression, and embedding approaches. In the experimental part, we focus on classification tasks.
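To make the central notion concrete: a symmetric similarity matrix acts as a valid (Mercer) kernel only if it is positive semidefinite, and nonmetric proximity measures such as alignment scores typically violate this. The following minimal Python sketch (an illustration, not code from the paper; the names is_psd and clip_spectrum are ours) detects indefiniteness via the eigenspectrum and applies spectrum clipping, one standard eigenvalue correction discussed in this literature.

    # Minimal sketch (not from the paper): detecting and repairing an
    # indefinite similarity matrix. A symmetric similarity matrix S is a
    # valid (Mercer) kernel only if it is positive semidefinite, i.e., all
    # eigenvalues are >= 0; nonmetric proximity measures typically violate this.
    import numpy as np

    def is_psd(S, tol=1e-10):
        """True if the symmetric matrix S is positive semidefinite."""
        return np.linalg.eigvalsh(S).min() >= -tol

    def clip_spectrum(S):
        """Spectrum clip: set negative eigenvalues to zero -- one common
        way to make an indefinite similarity matrix usable as a kernel."""
        w, V = np.linalg.eigh(S)
        return (V * np.clip(w, 0.0, None)) @ V.T

    # A small symmetric similarity matrix with one negative eigenvalue
    # (about -0.22), as can arise from nonmetric proximity measures.
    S = np.array([[1.0, 0.9, 0.1],
                  [0.9, 1.0, 0.9],
                  [0.1, 0.9, 1.0]])
    print(is_psd(S))                 # False: S is indefinite
    print(is_psd(clip_spectrum(S)))  # True after clipping

Related corrections in this line of work flip (take absolute values of) or shift the eigenspectrum instead of clipping it; all three trade off how much of the "negative" similarity information is preserved.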

MSC:

68T05 Learning and adaptive systems in artificial intelligence

Software:

ExPASy; t-SNE
