Product formalisms for measures on spaces with binary tree structures: representation, visualization, and multiscale noise. (English) Zbl 1453.62802

Summary: In this paper, we present a theoretical foundation for a representation of a data set as a measure in a very large hierarchically parametrized family of positive measures, whose parameters can be computed explicitly (rather than estimated by optimization), and illustrate its applicability to a wide range of data types. The preprocessing step then consists of representing data sets as simple measures. The theoretical foundation consists of a dyadic product formula representation lemma, and a visualization theorem. We also define an additive multiscale noise model that can be used to sample from dyadic measures and a more general multiplicative multiscale noise model that can be used to perturb continuous functions, Borel measures, and dyadic measures. The first two results are based on theorems in [R. A. Fefferman et al., Ann. Math. (2) 134, No. 1, 65–124 (1991; Zbl 0770.35014); A. Beurling and L. V. Ahlfors, Acta Math. 96, 125–142 (1956; Zbl 0072.29602); L. V. Ahlfors, Lectures on quasiconformal mappings. Princeton, N. J. etc.: D. Van Nostrand Company (1966; Zbl 0138.06002)]. The representation uses the very simple concept of a dyadic tree and hence is widely applicable, easily understood, and easily computed. Since the data sample is represented as a measure, subsequent analysis can exploit statistical and measure-theoretic concepts and theories. Because the representation uses the very simple concept of a dyadic tree defined on the universe of a data set, and the parameters are simply and explicitly computable and easily interpretable and visualizable, we hope that this approach will be broadly useful to mathematicians, statisticians, and computer scientists who are intrigued by or involved in data science, including its mathematical foundations.
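
The explicit computability claimed in the summary can be illustrated with a minimal sketch. It assumes the simplest setting, the unit interval [0, 1) with its standard dyadic subintervals, and the usual convention that each dyadic interval I with left and right halves L and R carries the coefficient a_I = (mu(L) - mu(R)) / mu(I), taken to be 0 when mu(I) = 0. The function names and the choice of Python are illustrative assumptions, not taken from the paper, whose exact conventions may differ.

```python
from collections import Counter


def dyadic_product_coefficients(samples, depth):
    """Return (total_mass, coeffs) for the empirical counting measure.

    samples: iterable of floats in [0, 1)
    depth:   number of dyadic levels to resolve
    coeffs maps (level, index) -> a in [-1, 1]; intervals of zero
    mass are omitted and treated as having coefficient 0 (assumption).
    """
    n_leaves = 2 ** depth
    # Count the samples falling in each finest-level dyadic interval.
    counts = Counter(min(int(x * n_leaves), n_leaves - 1) for x in samples)
    mass = {depth: [counts.get(k, 0) for k in range(n_leaves)]}
    # Sum children upward so mass[level][k] is the mass of interval k.
    for level in range(depth - 1, -1, -1):
        child = mass[level + 1]
        mass[level] = [child[2 * k] + child[2 * k + 1]
                       for k in range(2 ** level)]
    coeffs = {}
    for level in range(depth):
        for k, m in enumerate(mass[level]):
            if m > 0:
                left = mass[level + 1][2 * k]
                right = mass[level + 1][2 * k + 1]
                coeffs[(level, k)] = (left - right) / m
    return mass[0][0], coeffs


def interval_mass(total, coeffs, level, index):
    """Rebuild the mass of a dyadic interval from the product formula."""
    value = float(total)
    for j in range(level):
        ancestor = index >> (level - j)            # ancestor at level j
        goes_right = (index >> (level - j - 1)) & 1
        a = coeffs.get((j, ancestor), 0.0)
        # Left child receives the fraction (1 + a) / 2, right (1 - a) / 2.
        value *= (1 - a) / 2 if goes_right else (1 + a) / 2
    return value


total, coeffs = dyadic_product_coefficients([0.1, 0.2, 0.3, 0.6], depth=2)
print(total, coeffs)
print(interval_mass(total, coeffs, level=2, index=0))
```

Each coefficient is a closed-form ratio of counts, so no optimization is involved, and multiplying the factors (1 ± a) / 2 along a root-to-leaf path recovers the mass of any dyadic interval.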

MSC:

62R07 Statistical aspects of big data and data science
60A10 Probabilistic measure theory
28A35 Measures and integrals in product spaces

Software:

GloVe; word2vec

References:

[1] Ahlfors, L., Lectures on Quasi-Conformal Mappings, Mathematical Studies, vol. 10 (1966), van Nostrand. · Zbl 0138.06002
[2] Astala, K., Kupiainen, A., Saksman, E., and Jones, P., ‘Random conformal weldings’, Acta Math. 207(2) (2011), 203-254. · Zbl 1253.30032
[3] Beurling, A. and Ahlfors, L., ‘The boundary correspondence under quasi-conformal mappings’, Acta Math. 96 (1956), 125-142. · Zbl 0072.29602
[4] Bassu, D., Izmailov, R., McIntosh, A., Ness, L., and Shallcross, D., ‘Centralized multi-scale singular value decomposition for feature construction in LiDAR image classification problems’, IEEE AIPR (2012), 1-6.
[5] Belkin, M. and Niyogi, P., ‘Laplacian eigenmaps for dimensionality reduction and data representation’, Neural Computation 15 (2003), 1373-1396. · Zbl 1085.68119
[6] Billingsley, P., Probability and Measure (Wiley, 2012). · Zbl 1236.60001
[7] Brodu, N. and Lague, D., ‘3D terrestrial LiDAR data classification of complex natural scenes using a multi-scale dimensionality criterion: applications in geomorphology’, ISPRS Journal of Photogrammetry and Remote Sensing 16 (2012), 121-134.
[8] Bruna, J. and Mallat, S., ‘Invariant scattering convolutional networks’, IEEE Trans. on Pattern Analysis and Machine Intelligence 35(8) (2013).
[9] Bruna, J., Szlam, A., and Lecun, Y., ‘Learning stable group invariant representations with convolutional networks’, ICLR (January 2013).
[10] Campbell, J. B., Introduction to Remote Sensing (3rd ed.) (The Guilford Press, 2002).
[11] Coifman, R. and Lafon, S., ‘Diffusion maps’, Applied and Computational Harmonic Analysis 21 (2006), 5-30. · Zbl 1095.68094
[12] Coifman, R., Lafon, S., Maggioni, M., Nadler, B., Warner, F., and Zucker, S. W., ‘Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps’, Proc. Natl. Acad. Sci. USA 102 (2005), 7426-7431. · Zbl 1405.42043
[13] Comer, D., Internetworking with TCP/IP 4th Edition: Principles, Protocols, and Architecture (Pearson, 2000), vol. 1.
[14] Cybenko, G., ‘Approximation by superpositions of a sigmoidal function’, Mathematics of Control, Signals, and Systems 2 (1989), 303-314. · Zbl 0679.94019
[15] Fefferman, R., Kenig, C., and Pipher, J., ‘The theory of weights and the Dirichlet problem for elliptic equations’, Ann. of Math. 134 (1991), 65-124. · Zbl 0770.35014
[16] Fowlkes, C., Belongie, S., Chung, F., and Malik, J., ‘Spectral grouping using the Nyström method’, IEEE Transactions on Pattern Analysis and Machine Intelligence 26(2) (2004), 214-225.
[17] Goldberg, Y., ‘A primer on neural network models for natural language processing’, J. Artificial Intelligence (2016). · Zbl 1401.68264
[18] Goodfellow, I., Bengio, Y., and Courville, A., Deep Learning (MIT Press, 2016), Section 6.4.1. · Zbl 1373.68009
[19] Grebenkov, D. S., Beliaev, D., and Jones, P. W., ‘A multiscale guide to Brownian motion’, Journal of Physics A: Mathematical and Theoretical 49(4) (2015), 043001. · Zbl 1344.65016
[20] Hornik, K. and White, H., ‘Multilayer feedforward networks are universal approximators’, Neural Networks 2 (1989), 359-366. · Zbl 1383.92015
[21] Kahane, J.-P. and Peyrière, J., ‘Sur certaines martingales de B. Mandelbrot’, Adv. Math. 22 (1976), 131-145. · Zbl 0349.60051
[22] Kahane, J.-P., ‘Sur le chaos multiplicatif’, Ann. Sci. Math. 9(2) (1985), 105-150. · Zbl 0596.60041
[23] Kunin, D., Bloom, J., Goeva, A., and Seed, C., ‘Loss landscapes of regularized linear autoencoders’, ICML (2019), 3560-3569.
[24] Jones, P. W., ‘On removable sets for Sobolev spaces’, in: Fefferman, C., et al. eds., Essays on Fourier Analysis in Honor of E.M. Stein (Princeton University Press, 1995), 250-267. · Zbl 0839.30020
[25] Medina, F. P., Ness, L., Weber, M., and Yacoubou Djima, K., ‘Heuristic framework for multiscale testing of the multi-manifold hypothesis’, in: Domeniconi, C. and Gasparovic, E., eds., Research in Data Science (Springer AWM Series, 2019). · Zbl 1430.68269
[26] Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J., ‘Distributed representations of words and phrases and their compositionality’, NIPS ’13 2 (2014), 3111-3119.
[27] Mallat, S., ‘Group invariant scattering’, Comm. Pure Appl. Math. 65(10) (2012), 1331-1398. · Zbl 1282.47009
[28] Mandelbrot, B. B., ‘A possible refinement of the lognormal hypothesis concerning the distribution of energy in intermittent turbulence’, in: Statistical Models and Turbulence, Lecture Notes in Phys. no. 12 (Springer, 1972), 333-351. · Zbl 0227.76081
[29] Mumford, D. and Sharon, E., ‘\(2D\)-shape analysis using conformal mapping’, Int. J. of Computer Vision 70 (2006), 55-75. · Zbl 1477.68492
[30] Ness, L., ‘Dyadic product formula representations of confidence measures and decision rules for dyadic data set samples’, MISNC SI, DS ’16, August 2016, Union, NJ, USA.
[31] Ness, L., ‘Inference of a dyadic measure and its simplicial geometry from binary feature data and application to data quality’, in: Domeniconi, C. and Gasparovic, E., eds., Research in Data Science (Springer AWM Series, 2019). · Zbl 1430.68283
[32] Oikawa, K., ‘Welding of polygons and the type of Riemann surfaces’, Kodai Math. Sem. Rep. 13(1) (1961), 37-52. · Zbl 0129.05702
[33] Okikiolu, K., ‘Characterization of subsets of rectifiable curves in \({R}^n\)’, J. London Math. Soc. 46(2) (1992), 336-348. · Zbl 0758.57020
[34] Osgood, W. F., ‘A Jordan curve of positive area’, Trans. Amer. Math. Soc. 4(1) (1903), 107-112. · JFM 34.0533.02
[35] Pennington, J., Socher, R., and Manning, C., ‘GloVe: global vectors for word representation’, EMNLP (2014).
[36] Shaham, U., Cloninger, A., and Coifman, R., ‘Provable approximation properties for deep neural networks’, Appl. Comput. Harmon. Anal. 44(3) (2018), 527-557. · Zbl 1390.68553
[37] Ylonen, T., Turner, P., Scarfone, K., and Souppaya, S. M., ‘Security of interactive and automated access management using Secure Shell (SSH)’, NIST Internal Report 7966 (2015).