
Persistence images: a stable vector representation of persistent homology. (English) Zbl 1431.68105

The work sits at the intersection of topological data analysis (persistent homology) and machine learning. The main problem the authors tackle is to build a representation of persistence diagrams (PDs) that preserves their essential properties (e.g. robustness to noise), can be used as input to most state-of-the-art machine learning algorithms, and maintains a clear relationship with the associated PD.
The main contribution of the work is the introduction of persistence images (PIs). The construction maps a PD to an integrable function \(\rho:\mathbb{R}^2\rightarrow\mathbb{R}\) called a persistence surface. The function \(\rho\) is defined as a weighted sum of probability distributions, one associated with each point of the PD, with the weights given by a user-chosen auxiliary function. Although the authors present results valid for general integrable distributions, the applications in the paper use only normalised symmetric Gaussians. Given a persistence surface, a PI is computed by integrating \(\rho\) over each cell of a grid that discretises a subdomain of \(\mathbb{R}^2\).
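In symbols, the construction can be sketched as follows. The notation is an assumption made here for illustration, since the review does not fix one: \(B\) denotes the PD viewed as a multiset of points, \(\phi_u\) the probability distribution associated with the point \(u\in B\), \(f\) the weighting function, and \(p\) a cell (pixel) of the grid.
\[
\rho_B(z)=\sum_{u\in B} f(u)\,\phi_u(z), \qquad I(\rho_B)_p=\iint_p \rho_B(x,y)\,dx\,dy .
\]
In the Gaussian case used in the applications, with \(u=(u_x,u_y)\) and variance \(\sigma^2\), one would take
\[
\phi_u(x,y)=\frac{1}{2\pi\sigma^2}\exp\!\left(-\frac{(x-u_x)^2+(y-u_y)^2}{2\sigma^2}\right).
\]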
The PI construction enables machine learning algorithms to take advantage of the information encoded in persistence diagrams. Furthermore, the authors prove that PIs are stable with respect to the 1-Wasserstein distance. However, using PIs requires choosing several parameters, namely:
1. the resolution of the discretisation used in the construction;
2. the probability distribution and its parameters;
3. a weighting function \(f:\mathbb{R}^2\rightarrow\mathbb{R}\).
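
To make the role of the three parameters concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes that the diagram is first moved to birth-persistence coordinates, that the distribution is an isotropic Gaussian with standard deviation `sigma`, and that the default weight is a linear ramp in persistence. The function name `persistence_image` and its arguments `resolution`, `sigma`, `extent` and `weight` are hypothetical names chosen for this illustration.

```python
import math
import numpy as np

# Vectorised error function built from the standard library (avoids a SciPy dependency).
_erf = np.vectorize(math.erf, otypes=[float])


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a 1D Gaussian N(mu, sigma^2)."""
    return 0.5 * (1.0 + _erf((x - mu) / (sigma * math.sqrt(2.0))))


def persistence_image(diagram, resolution=20, sigma=0.1,
                      extent=(0.0, 1.0, 0.0, 1.0), weight=None):
    """Illustrative persistence image of a diagram of (birth, death) pairs.

    resolution : number of pixels per axis (parameter 1 in the list above)
    sigma      : Gaussian standard deviation (parameter 2)
    weight     : callable (birth, persistence) -> weight (parameter 3)
    extent     : (x_min, x_max, y_min, y_max) in birth-persistence coordinates
    """
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    births = diagram[:, 0]
    pers = diagram[:, 1] - diagram[:, 0]  # assumed change of coordinates

    if weight is None:
        # Assumed default: linear ramp, 0 on the diagonal, 1 at maximal persistence.
        max_pers = pers.max() if pers.size and pers.max() > 0 else 1.0
        weight = lambda b, p: np.clip(p / max_pers, 0.0, 1.0)
    w = weight(births, pers)

    x_min, x_max, y_min, y_max = extent
    x_edges = np.linspace(x_min, x_max, resolution + 1)
    y_edges = np.linspace(y_min, y_max, resolution + 1)

    image = np.zeros((resolution, resolution))
    for b, p, wk in zip(births, pers, w):
        # An isotropic Gaussian factorises, so its integral over a rectangular
        # pixel is a product of two 1D CDF differences.
        mass_x = np.diff(normal_cdf(x_edges, b, sigma))
        mass_y = np.diff(normal_cdf(y_edges, p, sigma))
        image += wk * np.outer(mass_y, mass_x)  # rows: persistence, columns: birth
    return image


# Example with a toy diagram of two points (made-up values).
diagram = np.array([[0.25, 0.80], [0.10, 0.15]])
image = persistence_image(diagram, resolution=10, sigma=0.05)
print(image.shape)
```

Flattening `image` with `image.ravel()` gives a fixed-length vector that can be passed to a standard classifier. Integrating the Gaussian exactly over each pixel, rather than sampling it at pixel centres, follows the description of a PI as the integral of the surface over the grid cells.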

The effectiveness of the method is investigated in a series of experiments. First, PIs are compared with PDs and persistence landscapes on a shape classification task, in which the PDs are built from the Vietoris-Rips filtration of noisy point clouds. Importantly, the role of the aforementioned parameters is partially discussed: the results are shown to be robust with respect to changes in the noise level used to generate the data set, the variance of the Gaussian distribution used to generate the PI, and the resolution. Furthermore, sparse support vector machines are used to discriminate between PIs associated with different shapes. This makes it possible to detect the bins that are most salient for the discrimination task, which can in turn be linked to regions of the PD associated with the PI under consideration.
A second application aims to determine the parameters of a discrete fluid flow model. Here, classification is performed via a discriminant subspace ensemble.
The last experiment shows how PIs can be used to classify the solutions of the anisotropic Kuramoto-Sivashinsky equation at different time values.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
55N31 Persistent homology and applications, topological data analysis
68T09 Computational aspects of data analysis and big data
Full Text: arXiv Link