an:07207733
Zbl 1452.81004
Ruehle, Fabian
Data science applications to string theory
EN
Phys. Rep. 839, 1-117 (2020).
00449755
2020
j
81-08 55N31 68T05 68T07 68T20 81T30
machine learning; data science; neural networks; genetic algorithms; string landscape
In the paper under review, the author provides a pedagogical introduction to data science techniques used to study large data sets, and outlines their applications to string theory.
The problem with the string landscape is that it is unfathomably big. There is a huge number of different choices for the compact component of the string's target space, and a huge number of additional data or boundary conditions, known as fluxes and branes, that are necessary to uniquely specify string theory in four dimensions. Early estimates argue that there are \(\mathcal{O}(10^{500})\) boundary data choices for any typical six-dimensional compactification space [\textit{S. K. Ashok} and \textit{M. R. Douglas}, J. High Energy Phys. 2004, No. 1, 060, 36 p. (2004; Zbl 1243.83060)]. Estimates for the entire landscape are much larger still, \(\mathcal{O}(10^{272,000})\) [\textit{W. Taylor} and \textit{Y.-N. Wang}, J. High Energy Phys. 2015, No. 12, Paper No. 164, 21 p. (2015; Zbl 1388.81367)]. In addition, finding mathematically consistent and phenomenologically viable
background configurations requires solving problems which are generically NP-complete, NP-hard, or even undecidable.
The paper under review consists of two parts. In Sections 2 to 9, the author introduces concepts of data science that are relevant for string theory studies. This introduction is general and makes no reference to string theory concepts.
Sections 2 to 4 introduce neural networks (NNs) and Section 5 describes genetic algorithms. Section 6 describes persistent homology as an example of topological data analysis. Section 7 describes machine learning algorithms other than NNs that can be used in unsupervised machine learning to cluster data or to detect outliers and anomalies in a data set. After explaining a general problem that occurs in all these algorithms, the author introduces common algorithms such as principal component analysis, \(K\)-means clustering, mean shift clustering, Gaussian expectation-maximization clustering, and clustering with BIRCH and with DBSCAN. Section 8 introduces reinforcement learning to search for solutions in a large space of possibilities, and finally Section 9 discusses classification and regression algorithms besides NNs that can be used in supervised machine learning. The algorithms discussed are the \(k\)-nearest neighbor algorithm, decision trees and random forests, and support vector machines.
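To fix ideas, a minimal sketch of one of the clustering algorithms listed above, \(K\)-means (Lloyd's algorithm), might look as follows; the data points, initial centroids, and function name here are purely illustrative and do not come from the paper.

```python
# Minimal K-means (Lloyd's algorithm) sketch in pure Python.
# Fixed initial centroids are used for reproducibility; real applications
# would initialize randomly (e.g. k-means++) and use a convergence test.

def kmeans(points, centroids, iters=10):
    """Alternate assignment and update steps of Lloyd's algorithm."""
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [
            tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Two well-separated toy clusters in the plane.
points = [(0.0, 0.0), (0.5, 0.2), (9.0, 9.0), (9.5, 8.8)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (1.0, 1.0)])
print(centroids)  # centroids settle at the two cluster means
```

The same assignment/update skeleton underlies the more elaborate clustering methods the paper surveys; BIRCH and DBSCAN differ mainly in how cluster membership is decided.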
In Section 10, the author explains the hardness of the problems encountered in string theory, reviews the existing machine learning literature, and illustrates applications of the techniques explained in Sections 2 to 9 to problems that arise in string theory. These include computing cohomologies of line bundles over Calabi-Yau manifolds, generating and proving conjectures based on observations made by the AI in some data sets, predicting the types of non-Higgsable gauge groups that appear in F-theory on toric, elliptically fibered Calabi-Yau fourfolds, generating superpotentials for 4D \(\mathcal{N} = 1\) theories, studying the structure of string vacua, and searching through the landscape of string vacua to identify viable models. The author also illustrates the use of genetic algorithms to distinguish high-scale SUSY breaking models, and the use of convolutional neural networks on toric diagrams to predict volumes of Sasaki-Einstein manifolds. Furthermore, the author introduces the idea of using NNs to approximate the bulk metric in AdS/CFT, and discusses deep Boltzmann machines and their relation to AdS/CFT and Riemann theta functions.
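Many of the applications above are supervised-learning tasks: a labeled data set of geometric or flux data is used to train a classifier that predicts, say, a cohomology dimension or gauge group. A toy \(k\)-nearest-neighbor sketch of this workflow is given below; the features and labels are synthetic stand-ins, not data from the paper.

```python
# Toy k-nearest-neighbor classifier illustrating the supervised-learning
# workflow (train on labeled examples, predict labels for new points).
# All data here is synthetic and purely illustrative.

from collections import Counter

def knn_predict(train, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    # Sort training examples by squared Euclidean distance to the query.
    ranked = sorted(
        train,
        key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], query)),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Two synthetic classes of 2D feature vectors.
train = [((0, 0), "A"), ((1, 0), "A"), ((0, 1), "A"),
         ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B")]
print(knn_predict(train, (0.4, 0.4)))  # -> "A"
print(knn_predict(train, (5.4, 5.2)))  # -> "B"
```

In the actual string theory applications the feature vectors would encode, for example, line bundle charges or toric data, and more expressive models (NNs, random forests, SVMs) replace the nearest-neighbor vote.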
Farhang Loran (Isfahan)
Zbl 1243.83060; Zbl 1388.81367