zbMATH — the first resource for mathematics

Parallelization of a finite element surface fitting algorithm for data mining. (English) Zbl 0977.65012
Summary: A major task in data mining is to develop automatic techniques to process and to detect patterns in very large data sets. An important data mining technique is multivariate regression, and an essential subtask is the estimation of interaction surfaces, i.e., the estimation of functions of two variables. Thin plate splines provide a very good method to determine an approximating surface. Obtaining standard thin plate splines requires the solution of a dense linear system of equations of order \(n\), where \(n\) is the number of observations.
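For orientation, the usual textbook formulation of the thin plate smoothing spline can be written as follows. The notation (\(\mathbf{x}_i\), \(y_i\), \(\alpha\), \(J_\alpha\)) is introduced here for illustration and is not taken from the summary. Given observations \((\mathbf{x}_i, y_i)\) with \(\mathbf{x}_i \in \mathbb{R}^2\), the spline is the function \(f\) minimising

\[
J_\alpha(f) = \frac{1}{n}\sum_{i=1}^{n}\bigl(f(\mathbf{x}_i) - y_i\bigr)^2 + \alpha \int_{\mathbb{R}^2} \left( f_{x_1 x_1}^2 + 2 f_{x_1 x_2}^2 + f_{x_2 x_2}^2 \right) d\mathbf{x},
\]

where \(\alpha > 0\) is a smoothing parameter. The minimiser is a combination of \(n\) radial basis functions, one centred at each observation, plus a linear polynomial, which is why the resulting linear system is dense and grows with \(n\).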
Standard thin plate splines may not be practical, because the number of observations for data mining applications is often in the millions. We have developed a finite element approximation of a spline that can handle data sizes with millions of records. The resolution of the finite element method can be chosen independently of the number of observations. The observation data are read from secondary storage only once and need not be held in memory. In this paper, we present a first parallel implementation of this method in an MPI environment.
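The single-pass, resolution-independent assembly described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes bilinear elements on a uniform grid over the unit square, omits the smoothing term entirely (plain least squares), and imitates the MPI processes by splitting the records into chunks whose partial matrices are summed, which is the role a reduction such as `MPI_Allreduce` would play. All function names (`accumulate`, `combine`, `solve`, `demo`) are hypothetical.

```python
def accumulate(records, m):
    """Stream (x, y, z) records once and build least-squares normal
    equations on an m-by-m grid of nodes over the unit square.
    The system size is m*m, independent of the number of records."""
    n_nodes = m * m
    A = [[0.0] * n_nodes for _ in range(n_nodes)]
    b = [0.0] * n_nodes
    h = 1.0 / (m - 1)
    for x, y, z in records:
        # Locate the grid cell containing the point and its local coordinates.
        i = min(int(x / h), m - 2)
        j = min(int(y / h), m - 2)
        s, t = x / h - i, y / h - j
        # The four bilinear basis functions that are nonzero at this point.
        nodes = [
            (i * m + j, (1 - s) * (1 - t)),
            ((i + 1) * m + j, s * (1 - t)),
            (i * m + j + 1, (1 - s) * t),
            ((i + 1) * m + j + 1, s * t),
        ]
        for p, wp in nodes:
            b[p] += wp * z
            for q, wq in nodes:
                A[p][q] += wp * wq
    return A, b


def combine(partials):
    """Sum the per-worker (A, b) pairs; stands in for an MPI reduction."""
    n = len(partials[0][1])
    A = [[sum(P[0][p][q] for P in partials) for q in range(n)] for p in range(n)]
    b = [sum(P[1][p] for P in partials) for p in range(n)]
    return A, b


def solve(A, b):
    """Gaussian elimination with partial pivoting on a small dense system."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        tail = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - tail) / M[k][k]
    return x


def demo(m=3, workers=4):
    """Fit synthetic data z = 1 + 2x + 3y using `workers` simulated processes."""
    points = [((a + 0.5) / 20, (c + 0.5) / 20) for a in range(20) for c in range(20)]
    records = [(x, y, 1 + 2 * x + 3 * y) for x, y in points]
    chunks = [records[w::workers] for w in range(workers)]  # one chunk per worker
    A, b = combine([accumulate(chunk, m) for chunk in chunks])
    return solve(A, b)  # nodal values of the fitted surface
```

Because each record touches only four basis functions, `accumulate` needs a single pass and constant work per record, and only the small `m*m` system has to be communicated between workers. A real implementation would add the smoothing term and exploit the sparsity of the assembled matrix rather than store it densely.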
65D10 Numerical smoothing, curve fitting
65Y05 Parallel numerical computation
68T10 Pattern recognition, speech recognition
65C60 Computational problems in statistics (MSC2010)
62J05 Linear regression; mixed models