
A directed graph method for building high dimension data cube. (Chinese. English summary) Zbl 1413.68034

Summary: Hadoop, Spark, and other software frameworks provide technical support for fast, parallel processing of big data. At the same time, the big data environment requires OLAP (On-line Analytical Processing) to respond in quasi-real time and real time. The data cube is the abstraction of the multidimensional data model for OLAP. The variability of analysis over big data makes the data cube high-dimensional, and the volume of the data causes the cube to expand. A digraph is used to describe a data cube; it provides a complete set of data pieces and data blocks for data analysis, and analysis becomes more efficient because it only has to extract an element from this complete set. For a high-dimensional data cube, a dimension reduction method is used to control the size of the cube. Based on how often and in what way each dimension is used, the concepts of nondimension, necessary dimension, and joint dimensions are proposed. Methods for identifying each kind of dimension are given, and a simplified method for adjusting the data cube is implemented.
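
The summary does not say how the digraph is constructed. As an illustration only, the sketch below assumes the standard cuboid lattice, in which each node is a subset of the dimensions and each edge points to a cuboid obtained by dropping one dimension. The function name `build_cuboid_digraph` and the sample dimensions are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def build_cuboid_digraph(dimensions):
    """Return an adjacency map for the cuboid lattice of a data cube.

    Assumption: nodes are subsets of the dimensions (cuboids), and an edge
    leads from a cuboid to each cuboid with exactly one dimension removed.
    This is a generic lattice, not necessarily the paper's construction.
    """
    dims = tuple(dimensions)
    # Enumerate every subset of dimensions, from the full cuboid to the empty one.
    nodes = [
        frozenset(combo)
        for size in range(len(dims), -1, -1)
        for combo in combinations(dims, size)
    ]
    # Each cuboid points to the cuboids reachable by rolling up one dimension.
    return {node: [node - {d} for d in dims if d in node] for node in nodes}


# Hypothetical dimensions, chosen for illustration.
graph = build_cuboid_digraph(["time", "region", "product"])
edge_count = sum(len(children) for children in graph.values())
```

With n dimensions this lattice has 2^n nodes, which is why reducing the number of dimensions, as the summary describes, directly limits the size of the cube.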

MSC:

68P15 Database theory
68R10 Graph theory (including graph drawing) in computer science
68T05 Learning and adaptive systems in artificial intelligence

Software:

Hadoop