Wang, Jiaying; Wang, Bin; Yang, Xiaochun
Efficient compressed genomic data oriented query approach. (Chinese. English summary) Zbl 1363.68067
J. Softw. 27, No. 7, 1715-1728 (2016).

Summary: With the rapid development of third- and next-generation sequencing techniques, the volume of genomic sequences such as DNA is growing explosively, and processing such big data efficiently is a great challenge. Research has found that although these sequences are very large, they are highly similar to one another, so the space cost can be reduced by storing only their differences from a reference sequence. Recent studies show that it is possible to search the compressed data directly. The goal of this study is to improve the scalability of indexing and searching techniques to meet the growing demands of big data. Building on an existing method, this work focuses on compressing the reference sequence. Several optimization techniques are proposed to perform efficient exact and approximate search with arbitrary query length on the compressed data. Parallel computing is further used to increase query efficiency for big data. An experimental study demonstrates the efficiency of the proposed method.

MSC:
68P20 Information storage and retrieval of data
68P30 Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science)
92D20 Protein sequences, DNA sequences

Keywords: genomic sequence; big data; scalability; data compression; parallel computing
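
The idea in the summary of storing only the differences from a reference sequence, and then searching without fully decompressing, can be illustrated with a minimal Python sketch. This is not the authors' method: it assumes, for simplicity, that a sequence differs from the reference only by substitutions (so both have the same length), and it covers exact search only. The function names `compress`, `window`, and `search` are hypothetical and chosen for illustration.

```python
from bisect import bisect_left


def compress(reference, sequence):
    """Store only the substitutions (position, base) relative to the reference.

    Assumption: both strings have the same length (no insertions or deletions).
    """
    if len(reference) != len(sequence):
        raise ValueError("this sketch only handles substitutions")
    return [(i, b) for i, (a, b) in enumerate(zip(reference, sequence)) if a != b]


def window(reference, diffs, positions, start, end):
    """Reconstruct sequence[start:end] from the reference plus the stored diffs."""
    chars = list(reference[start:end])
    k = bisect_left(positions, start)
    while k < len(diffs) and diffs[k][0] < end:
        pos, base = diffs[k]
        chars[pos - start] = base
        k += 1
    return "".join(chars)


def find_all(text, pattern):
    """Return all (possibly overlapping) start positions of pattern in text."""
    hits = []
    i = text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def search(reference, diffs, pattern):
    """Exact search on the compressed form, decompressing only small windows."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    positions = [p for p, _ in diffs]

    def touched(start):
        # True if some stored difference falls inside [start, start + m).
        k = bisect_left(positions, start)
        return k < len(positions) and positions[k] < start + m

    # Reference hits that overlap no difference are also hits in the sequence.
    hits = {s for s in find_all(reference, pattern) if not touched(s)}

    # Any other hit must overlap a difference, so check a local window around each.
    for p in positions:
        lo = max(0, p - m + 1)
        hi = min(len(reference), p + m)
        local = window(reference, diffs, positions, lo, hi)
        hits.update(lo + s for s in find_all(local, pattern))
    return sorted(hits)


reference = "ACGTACGTACGT"
sequence = "ACGTACCTACGT"            # one substitution at position 6
diffs = compress(reference, sequence)
print(diffs)                          # [(6, 'C')]
print(search(reference, diffs, "ACGT"))  # [0, 8]
print(search(reference, diffs, "CCTA"))  # [5]
```

In this sketch the reference is scanned once per query and only windows of at most 2m - 1 characters around each stored difference are reconstructed, where m is the query length. A scalable system would replace the linear scan of the reference with an index and would also need to handle insertions, deletions, and approximate matching.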