an:06253985
Zbl 1280.68305
Hon, Wing-Kai; Ku, Tsung-Han; Shah, Rahul; Thankachan, Sharma V.; Vitter, Jeffrey Scott
Compressed text indexing with wildcards
EN
J. Discrete Algorithms 19, 23-29 (2013).
00328875
2013
j
68W32 68R05 68P15 68U15
approximate pattern matching; wildcards; compressed text indexing
Summary: Let \(T=T_1\phi^{k_1}T_2\phi^{k_2}\cdots\phi^{k_d}T_{d+1}\) be a text of total length \(n\), where the characters of each \(T_i\) are chosen from an alphabet \(\Sigma\) of size \(\sigma\), and \(\phi\) denotes a wildcard symbol. The text indexing with wildcards problem is to index \(T\) such that, given a query pattern \(P\), we can locate the occurrences of \(P\) in \(T\) efficiently. This problem has been applied in indexing genomic sequences that contain single-nucleotide polymorphisms (SNPs), because SNPs can be modeled as wildcards. Recently, \textit{A. Tam} et al. [``Succinct text indexing with wildcards'', in: SPIRE 2009, 39--50 (2009)] and \textit{C. Thachuk} [Lect. Notes Comput. Sci. 6661, 27--40 (2011; Zbl 1339.68339)] have proposed succinct indexes for this problem. In this paper, we present the first compressed index for this problem, which takes only \(nH_h+o(n\log\sigma)+O(d\log n)\) bits of space, where \(H_h\) is the \(h\)th-order empirical entropy \((h=o(\log_\sigma n))\) of \(T\).
Zbl 1339.68339
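
The matching semantics described in the summary can be illustrated with a brute-force scan. This is only a minimal sketch of the problem statement, not the compressed index of the paper: it uses no index at all and runs in O(nm) time. It assumes that each wildcard in the text matches exactly one arbitrary character of the pattern; the function name `naive_wildcard_occurrences` and the use of `?` as a stand-in for \(\phi\) are hypothetical choices made for illustration.

```python
from typing import List

WILDCARD = "?"  # hypothetical stand-in for the wildcard symbol \phi


def naive_wildcard_occurrences(text: str, pattern: str,
                               wildcard: str = WILDCARD) -> List[int]:
    """Return all 0-based start positions where `pattern` matches `text`.

    Assumption: a wildcard character in `text` matches any single
    character of `pattern`; all other positions must match exactly.
    """
    n, m = len(text), len(pattern)
    hits = []
    # Try every alignment of the pattern against the text.
    for i in range(n - m + 1):
        if all(text[i + j] == wildcard or text[i + j] == pattern[j]
               for j in range(m)):
            hits.append(i)
    return hits


if __name__ == "__main__":
    # A text with d = 2 wildcard groups, of lengths 1 and 2.
    print(naive_wildcard_occurrences("AC?TA??GA", "GA"))
```

A scan like this is useful mainly as a correctness baseline when testing an actual index, since it reports the same occurrences without any preprocessing.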