an:06457227
Zbl 1322.68071
Hon, Wing-Kai; Ku, Tsung-Han; Lam, Tak-Wah; Shah, Rahul; Tam, Siu-Lung; Thankachan, Sharma V.; Vitter, Jeffrey Scott
Compressing dictionary matching index via sparsification technique
EN
Algorithmica 72, No. 2, 515-538 (2015).
00345260
2015
j
68P30 68P10
data compression; dictionary matching; text indexing; sparsification technique
Summary: Given a set \(\mathcal{D}\) of patterns of total length \(n\), the dictionary matching problem is to index \(\mathcal{D}\) such that for any query text \(T\), we can locate the occurrences of any pattern within \(T\) efficiently. This problem can be solved in optimal \(O(|T|+\mathrm{occ})\) time by the classical AC automaton [\textit{A. V. Aho} and \textit{M. J. Corasick}, Commun. ACM 18, 333--340 (1975; Zbl 0301.68048)], where \(\mathrm{occ}\) denotes the number of occurrences. The space requirement is \(O(n)\) words, which is still far from optimal. In this paper, we show that in many cases the sparsification technique can be applied to improve the space requirements of the indexes for dictionary matching and its related problems. First, we give a compressed index for dictionary matching, and show that such an index can be generalized to handle dynamic updates of \(\mathcal{D}\). Also, we give a compressed index for approximate dictionary matching with one error. In each case, the query time is slowed down only by a polylogarithmic factor compared with that achieved by the best \(O(n)\)-word counterparts.
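Illustration (not part of the original summary): the classical, uncompressed AC automaton that the summary uses as its baseline can be sketched as follows. This is a minimal textbook-style version with \(O(n)\) states, not the compressed index of the paper; the names build_automaton and find_occurrences are hypothetical, and merging the output lists during construction is a simplification that can cost more than linear space on some dictionaries.

```python
from collections import deque


def build_automaton(patterns):
    """Build a plain Aho-Corasick automaton (trie + failure links).

    Hypothetical helper for illustration only; this is the uncompressed
    baseline, not the sparsified index described in the paper.
    """
    goto, fail, out = [{}], [0], [[]]  # state 0 is the root
    # Phase 1: insert every pattern into the trie.
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(idx)
    # Phase 2: breadth-first computation of failure links.
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure target
            # (simplification: copies lists instead of using output links).
            out[nxt] = out[nxt] + out[fail[nxt]]
            queue.append(nxt)
    return goto, fail, out


def find_occurrences(patterns, text):
    """Return (start_position, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            hits.append((i - len(patterns[idx]) + 1, patterns[idx]))
    return hits


if __name__ == "__main__":
    print(find_occurrences(["he", "she", "his", "hers"], "ushers"))
```

Each text character is consumed once and failure links are followed an amortized constant number of times, which gives the \(O(|T|+\mathrm{occ})\) query time mentioned above; the trie and link arrays account for the \(O(n)\)-word space that the paper sets out to reduce.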
Zbl 0301.68048