zbMATH — the first resource for mathematics

Space-efficient dictionaries for parameterized and order-preserving pattern matching. (English) Zbl 1380.68472
Grossi, Roberto (ed.) et al., 27th annual symposium on combinatorial pattern matching, CPM 2016, Tel Aviv, Israel, June 27–29, 2016. Proceedings. Wadern: Schloss Dagstuhl – Leibniz Zentrum für Informatik (ISBN 978-3-95977-012-5). LIPIcs – Leibniz International Proceedings in Informatics 54, Article 2, 12 p. (2016).
Summary: Let \(S\) and \(S'\) be two strings of the same length, over a totally-ordered alphabet. We consider the following two variants of string matching.
Parameterized Matching: The characters of \(S\) and \(S'\) are partitioned into static characters and parameterized characters. The strings are parameterized match iff the static characters match exactly and there exists a one-to-one function which renames the parameterized characters in \(S\) to those in \(S'\).
Order-Preserving Matching: The strings are order-preserving match iff for any two integers \(i,j\in[1,|S|],\;S[i]\prec S[j]\Leftrightarrow S'[i]\prec S'[j]\), where \(\prec\) denotes the precedence order of the alphabet.
Let \({\mathcal P}\) be a collection of \(d\) patterns \(\{P_1,P_2,\dots,P_d\}\) of total length \(n\) characters, which are chosen from a totally-ordered alphabet \(\Sigma\). Given a text \(T\), also over \(\Sigma\), we consider the dictionary indexing problem under the above definitions of string matching. Specifically, the task is to index \({\mathcal P}\), such that we can report all positions \(j\) (called occurences) where at least one of the patterns \(P_i\in{\mathcal P}\) is a parameterized-match (resp. order-preserving match) with the same-length substring of \(T\) starting at \(j\). Previous best-known indexes occupy \(O(n\log n)\) bits and can report all occ occurences in \({\mathcal O}(|T|\log|\Sigma|+\mathrm{occ})\) time. We present space-efficient indexes that occupy \({\mathcal O}(n\log|\Sigma|+ d\log n)\) bits and reports all occ positions in \({\mathcal O}(|T|(\log|\Sigma|+ \log_{|\Sigma|}n)+\mathrm{occ})\) time for parameterized matching and in \({\mathcal O}(|T|\log n+\mathrm{occ})\) time for order-preserving matching.
For the entire collection see [Zbl 1351.68018].

68W32 Algorithms on strings
Full Text: DOI