an:06400034
Zbl 1330.68054
Biswas, Sudip; Patil, Manish; Shah, Rahul; Thankachan, Sharma V.
Succinct indexes for reporting discriminating and generic words
EN
Moura, Edleno (ed.) et al., String processing and information retrieval. 21st international symposium, SPIRE 2014, Ouro Preto, Brazil, October 20--22, 2014. Proceedings. Berlin: Springer (ISBN 978-3-319-11917-5/pbk). Lecture Notes in Computer Science 8799, 89-100 (2014).
2014
a
68P15 68P05 68P20 68W32
Summary: We consider the problem of indexing a collection \(\mathcal{D}\) of \(D\) strings (documents) of total \(n\) characters from an alphabet set of size \(\sigma \), such that whenever a pattern \(P\) (of \(p\) characters) and an integer \(\tau \in [1, D]\) comes as a query, we can efficiently report all (i) maximal generic words and (ii) minimal discriminating words as defined below:
\(\bullet\) maximal generic word is a maximal extension of \(P\) occurring in at least \(\tau \) documents..
\(\bullet\) minimal discriminating word is a minimal extension of \(P\) occurring in at most \(\tau \) documents.
These problems were introduced by \textit{G. Kucherov} et al. [Lect. Notes Comput. Sci. 7608, 307--317 (2012; Zbl 1330.68059)], and they proposed linear space indexes occupying \(O(n\log n)\) bits with query times \(O(p + \mathrm{output})\) and \(O\)(p + \(\log \log n + \mathrm{output})\) for Problem (i) and Problem (ii) respectively. In this paper, we describe succinct indexes of \(n \log \sigma + o(n \log \sigma ) + O(n)\) bits space with near optimal query times i.e., \(O(p + \log \log n + \mathrm{output})\) for both these problems.
For the entire collection see [Zbl 1298.68032].
Zbl 1330.68059