Succinct indexes for reporting discriminating and generic words. (English) Zbl 1330.68054

Moura, Edleno (ed.) et al., String processing and information retrieval. 21st international symposium, SPIRE 2014, Ouro Preto, Brazil, October 20–22, 2014. Proceedings. Berlin: Springer (ISBN 978-3-319-11917-5/pbk). Lecture Notes in Computer Science 8799, 89-100 (2014).
Summary: We consider the problem of indexing a collection \(\mathcal{D}\) of \(D\) strings (documents) of total \(n\) characters from an alphabet set of size \(\sigma \), such that whenever a pattern \(P\) (of \(p\) characters) and an integer \(\tau \in [1, D]\) comes as a query, we can efficiently report all (i) maximal generic words and (ii) minimal discriminating words as defined below:
\(\bullet\) maximal generic word is a maximal extension of \(P\) occurring in at least \(\tau \) documents..
\(\bullet\) minimal discriminating word is a minimal extension of \(P\) occurring in at most \(\tau \) documents.
These problems were introduced by G. Kucherov et al. [Lect. Notes Comput. Sci. 7608, 307–317 (2012; Zbl 1330.68059)], and they proposed linear space indexes occupying \(O(n\log n)\) bits with query times \(O(p + \mathrm{output})\) and \(O\)(p + \(\log \log n + \mathrm{output})\) for Problem (i) and Problem (ii) respectively. In this paper, we describe succinct indexes of \(n \log \sigma + o(n \log \sigma ) + O(n)\) bits space with near optimal query times i.e., \(O(p + \log \log n + \mathrm{output})\) for both these problems.
For the entire collection see [Zbl 1298.68032].


68P15 Database theory
68P05 Data structures
68P20 Information storage and retrieval of data
68W32 Algorithms on strings


Zbl 1330.68059
Full Text: DOI