More haste, less waste: lowering the redundancy in fully indexable dictionaries.

*(English)* Zbl 1236.68064
Albers, Susanne (ed.) et al., STACS 2009. 26th international symposium on theoretical aspects of computer science, Freiburg, Germany, February 26–28, 2009. Wadern: Schloss Dagstuhl – Leibniz Zentrum für Informatik (ISBN 978-3-939897-09-5). LIPIcs – Leibniz International Proceedings in Informatics 3, 517–528, electronic only (2009).

Summary: We consider the problem of representing, in a compressed format, a bit-vector \(S\) of \(m\) bits with \(n\) \(\mathbf{1}\)s, supporting the following operations, where \(b \in \{ \mathbf{0}, \mathbf{1} \}\):

(1) \(\text{rank}_b(S,i)\) returns the number of occurrences of bit \(b\) in the prefix \(S\left[1..i\right]\);

(2) \(\text{select}_b(S,i)\) returns the position of the \(i\)th occurrence of bit \(b\) in \(S\).

Such a data structure is called a fully indexable dictionary (FID), and is at least as powerful as predecessor data structures. Viewing \(S\) as a set \(X = \{ x_1, x_2, \ldots, x_n \}\) of \(n\) distinct integers drawn from a universe \([m] = \{1, \ldots, m\}\), the predecessor of integer \(y \in [m]\) in \(X\) is given by \(\text{select}_1(S, \text{rank}_1(S,y-1))\). FIDs have many applications in succinct and compressed data structures, as they are often involved in the construction of succinct representations for a variety of abstract data types.
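
To make the two operations and the predecessor reduction concrete, here is a minimal Python sketch. It is a naive linear-time reference over an uncompressed list of bits, not the constant-time compressed structure of the paper; the function names, the use of `None` for "no such element", and the example set are assumptions made for illustration.

```python
from typing import List, Optional


def rank(S: List[int], b: int, i: int) -> int:
    """Number of occurrences of bit b in the prefix S[1..i] (1-based, inclusive)."""
    return sum(1 for bit in S[:i] if bit == b)


def select(S: List[int], b: int, i: int) -> Optional[int]:
    """1-based position of the i-th occurrence of bit b in S, or None if absent."""
    if i <= 0:
        return None
    count = 0
    for pos, bit in enumerate(S, start=1):
        if bit == b:
            count += 1
            if count == i:
                return pos
    return None


def predecessor(S: List[int], y: int) -> Optional[int]:
    """Largest element of X strictly smaller than y: select_1(S, rank_1(S, y-1))."""
    return select(S, 1, rank(S, 1, y - 1))


# Illustrative input: X = {2, 5, 7} over the universe [8], so bit x of S is 1 iff x is in X.
S = [0, 1, 0, 0, 1, 0, 1, 0]
```

With this input, `predecessor(S, 6)` evaluates `rank(S, 1, 5) = 2` and then `select(S, 1, 2) = 5`, while `predecessor(S, 2)` finds no smaller element and returns `None`.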

Our focus is on space-efficient FIDs on the RAM model with word size \(\Theta(\log m)\) and constant time for all operations, so that the time cost is independent of the input size. Given the bitstring \(S\) to be encoded, having length \(m\) and containing \(n\) ones, the minimal amount of information that needs to be stored is \(B(n,m) = \lceil \log {{m}\choose{n}} \rceil\). The state of the art in building a FID for \(S\) is given in [M. Pǎtraşcu, “Succinter”, in: FOCS’08. Proceedings of the 49th annual IEEE symposium on foundations of computer science. 305–313 (2008)] using \(B(n,m)+O( m / ( (\log m/ t) ^t) ) + O(m^{3/4}) \) bits, to support the operations in \(O(t)\) time.
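
As a small illustration of the information-theoretic minimum, the following Python sketch evaluates \(B(n,m)\) exactly with integer arithmetic. It assumes the logarithm is base 2 (bits), which the summary does not state explicitly; the function name `B` simply mirrors the notation above.

```python
import math


def B(n: int, m: int) -> int:
    """ceil(log2(C(m, n))): minimum bits needed to distinguish all m-bit strings with n ones."""
    c = math.comb(m, n)
    # For an integer c >= 1, ceil(log2(c)) equals (c - 1).bit_length().
    return (c - 1).bit_length()
```

For example, `B(3, 8)` is 6, because \(\binom{8}{3} = 56 \le 2^6\), whereas a plain bitmap of the same string takes 8 bits.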

Here we propose a parametric data structure exhibiting a time/space trade-off such that, for any real constants \(0 < \delta \leq 1/2, 0 < \varepsilon \leq 1\), and integer \(s > 0\), it uses \[ B(n,m) + O\left(n^{1+\delta} + n \left(\frac{m}{n^s}\right)^\varepsilon\right) \] bits and performs all the operations in time \(O(s\delta^{-1} + \varepsilon^{-1})\). The improvement is twofold: our redundancy can be lowered parametrically and, fixing \(s = O(1)\), we get a constant-time FID whose space is \(B(n,m) + O(m^\varepsilon/\text{poly}(n))\) bits, for sufficiently large \(m\). This is a significant improvement compared to the previous bounds for the general case.
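
One way to read the constant-time bound, offered as a sketch rather than as the authors' own derivation: the second redundancy term can be rewritten as \[ n \left(\frac{m}{n^s}\right)^\varepsilon = \frac{m^\varepsilon}{n^{s\varepsilon - 1}}, \] which has the form \(m^\varepsilon/\text{poly}(n)\) once the constant \(s\) is chosen with \(s\varepsilon > 1\). The first term is dominated by it whenever \[ n^{1+\delta} \leq \frac{m^\varepsilon}{n^{s\varepsilon - 1}} \iff m \geq n^{s + \delta/\varepsilon}, \] which is one possible interpretation of "sufficiently large \(m\)".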

For the entire collection see [Zbl 1213.68019].
