×

Computing the Burrows-Wheeler transform in place and in small space. (English) Zbl 1328.68325

Summary: We introduce the problem of computing the Burrows-Wheeler transform (BWT) using small additional space. Our in-place algorithm does not need the explicit storage for the suffix sort array and the output array, as typically required in previous work. It relies on the combinatorial properties of the BWT, and runs in \(O(n^2)\) time in the comparison model using \(O(1)\) extra memory cells, apart from the array of \(n\) cells storing the \(n\) characters of the input text. We then discuss the time-space trade-off when \(O(k \cdot \sigma_k)\) extra memory cells are allowed with \(\sigma_k\) distinct characters, providing an \(O((n^2 / k + n) \log k)\)-time algorithm to obtain (and invert) the BWT. For example in real systems where the alphabet size is a constant, for any arbitrarily small \(\varepsilon > 0\), the BWT of a text of \(n\) bytes can be computed in \(O(n \varepsilon^{- 1} \log n)\) time using just \(\varepsilon n\) extra bytes.

MSC:

68W32 Algorithms on strings
68Q25 Analysis of algorithms and problem complexity

Software:

BWA; Find
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Adjeroh, Donald; Bell, Timothy; Mukherjee, Amar, The Burrows-Wheeler Transform: Data Compression, Suffix Arrays, and Pattern Matching (2008), Springer
[2] Aho, Alfred; Hopcroft, John; Ullman, Jeff D., The Design and Analysis of Computer Algorithms (1974), Addison-Wesley: Addison-Wesley Reading, MA · Zbl 0326.68005
[3] Belazzougui, Djamal, Linear time construction of compressed text indices in compact space, (Proc. STOC 2014 (2014)), 148-193 · Zbl 1315.68110
[4] Burrows, Michael; Wheeler, David J., A block-sorting lossless data compression algorithm (May 1994), Digital SRC: Digital SRC Palo Alto, CA, USA, Research Report 124
[5] Chan, Timothy M., Comparison-based time-space lower bounds for selection, ACM Trans. Algorithms, 6, 2, 1-16 (2010) · Zbl 1300.68032
[6] Crochemore, Maxime; Grossi, Roberto; Kärkkäinen, Juha; Landau, Gad M., A constant-space comparison-based algorithm for computing the Burrows-Wheeler transform, (Combinatorial Pattern Matching, 24th Annual Symposium (CPM) (2013)), 74-82 · Zbl 1381.68313
[7] Dobkin, David J.; Munro, J. Ian, Optimal time minimal space selection algorithms, J. ACM, 28, 3, 454-461 (July 1981)
[8] Ferragina, Paolo; Manzini, Giovanni, Indexing compressed text, J. ACM, 52, 4, 552-581 (2005) · Zbl 1323.68261
[9] Franceschini, Gianni; Muthukrishnan, S., In-place suffix sorting, automata, languages and programming, (34th International Colloquium (ICALP) (2007)), 533-545 · Zbl 1171.68445
[10] Grossi, Roberto; Gupta, Ankur; Vitter, Jeffrey S., High-order entropy-compressed text indexes, (ACM-SIAM SODA (2003)), 841-850 · Zbl 1092.68584
[11] Grossi, Roberto; Ottaviano, Giuseppe, The wavelet trie: maintaining an indexed sequence of strings in compressed space, (ACM PODS (2012)), 203-214
[12] Hoare, C. A.R., Algorithm 65: find, Commun. ACM, 4, 7, 321-322 (July 1961)
[13] Hon, Wing-Kai; Lam, Tak Wah; Sadakane, Kunihiko; Sung, Wing-Kin; Yiu, Siu-Ming, A space and time efficient algorithm for constructing compressed suffix arrays, Algorithmica, 48, 1, 23-36 (2007) · Zbl 1123.68137
[14] Hon, Wing-Kai; Sadakane, Kunihiko; Sung, Wing-Kin, Breaking a time-and-space barrier in constructing full-text indices, SIAM J. Comput., 38, 6, 2162-2178 (2009) · Zbl 1191.68225
[15] Kärkkäinen, Juha, Fast BWT in small space by blockwise suffix sorting, Theor. Comput. Sci., 387, 3, 249-257 (2007) · Zbl 1144.68021
[16] Lam, T. W.; Li, Ruiqiang; Tam, Alan; Wong, Simon; Wu, Edward; Yiu, S. M., High throughput short read alignment via bi-directional BWT, (IEEE International Conference on Bioinformatics and Biomedicine (2009)), 31-36
[17] Lam, T. W.; Sung, W. K.; Tam, S. L.; Wong, C. K.; Yiu, S. M., Compressed indexing and local alignment of DNA, Bioinformatics, 24, 6, 791-797 (2015)
[18] Li, Heng; Durbin, Richard, Fast and accurate long-read alignment with Burrows-Wheeler transform, Bioinformatics, 26, 5, 589-595 (2010)
[19] Langmead, Ben; Trapnell, Cole; Pop, Mihai; Salzberg, Steven L., Ultrafast and memory-efficient alignment of short DNA sequences to the human genome, Genome Biol., 10, 3, R25 (2009)
[20] Manber, Udi; Myers, Gene, Suffix arrays: a new method for on-line string searches, SIAM J. Comput., 22, 5, 935-948 (October 1993)
[21] Manzini, Giovanni, An analysis of the Burrows-Wheeler transform, J. ACM, 48, 3, 407-430 (2001) · Zbl 1323.68262
[22] Munro, J. Ian, Tables, (Chandru, V.; Vinay, V., Proc. of Foundations of Software Technology and Theoretical Computer Science, 16th Conference. Proc. of Foundations of Software Technology and Theoretical Computer Science, 16th Conference, Hyderabad, India, December 18-20, 1996. Proc. of Foundations of Software Technology and Theoretical Computer Science, 16th Conference. Proc. of Foundations of Software Technology and Theoretical Computer Science, 16th Conference, Hyderabad, India, December 18-20, 1996, Lect. Notes Comput. Sci., vol. 1180 (1996), Springer), 37-42
[23] Munro, J. Ian; Raman, Venkatesh, Selection from read-only memory and sorting with minimum data movement, Theor. Comput. Sci., 165, 2, 311-323 (1996) · Zbl 0872.68045
[24] Na, Joong Chae; Park, Kunsoo, Alphabet-independent linear-time construction of compressed suffix arrays using o \((n \log n)\)-bit working space, Theor. Comput. Sci., 385, 1-3, 127-136 (2007) · Zbl 1125.68041
[25] Navarro, Gonzalo, Wavelet trees for all, J. Discrete Algorithms, 25, 2-20 (2014) · Zbl 1284.68217
[26] Navarro, Gonzalo; Nekrich, Yakov, Optimal dynamic sequence representations, (ACM-SIAM SODA (2013)), 865-876 · Zbl 1320.68060
[27] Okanohara, Daisuke; Sadakane, Kunihiko, A linear-time Burrows-Wheeler transform using induced sorting, (16th Symposium on String Processing and Information Retrieval (SPIRE). 16th Symposium on String Processing and Information Retrieval (SPIRE), Lect. Notes Comput. Sci., vol. 5721 (2009), Springer), 90-101
[28] Raman, Venkatesh; Ramnath, Sarnath, Improved upper bounds for time-space trade-offs for selection, Nord. J. Comput., 6, 2, 162-180 (1999) · Zbl 0946.68156
[29] Salson, Mikael; Lecroq, Thierry; Léonard, Martine; Mouchard, Laurent, A four-stage algorithm for updating a Burrows-Wheeler transform, Theor. Comput. Sci., 410, 43, 4350-4359 (2009) · Zbl 1187.68685
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.