×

A coded shared atomic memory algorithm for message passing architectures. (English) Zbl 1404.68023

Summary: This paper considers the communication and storage costs of emulating atomic (linearizable) multi-writer multi-reader shared memory in distributed message-passing systems. The paper contains three main contributions: (1) we present an atomic shared-memory emulation algorithm that we call Coded Atomic Storage (CAS). This algorithm uses erasure coding methods. In a storage system with \(N\) servers that is resilient to \(f\) server failures, we show that the communication cost of CAS is \(\frac{N}{N-2f}\). The storage cost of CAS is unbounded. (2) We present a modification of the CAS algorithm known as CAS with garbage collection (CASGC). The CASGC algorithm is parameterized by an integer \(\delta \) and has a bounded storage cost. We show that the CASGC algorithm satisfies atomicity. In every execution of CASGC where the number of server failures is no bigger than \(f\), we show that every write operation invoked at a non-failing client terminates. We also show that in an execution of CASGC with parameter \(\delta \) where the number of server failures is no bigger than \(f\),  a read operation terminates provided that the number of write operations that are concurrent with the read is no bigger than \(\delta \). We explicitly characterize the storage cost of CASGC, and show that it has the same communication cost as CAS. (3) We describe an algorithm known as the Communication Cost Optimal Atomic Storage (CCOAS) algorithm that achieves a smaller communication cost than CAS and CASGC. In particular, CCOAS incurs read and write communication costs of \(\frac{N}{N-f}\) measured in terms of number of object values. We also discuss drawbacks of CCOAS as compared with CAS and CASGC.

MSC:

68M14 Distributed systems
68M12 Network protocols
68P20 Information storage and retrieval of data
68P25 Data encryption (aspects in computer science)

Software:

PoWerStore
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] Common RAID disk data format specification. SNIA, Advanced Storage and Information Technology Standard, version 2 (2009)
[2] Abd-El-Malek, M., Ganger, G.R., Goodson, G.R., Reiter, M.K., Wylie, J.J.: Fault-scalable byzantine fault-tolerant services. ACM SIGOPS Oper. Syst. Rev. 39, 59-74 (2005) · doi:10.1145/1095809.1095817
[3] Agrawal, A., Jalote, P.: Coding-based replication schemes for distributed systems. IEEE Trans. Parallel Distrib. Syst. 6(3), 240-251 (1995). doi:10.1109/71.372774 · doi:10.1109/71.372774
[4] Aguilera, M.K., Janakiraman, R., Xu, L.: Using erasure codes efficiently for storage in a distributed system. In: Proceedings of International Conference on Dependable Systems and Networks (DSN), pp. 336-345. IEEE (2005)
[5] Aguilera, M.K., Keidar, I., Malkhi, D., Shraer, A.: Dynamic atomic storage without consensus. J. ACM 58, 7:1-7:32 (2011). doi:10.1145/1944345.1944348 · Zbl 1327.68093 · doi:10.1145/1944345.1944348
[6] Anderson, E., Li, X., Merchant, A., Shah, M.A., Smathers, K., Tucek, J., Uysal, M., Wylie, J.J.: Efficient eventual consistency in pahoehoe, an erasure-coded key-blob archive. In: IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp. 181-190. IEEE (2010) · Zbl 1029.68528
[7] Androulaki, E.; Cachin, C.; Dobre, D.; Vukolić, M.; Aguilera, MK (ed.); etal., Erasure-coded byzantine storage with separate metadata, 76-90 (2014), New York
[8] Attiya, H., Bar-Noy, A., Dolev, D.: Sharing memory robustly in message-passing systems. J. ACM (JACM) 42(1), 124-142 (1995) · Zbl 0886.68018 · doi:10.1145/200836.200869
[9] Cachin, C.; Tessaro, S.; Fraigniaud, P. (ed.), Asynchronous verifiable information dispersal, 503-504 (2005), Berlin, Heidelberg
[10] Cachin, C., Tessaro, S.: Optimal resilience for erasure-coded byzantine distributed storage. In: 2006 International Conference on Dependable Systems and Networks (DSN), pp. 115-124. IEEE (2006)
[11] Cadambe, V.R., Lynch, N., Medard, M., Musial, P.: A coded shared atomic memory algorithm for message passing architectures. In: 13th International Symposium on Network Computing and Applications (NCA), pp. 253-260. IEEE (2014) · Zbl 1404.68023
[12] Cassuto, Y.: What can coding theory do for storage systems? ACM SIGACT News 44(1), 80-88 (2013) · doi:10.1145/2447712.2447734
[13] Datta, A., Oggier, F.: An overview of codes tailor-made for better repairability in networked distributed storage systems. ACM SIGACT News 44(1), 89-105 (2013) · doi:10.1145/2447712.2447735
[14] Dobre, D., Karame, G., Li, W., Majuntke, M., Suri, N., Vukolić, M.: PoWerStore: proofs of writing for efficient and robust storage. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications security, pp. 285-298. ACM (2013) · Zbl 1161.68334
[15] Dutta, P.; Guerraoui, R.; Levy, RR; Taubenfeld, G. (ed.), Optimistic erasure-coded distributed storage, 182-196 (2008), New York · Zbl 1161.68334
[16] Fan, R., Lynch, N.: Efficient replication of large data objects. In: Proceedings of the 17th International Symposium on Distributed Computing (DISC), pp. 75-91 (2003) · Zbl 1180.68055
[17] Fekete, A., Lynch, N., Shvartsman, A.: Specifying and using a partitionable group communication service. ACM Trans. Comput. Syst. 19(2), 171-216 (2001). doi:10.1145/377769.377776 · Zbl 1374.68069
[18] Gifford, D.K.: Weighted voting for replicated data. In: Proceedings of the Seventh ACM Symposium on Operating Systems Principles, SOSP ’79, pp. 150-162. ACM, New York (1979). doi:10.1145/800215.806583 · Zbl 0677.68067
[19] Gilbert, S., Lynch, N., Shvartsman, A.: RAMBO: a robust, reconfigurable atomic memory service for dynamic networks. Distrib. Comput. 23(4), 225-272 (2010) · Zbl 1231.68077 · doi:10.1007/s00446-010-0117-1
[20] Goodson, G.R., Wylie, J.J., Ganger, G.R., Reiter, M.K.: Efficient byzantine-tolerant erasure-coded storage. In: 2004 International Conference on Dependable Systems and Networks, pp. 135-144. IEEE (2004)
[21] Hendricks, J., Ganger, G.R., Reiter, M.K.: Low-overhead Byzantine fault-tolerant storage. In: Proceedings of the Seventh ACM Symposium on Operating Systems Principles (SOSP) vol. 41, no. 6, pp. 73-86 (2007)
[22] Herlihy, M.P., Wing, J.M.: Linearizability: a correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst. 12, 463-492 (1990). doi:10.1145/78969.78972 · doi:10.1145/78969.78972
[23] Lamport, L.: On interprocess communication. Part I: basic formalism. Distrib. Comput. 2(1), 77-85 (1986) · Zbl 0598.68022 · doi:10.1007/BF01786227
[24] Lin, S., Costello, D.J.: Error Control Coding, 2nd edn. Prentice-Hall, Upper Saddle River (2004) · Zbl 1310.94180
[25] Lynch, N., Shvartsman, A.: Robust emulation of shared memory using dynamic quorum-acknowledged broadcasts. In: Twenty-Seventh Annual International Symposium on Fault-Tolerant Computing, FTCS-27. Digest of Papers, pp. 272-281. IEEE (1997)
[26] Lynch, N.A.: Distributed Algorithms. Morgan Kaufmann Publishers, San Francisco (1996) · Zbl 0877.68061
[27] Lynch, N.A., Tuttle, M.R.: An introduction to input/output automata. CWI Q. 2, 219-246 (1989) · Zbl 0677.68067
[28] Malkhi, D., Reiter, M.: Byzantine quorum systems. Distrib. Comput. 11(4), 203-213 (1998). doi:10.1007/s004460050050 · Zbl 0963.68051 · doi:10.1007/s004460050050
[29] Martin, JP; Alvisi, L.; Dahlin, M.; Malkhi, D. (ed.), Minimal byzantine storage, 311-325 (2002), New York · Zbl 1029.68528
[30] Plank, J.S.: T1: erasure codes for storage applications. In: Proceedings of the 4th USENIX Conference on File and Storage Technologies, pp. 1-74 (2005)
[31] Reed, I.S., Solomon, G.: Polynomial codes over certain finite fields. J. Soc. Ind. Appl. Math. 8(2), 300-304 (1960) · Zbl 0099.34403 · doi:10.1137/0108018
[32] Roth, R.: Introduction to Coding Theory. Cambridge University Press, Cambridge (2006) · Zbl 1092.94001 · doi:10.1017/CBO9780511808968
[33] Saito, Y., Frølund, S., Veitch, A., Merchant, A., Spence, S.: Fab: building distributed enterprise disk arrays from commodity components. In: ACM SIGARCH Computer Architecture News, vol. 32, pp. 48-58. ACM (2004)
[34] Thomas, R.: A majority consensus approach to concurrency control for multiple copy databases. ACM Trans. Database Syst. 4(2), 180-209 (1979) · doi:10.1145/320071.320076
[35] Vukolić, M.: Quorum systems: with applications to storage and consensus. Synth. Lect. Distrib. Comput. Theory 3(1), 1-146 (2012). doi:10.2200/S00402ED1V01Y201202DCT009 · doi:10.2200/S00402ED1V01Y201202DCT009
[36] Wang, Z., Cadambe, V.R.: Multi-version coding in distributed storage. In: 2014 IEEE International Symposium on Information Theory (ISIT) (2014)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.