×

The impossibility of boosting distributed service resilience. (English) Zbl 1228.68015

Summary: We study \(f\)-resilient services, which are guaranteed to operate as long as no more than \(f\) of the associated processes fail. We prove three theorems asserting the impossibility of boosting the resilience of such services. Our first theorem allows any connection pattern between processes and services but assumes these services to be atomic (linearizable) objects. This theorem says that no distributed system in which processes coordinate using \(f\)-resilient atomic objects and reliable registers can solve the consensus problem in the presence of \(f+1\) undetectable process stopping failures. In contrast, we show that it is possible to boost the resilience of some systems solving problems easier than consensus: for example, the 2-set-consensus problem is solvable for \(2n\) processes and \(2n-1\) failures (i.e., wait-free) using \(n\)-process consensus services resilient to \(n-1\) failures (wait-free). Our proof is short and self-contained.
We then introduce the larger class of failure-oblivious services. These are services that cannot use information about failures, although they may behave more flexibly than atomic objects. An example of such a service is totally ordered broadcast. Our second theorem generalizes the first theorem and its proof to failure-oblivious services.
Our third theorem allows the system to contain failure-aware services, such as failure detectors, in addition to failure-oblivious services. This theorem requires that each failure-aware service be connected to all processes; thus, \(f+1\) process failures overall can disable all the failure-aware services. In contrast, it is possible to boost the resilience of a system solving consensus using failure-aware services if arbitrary connection patterns between processes and services are allowed: consensus is solvable for any number of failures using only 1-resilient 2-process perfect failure detectors.
As far as we know, this is the first time a unified framework has been used to describe both atomic and non-atomic objects, and the first time boosting analysis has been performed for services more general than atomic objects.

MSC:

68M14 Distributed systems
68M15 Reliability, testing and fault tolerance of networks and computer systems
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] P. Attie, R. Guerraoui, P. Kouznetsov, N.A. Lynch, S. Rajsbaum, The impossibility of boosting distributed service resilience, in: The 25th International Conference on Distributed Computing Systems, 2005.; P. Attie, R. Guerraoui, P. Kouznetsov, N.A. Lynch, S. Rajsbaum, The impossibility of boosting distributed service resilience, in: The 25th International Conference on Distributed Computing Systems, 2005. · Zbl 1228.68015
[2] P.C. Attie, N.A. Lynch, S. Rajsbaum, Boosting Fault-tolerance in Asynchronous Message Passing Systems is Impossible, Technical Report, MIT Laboratory for Computer Science, MIT-LCS-TR-877, 2002. Available from: <http://theory.lcs.mit.edu/tds/reflist.html/>; P.C. Attie, N.A. Lynch, S. Rajsbaum, Boosting Fault-tolerance in Asynchronous Message Passing Systems is Impossible, Technical Report, MIT Laboratory for Computer Science, MIT-LCS-TR-877, 2002. Available from: <http://theory.lcs.mit.edu/tds/reflist.html/>
[3] Chandra, T.; Hadzilacos, V.; Jayanti, P.; Toueg, S., Generalized irreducibility of consensus and the equivalence of \(t\)-resilient and wait-free implementations of consensus, SIAM Journal on Computing, 34, 2, 333-357 (2005), (Conference version appears in PODC94) · Zbl 1087.68124
[4] Chandra, T. D.; Hadzilacos, V.; Toueg, S., The weakest failure detector for solving consensus, Journal of the ACM, 43, 4, 685-722 (1996) · Zbl 0885.68022
[5] Chandra, T. D.; Toueg, S., Unreliable failure detectors for reliable distributed systems, Journal of the ACM, 43, 2, 225-267 (1996) · Zbl 0885.68021
[6] S. Chaudhuri. Agreement is harder than consensus: set consensus in totally asynchronous systems, in: Proceedings of the 19th Annual ACM Symposium on Principles of Distributed Computing (PODC’00), August 1990, pp. 311-324.; S. Chaudhuri. Agreement is harder than consensus: set consensus in totally asynchronous systems, in: Proceedings of the 19th Annual ACM Symposium on Principles of Distributed Computing (PODC’00), August 1990, pp. 311-324.
[7] C. Delporte-Gallet, H. Fauconnier, R. Guerraoui, A realistic look at failure detectors, in: IEEE Symposium on Dependable Systems and Networks (DSN 2002), Washington, DC, June 2002.; C. Delporte-Gallet, H. Fauconnier, R. Guerraoui, A realistic look at failure detectors, in: IEEE Symposium on Dependable Systems and Networks (DSN 2002), Washington, DC, June 2002. · Zbl 1029.68520
[8] Fischer, M. J.; Lynch, N. A.; Paterson, M. S., Impossibility of distributed consensus with one faulty process, Journal of the ACM, 32, 3, 374-382 (1985) · Zbl 0629.68027
[9] Guerraoui, R.; Kouznetsov, P., Failure detectors as type boosters, Distributed Computing, 20, 5, 343-358 (2008) · Zbl 1266.68053
[10] V. Hadzilacos, S. Toueg, A Modular Approach to Fault-tolerant Broadcast and Related Problems, Technical Report, Cornell University, Computer Science, 1994.; V. Hadzilacos, S. Toueg, A Modular Approach to Fault-tolerant Broadcast and Related Problems, Technical Report, Cornell University, Computer Science, 1994.
[11] Herlihy, M., Wait-free synchronization, ACM Transactions on Programming Languages and Systems, 13, 1, 124-149 (1991)
[12] Herlihy, M.; Wing, J. M., Linearizability: a correctness condition for concurrent objects, ACM Transactions on Programming Languages and Systems, 12, 3, 463-492 (1990)
[13] P. Jayanti, Private communication, 2003.; P. Jayanti, Private communication, 2003.
[14] P. Jayanti, S. Toueg, Some results on the impossibility, universability and decidability of consensus, in: Proceedings of the 6th International Workshop on Distributed Algorithms (WDAG’92), LNCS, vol. 647, Springer-Verlag, 1992.; P. Jayanti, S. Toueg, Some results on the impossibility, universability and decidability of consensus, in: Proceedings of the 6th International Workshop on Distributed Algorithms (WDAG’92), LNCS, vol. 647, Springer-Verlag, 1992.
[15] Loui, M. C.; Abu-Amara, H. H., Memory requirements for agreement among unreliable asynchronous processes, Advances in Computing Research, 163-183 (1987)
[16] Lynch, N.; Tuttle, M., An introduction to input/output automata, CWI-Quarterly, 2, 3, 219-246 (1989), (Centrum voor Wiskunde en Informatica, Amsterdam, The Netherlands) · Zbl 0677.68067
[17] Lynch, N. A., Distributed Algorithms (1996), Morgan Kaufman Publishers · Zbl 0877.68061
[18] N.A. Lynch, M.R. Tuttle, Hierarchical correctness proofs for distributed algorithms, in: PODC, 1987, pp. 137-151.; N.A. Lynch, M.R. Tuttle, Hierarchical correctness proofs for distributed algorithms, in: PODC, 1987, pp. 137-151.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.