
Algorithm-based fault tolerance for matrix operations. (English) Zbl 0557.68027

A new technique for achieving high reliability, called algorithm-based fault tolerance, is proposed. It is based on encoding the data: the original algorithm must be redesigned to operate on the encoded input data and to produce encoded output data. The modified algorithm may take more time to operate on the encoded data, but the extra time is not excessive. The computation tasks are distributed among multiple computation units so that the failure of any one unit affects only a portion of the data. The error detection and correction schemes must be designed so that a faulty module (the one that produced the erroneous data in the first place) cannot mask the error during the detection or correction steps. A method is proposed for detecting and correcting errors when matrix operations such as addition, multiplication, scalar product, LU decomposition, and transposition are performed on multiprocessor systems. The number of processors needed to detect errors in matrix multiplication is also studied.
Reviewer: V.Ostianu
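The review does not spell out the encoding. The sketch below assumes the row/column checksum encoding commonly associated with algorithm-based fault tolerance for matrix multiplication: A is extended with a row of column sums, B with a column of row sums, and their product then carries checksums that reveal and locate a single faulty entry. The function names and the tolerance parameter are hypothetical, only a single error in the data part is corrected, and the multiplication runs on one machine here rather than being distributed across processors.

```python
import numpy as np

# Assumption: checksum encoding of the kind used in algorithm-based fault
# tolerance; names below are illustrative, not taken from the paper.

def column_checksum(a):
    """Append a row holding the sum of each column of a."""
    return np.vstack([a, a.sum(axis=0)])

def row_checksum(b):
    """Append a column holding the sum of each row of b."""
    return np.hstack([b, b.sum(axis=1, keepdims=True)])

def check_and_correct(cf, tol=1e-8):
    """Check a full checksum matrix cf and repair one faulty data entry in place.

    Returns None if all checksums agree, or the (row, col) of the corrected
    entry. Other error patterns (e.g. a corrupted checksum entry or several
    faulty entries) are reported as uncorrectable in this sketch.
    """
    data = cf[:-1, :-1]
    row_err = data.sum(axis=1) - cf[:-1, -1]  # mismatch of each data row
    col_err = data.sum(axis=0) - cf[-1, :-1]  # mismatch of each data column
    bad_rows = np.flatnonzero(np.abs(row_err) > tol)
    bad_cols = np.flatnonzero(np.abs(col_err) > tol)
    if bad_rows.size == 0 and bad_cols.size == 0:
        return None
    if bad_rows.size == 1 and bad_cols.size == 1:
        # The faulty entry sits where the inconsistent row and column meet;
        # the row mismatch equals the size of the error.
        i, j = bad_rows[0], bad_cols[0]
        cf[i, j] -= row_err[i]
        return (int(i), int(j))
    raise ValueError("error pattern not correctable by this single-error sketch")

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
cf = column_checksum(a) @ row_checksum(b)  # upper-left block equals a @ b
cf[0, 1] += 10                             # simulate a fault in one unit's output
print(check_and_correct(cf))               # (0, 1)
```

Integer matrices keep the checksum comparison exact; with floating-point data the tolerance has to absorb rounding error, which is one reason the threshold is left as a parameter.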

MSC:

68N99 Theory of software
90B25 Reliability, availability, maintenance, inspection in operations research
65F99 Numerical linear algebra
65G99 Error analysis and interval analysis