Huang, Kuang-Hua; Abraham, Jacob A.
Algorithm-based fault tolerance for matrix operations. (English) Zbl 0557.68027
IEEE Trans. Comput. 33, 518-528 (1984).

A new technique for achieving high reliability, called algorithm-based fault tolerance, is proposed. It is based on encoding the data: the original algorithm must be redesigned to operate on encoded input and to produce encoded output. The modified algorithm may take longer than the original, but the added time is not excessive. The computation tasks are distributed among multiple computation units so that the failure of any one unit affects only a portion of the data. The error detection and correction schemes must be designed so that a faulty module (the one that produced the erroneous data in the first place) cannot mask the error during the detection or correction steps. A method is proposed to detect and correct errors when matrix operations such as addition, multiplication, scalar product, LU-decomposition, and transposition are performed on multiprocessor systems. The number of processors needed to detect errors in matrix multiplication is also studied.

Reviewer: V. Ostianu

Cited in 33 Documents

MSC:
68N99 Theory of software
90B25 Reliability, availability, maintenance, inspection in operations research
65F99 Numerical linear algebra
65G99 Error analysis and interval analysis

Keywords: checksum matrix; error correction; algorithm based fault tolerance; error detection; matrix operations; multiprocessor systems; matrix multiplication
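
The review names checksum matrices but does not spell out the encoding. As an illustration only, the following is a minimal sketch of one common reading of that idea for matrix multiplication: append a row of column sums to the left operand and a column of row sums to the right operand, multiply the encoded operands, and use the checksums of the product to locate and repair a single wrong element. All function names are hypothetical, the arithmetic is integer-only to sidestep rounding, and the sketch runs on one machine rather than on the multiprocessor arrays the paper targets.

```python
def column_checksum(a):
    """Return a copy of a with an extra row holding the sum of each column."""
    return [row[:] for row in a] + [[sum(col) for col in zip(*a)]]


def row_checksum(b):
    """Return a copy of b with an extra column holding the sum of each row."""
    return [row[:] + [sum(row)] for row in b]


def matmul(x, y):
    """Plain matrix product on lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def locate_errors(c):
    """Return (bad_rows, bad_cols): indices whose checksum does not match.

    c is a full checksum matrix: its last column holds row sums and its
    last row holds column sums.
    """
    n, m = len(c) - 1, len(c[0]) - 1
    bad_rows = [i for i in range(n + 1) if sum(c[i][:m]) != c[i][m]]
    bad_cols = [j for j in range(m + 1)
                if sum(c[i][j] for i in range(n)) != c[n][j]]
    return bad_rows, bad_cols


def correct_single_error(c):
    """Repair one wrong element of c in place; return True if a fix was made."""
    bad_rows, bad_cols = locate_errors(c)
    if not bad_rows and not bad_cols:
        return False  # all checksums are consistent
    if len(bad_rows) != 1 or len(bad_cols) != 1:
        raise ValueError("error pattern is not a single wrong element")
    i, j = bad_rows[0], bad_cols[0]
    m = len(c[0]) - 1
    if j < m:
        # Rebuild the element from the row checksum and the other entries.
        c[i][j] = c[i][m] - sum(c[i][k] for k in range(m) if k != j)
    else:
        # The row checksum itself is wrong: recompute it from the row.
        c[i][m] = sum(c[i][:m])
    return True


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c_full = matmul(column_checksum(a), row_checksum(b))

c_full[1][0] = 99              # simulate a fault in one processing element
correct_single_error(c_full)   # row 1 and column 0 disagree with their sums
product = [row[:-1] for row in c_full[:-1]]  # strip the checksums again
```

The design choice worth noting is that the product of a column-checksum matrix and a row-checksum matrix is itself a full checksum matrix, so no separate encoding step is needed on the result; the intersection of the one inconsistent row and the one inconsistent column pinpoints the faulty element.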