The modern algebra of information retrieval. (English) Zbl 1149.68025

The Information Retrieval Series 24. Berlin: Springer (ISBN 978-3-540-77658-1/hbk). xiv, 327 p. (2008).
The author underlines the topicality of the subject by noting that “the number of information retrieval books that have appeared in the last ten years is 70% of the total number published on the subject in all thus far”. The present volume differs from all of those books in both approach and style. Retrieval methods, and information retrieval in general, are treated here in a unified manner within abstract algebraic structures: primarily lattices, but also linear spaces, clans, and algebras. The author believes that “in theoretical as well as practical applications, lattices have certain additional properties that are of real interest and use”.
After a brief survey of the history of information retrieval, elements of mathematical logic, sets, relations, and lattice theory are presented. The basics of information retrieval technology, including the ideas of relevance effectiveness and the measurement of search engine effectiveness, are given in the next chapter.
The fifth chapter of the book surveys earlier papers on lattice-based information retrieval and the corresponding systems. It concentrates on the period 2000–2006 but also cites the theoretical paper of 1959, “A mathematical theory of language symbols in retrieval” [in: Proceedings of the International Conference on Scientific Information – Two Volumes (1959), http://books.nap.edu/openbook.php?record_id=10866&page=1327] by C. N. Mooers, who “seems to have been the first individual to offer a detailed and comprehensive treatment of the application of the lattice concept in information retrieval”.
Individual chapters are then devoted to a short exposition of Boolean retrieval and to detailed studies of vector space, fuzzy algebra-based, and probabilistic retrieval. In each of these chapters the reader also finds sections presenting the elements of the corresponding theories (linear, Banach, etc. spaces, tensor algebra, fuzzy sets, probability), which makes the book to a certain extent self-contained.
The book ends with its longest chapter, devoted to web retrieval and ranking, which also draws on results of the preceding chapters, e.g. a method using the notions of fuzzy algebra and fuzzy probability, and an aggregated method based on lattices that allows one to calculate both the link importance of pages (including impact factor, mutual citation, etc.) and their intrinsic importance.
Experiments are performed to estimate the relevance effectiveness of the fuzzy entropy and fuzzy probability retrieval methods. Experimental evidence for the relevance effectiveness of the aggregated method is given by comparison with commercial search engines (Google, AltaVista, Yahoo!).
Each chapter is supplemented with exercises and problems (with hints to solutions in the appendix), which also indicate possible applications of the results obtained. The book is useful both for a better understanding of existing information retrieval methods and for the creation of new ones.

MSC:

68P20 Information storage and retrieval of data
68-01 Introductory exposition (textbooks, tutorial papers, etc.) pertaining to computer science