There is interesting discussion of research on word meanings and how learning takes place, and on the kinds of ambiguities inherent in natural language. There is a section on words and their properties, such as category, frequency, contexts, and possibilities of combination. Finally, it discusses the nature of a language corpus and how it is organized, that is, how its components are identified by various kinds of category markers. Each of the subsequent sections is centered around a single aspect of language. The first section is introductory: it surveys the field as a whole, and then introduces the mathematical and linguistic foundations. The organization of a book on such a vast topic is crucial, and this book derives much of its clarity and usefulness from the fact that its overall structure is extremely clear and rational, even to a reader who has not yet read the book (something not to be taken for granted in a highly technical field). It reaches the right level of detail to make the book useful for reference. It should prove invaluable for finding useful references outside of one's own immediate field. The bibliography is both extensive and timely. The discussion of language in this book is well informed and is based on wide reading of work on natural language in linguistics, computer science, artificial intelligence, and applications such as machine translation, use of electronic corpora of language usage and parsers, and databases of words and syntactic structures (treebanks). It showed me how the concepts that are familiar to me in linguistics are being used as tools for statistical investigation of language. Reading this book was an interesting experience. This is a very technical book, whose main audience will be computer scientists interested in natural language, but the main ideas and general outline of the statistical approach to natural language come across to a wider audience.
The focus of the book is natural language (words, collocations, phrases, sentences, and texts), though the goals and technical bases of the research are radically different from the ones researchers in linguistics use to construct and test accounts of natural language structures. The aspect of the book that I am best able to judge is the presentation of linguistic concepts and of the properties of natural language. The authors succeed in presenting a clear and accurate exposition of the concepts relevant to these topics, with clear examples in the form of diagrams or tables, and with relevant exercises for application of the concepts and procedures described. By necessity, they must explain and link theoretical concepts from different fields of knowledge, including probability theory, statistics, computer science, and linguistics. The authors have produced a large and comprehensive textbook about computer-based research on natural language using statistical measures.