BOOK REVIEW
Getting the Message Across
David Saad
Information Theory, Inference, and Learning Algorithms.
David J. C. MacKay. xii + 628 pp. Cambridge University Press, 2003. $50.
Students taking classes in probability and statistics often come
away thinking that those subjects are relevant only for determining
whether dice are loaded or calculating the odds that a particular
sequence of colored balls will be drawn from an urn. In reality, of
course, probability and statistics have a wide range of practical
applications, some of which are reviewed in David MacKay's
Information Theory, Inference, and Learning Algorithms.
The thread MacKay follows throughout the book is the use of
principled mathematical approaches—mainly probability and
Bayesian statistics—for handling, communicating and extracting information.
Much has been said about the importance of interdisciplinary
research in modern science as a vehicle for cross-fertilization
between fields and for facilitating breakthroughs when existing
methods fail. In reality the term is often misused. MacKay's book is
genuinely interdisciplinary: It traces the applicability of concepts
and methodologies across disciplines, with an emphasis on his own
main research interests—information theory, Bayesian
statistics and machine learning.
The book has six main parts covering various aspects of information
theory and inference; each part consists of several "bite
size" chapters, which are easy to read and digest. MacKay
offers a good overview of the main concepts, emphasizing insight and
intuitive understanding; rigorous (or semirigorous) proofs follow
later to satisfy the demanding reader. The presentation itself is
very informal, with an occasional "pause for thought" in
the form of a statement or an interesting question.
A broad range of exercises appear at the end of each chapter, in
some cases along with—quite exceptionally—the worked-out
solutions. Both the exercises and solutions should be invaluable to
students for understanding the material. The book is highly readable
and accessible enough to be studied independently.
Sections on information theory, probability and inference methods
are interleaved in a manner that may not be to everyone's liking but
is consistent and logical. In the preface, MacKay sketches the
interdependencies between the various chapters and offers four
recipes for using the book as the main textbook in courses on
information theory or on Bayesian inference and machine learning.
In the sections on information theory, which make up about half the
book, MacKay manages to get the main points across with fewer formal
definitions and complicated derivations than one typically finds in
more traditional textbooks. The main tools and definitions are
provided alongside review material from basic probability and statistics.
The material covered is essential to the understanding of electronic
communication, from mobile and satellite communication to digital TV
and the Internet. Most modern digital communication systems require
efficient methods of source and channel coding. Source coding refers
to the compression of redundant information (in pictures and music,
for example), even at the expense of fidelity, whereas channel
coding refers to the introduction of some controlled redundancy
prior to transmission in order to protect the information against
corruption in a noisy transmission medium (for example, deep space,
atmosphere or optical fibers).
MacKay covers basic material concerning information theory and
coding as well as advanced techniques, such as low-density
parity-check codes (in whose development he played a significant
part), and turbo and fountain codes. These new error-correcting
codes facilitate the transmission of information at close-to-optimal
rates. This more advanced material, which is presented toward the
end of the book, provides insight into what are arguably the main
developments in information theory over the past decade.
The other main area covered by the book is inference and its
relation to learning algorithms. Humans have been trying to delegate
more and more tasks to machines since prehistoric times. In recent
decades, research has focused on modeling knowledge in an attempt to
make it possible to use computers for handling sophisticated tasks
(such as the classification of handwritten digits, voice recognition
and forecasts of financial indicators). The problem of defining the
model and estimating its parameters has been expressed in a clear
mathematical form, using probability theory and Bayesian statistics,
and new and more efficient methods have been devised to solve it.
For instance, given a model we aim to infer its parameters from
data, represented by observations or measurements.
MacKay's presentation of probability, Bayesian statistics and
inference methods is excellent. Following tradition, he discusses
the distribution of balls in urns, alongside more interesting and
relevant examples of inference from medical and crime-scene data.
The use of Bayesian statistics—for inference in general as
well as in specific models such as Gaussian processes and neural
networks—follows naturally. Also covered within the same
framework are several related problems, such as model selection and
evaluation and the automatic determination of relevance of variables.
One of the main difficulties in representing problems
probabilistically is that a significant computational effort is
required to obtain exact results. The solution comes from an
unexpected source: the casinos of Monte Carlo. In Monte Carlo
sampling, some of the more demanding calculations are replaced by
sampling techniques based on randomly drawing numbers from a given
probability distribution so that the statistical properties of the
sampled values provide a good approximation to the required result.
The book covers a range of approximations and sampling techniques,
including such recent approaches as "exact sampling" and
"slice sampling." MacKay provides numerical examples to
demonstrate the efficacy of the methods.
Several other areas are briefly explained in isolated chapters, such
as crosswords and codebreaking, information acquisition and
evolution, and various aspects of statistical physics. The insight
and understanding provided by these chapters is limited by the fact
that they are only loosely linked to the main themes of the book.
This is primarily an excellent textbook in the areas of information
theory, Bayesian inference and learning algorithms. Undergraduate
and postgraduate students will find it extremely useful for gaining
insight into these topics; however, the book also serves as a
valuable reference for researchers in these areas. Both sets of
readers should find the book enjoyable and highly
useful.—David Saad, Neural Computing Research Group,
School of Engineering and Applied Science, Aston University,
United Kingdom