Probability Theory: The Logic of Science. E. T. Jaynes. Edited by G. Larry Bretthorst. xxx + 727 pp. Cambridge University Press, 2003. $60.
Theoretical physicist Edwin T. Jaynes, who died in 1998, is best known as pioneer and champion of the principle of maximum entropy, which states that of all possible probability distributions that agree with what you know about a problem, the one that leaves you with the most uncertainty is best—precisely because it does not imply more than you know. As important as the principle is in practice, surprisingly little space is devoted to it in Jaynes's magnum opus, Probability Theory: The Logic of Science.
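To make the principle concrete, here is a minimal sketch under stated assumptions: a six-sided die about which we know only that its long-run average is 4.5. The constraint value, the function names and the bisection solver are all choices made for this illustration, not taken from the book. The sketch relies on the standard result that, under a mean constraint, the entropy-maximizing distribution has the exponential form p(k) proportional to exp(lam * k).

```python
import math

FACES = range(1, 7)

def gibbs(lam):
    """Distribution p(k) proportional to exp(lam * k) over the six faces."""
    weights = [math.exp(lam * k) for k in FACES]
    z = sum(weights)
    return [w / z for w in weights]

def mean(p):
    return sum(k * pk for k, pk in zip(FACES, p))

def entropy(p):
    """Shannon entropy in nats; zero-probability faces contribute nothing."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

def max_entropy_die(target_mean, lo=-10.0, hi=10.0, steps=200):
    """Bisect on lam until the exponential-form distribution has the target mean.

    The mean increases monotonically with lam, so bisection is enough.
    """
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mean(gibbs(mid)) < target_mean:
            lo = mid
        else:
            hi = mid
    return gibbs((lo + hi) / 2)

# Hypothetical knowledge: the average roll is 4.5 (a fair die would give 3.5).
maxent = max_entropy_die(4.5)

# Another distribution that agrees with the same knowledge but claims much more:
# it asserts that only 3 and 6 ever come up.
lopsided = [0.0, 0.0, 0.5, 0.0, 0.0, 0.5]

print("maxent distribution:", [round(pk, 4) for pk in maxent])
print("entropy (maxent):  ", entropy(maxent))
print("entropy (lopsided):", entropy(lopsided))
```

Both distributions agree with the stated average, but the lopsided one rules out four faces for no stated reason; the higher-entropy one does not.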
What, then, is the book about? In Jaynes's words, "Probability theory as extended logic"—he is engaged in demonstrating that the range of applications for logical principles when dealing with an incompletely specified situation is far greater than is usually supposed. From this angle, maximum entropy is merely a corollary of a more fundamental principle, namely honesty in inference (which could well have been the book's subtitle). The nearly complete manuscript was put online years ago, before Jaynes died, where it achieved cult status and became widely studied. It is now finally available in book form, having been edited for publication by Jaynes's colleague and former student G. Larry Bretthorst.
Understanding this topic requires some historical background. Jaynes notes that such early developers of probability theory as James Bernoulli in the late 17th century and Pierre-Simon Laplace in the early 19th century saw it as an extension of logic to cases in which a lack of information makes deductive reasoning by Aristotelean syllogism impossible. Laplace called probability theory "the calculus of inductive reasoning." But starting in the middle of the 19th century, Laplace's interpretation and methods came under attack. The list of critics included Leslie Ellis, John Venn, George Boole, R. A. Fisher and others, and their "frequentist" views came to dominate the field in the 20th century.
Frequentists typically define the probability of an event as the limit of the frequency of the event as the sample size tends to infinity (think of the frequency of heads in an indefinitely long sequence of coin tosses). Thus they postulate the existence, behind the long-term averages empirically emerging from repeated random trials, of formal mathematical objects (limits, as in ordinary calculus) interpreted as intrinsic attributes of the system under study. Harold Jeffreys (to whom this book is dedicated) and Arnold Zellner were among the few who didn't subscribe to the frequentist orthodoxy. Jaynes builds on their work as well as that of Laplace and Thomas Bayes.
In Jaynes's Bayesian revival of Laplace's view, probability theory asks what degree of belief in an uncertain proposition is logically necessary, given all and only the information one has (which may include but is not limited to that obtained from random experiments). A probability is thus a measure of a state of knowledge and may change as this state is updated even when the system under study remains itself unchanged.
As a start to understanding Jaynes's approach, let us consider the process of drawing inferences. In introductory philosophy courses it is not unusual to see induction naively presented as "the opposite of" deduction. Deduction always draws assured consequences from the premises but will never tell us anything that was not already in the premises; specifically, if the premises represent what we know, deduction cannot tell us anything we didn't know to begin with. Induction, on the other hand, works backward from empirical facts to general principles. In this way it can tell us something new, but it will never guarantee its revelations.
As a matter of fact, pace David Hume and Karl Popper, no antithesis exists between deduction and induction. There is a single inference machine, whose kernel is ordinary logic. We feed the machine with a collection of statements (logic does not care in what order they are given or whether we call them hypotheses, empirical facts, axioms or priors) and we ask a yes-or-no question—call it A. Then we turn the crank, and after some grinding the machine will reply "Yes," "No" or "I don't know." (If you worry about undecidable problems, add a countdown timer, and when the time is up the machine will answer, "I don't know yet.") That's all there is to it!
A single shot at this machine is of limited value. Its glory is revealed, however, when we make it part of a feedback loop. Suppose we get "I don't know" for an answer. We can then further qualify the input statements—the premises—until we get a definite "Yes" or "No" answer. Proceeding in this way with different sets of premises—different scenarios, as it were—we can see which ones yield "Yes" and which "No."
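The machine and its feedback loop can be sketched for a toy case. The following assumes a world described by a handful of true-or-false variables, and it answers by brute-force enumeration of every assignment compatible with the premises. The function `ask`, the variable names and the rain example are hypothetical, introduced only for illustration.

```python
from itertools import product

def ask(premises, question, variables):
    """Return "Yes", "No" or "I don't know" by enumerating all scenarios.

    premises: list of functions mapping a scenario (dict name -> bool) to bool
    question: function mapping a scenario to bool
    """
    answers = set()
    for values in product([False, True], repeat=len(variables)):
        scenario = dict(zip(variables, values))
        if all(premise(scenario) for premise in premises):
            answers.add(question(scenario))
    if answers == {True}:
        return "Yes"
    if answers == {False}:
        return "No"
    # Mixed answers. As a choice of this sketch, premises that admit no
    # scenario at all also end up here.
    return "I don't know"

VARIABLES = ["rain", "wet"]

def rain_implies_wet(s):
    return (not s["rain"]) or s["wet"]

def raining(s):
    return s["rain"]

def is_wet(s):
    return s["wet"]

# First shot: the premises leave the question open.
print(ask([rain_implies_wet], is_wet, VARIABLES))           # I don't know

# Feedback loop: qualify the premises further and ask again.
print(ask([rain_implies_wet, raining], is_wet, VARIABLES))  # Yes
```

Order of the premises is irrelevant here, as in the text: the list is only ever tested as a whole.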
And where do we get probability, you may ask. Briefly (and here I must skip a lot of fine print), once we have many scenarios (think of all possible hands in a card game or all possible molecular configurations of a liter of gas), we may have to give up trying to handle them one by one and resign ourselves instead to dealing with them "in bulk," bagging them according to some criteria, labeling each bag and keeping track only of how many scenarios we put in it. (Absolute counts don't matter much here—only relative ones do. For instance, if we add to our parameters a binary variable irrelevant to our inquiry, each scenario will split into two and thus the size of every bag will double, but the overall fraction of scenarios represented by each bag will not.)
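The parenthetical remark about relative counts is easy to check on a small case. This sketch assumes the scenarios are the 36 outcomes of two dice and the bags are "seven" and "other"; the helper `bag_fractions` and the added `flag` variable are hypothetical names used only here.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def bag_fractions(scenarios, label):
    """Bag the scenarios by label and return each bag's share of the total."""
    counts = Counter(label(s) for s in scenarios)
    total = sum(counts.values())
    return {bag: Fraction(n, total) for bag, n in counts.items()}

def total_is_seven(s):
    # Looks only at the two dice, whatever else the scenario records.
    return "seven" if s[0] + s[1] == 7 else "other"

dice = list(product(range(1, 7), repeat=2))
base = bag_fractions(dice, total_is_seven)

# Add a binary variable irrelevant to the inquiry: every scenario splits in two.
extended = [(a, b, flag) for (a, b) in dice for flag in (0, 1)]
doubled = bag_fractions(extended, total_is_seven)

print(len(dice), len(extended))  # 36 72
print(base == doubled)           # True
```

Every bag doubles in size, so the fractions, which are all that the bookkeeping retains, stay the same.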
The probability of our original question A being true is the fraction of scenarios for which the machine answers "Yes." This probability is of course conditioned by the prior—that is, the premises X that remained fixed while we were running through the scenarios (one of these premises being the set of admissible scenarios itself)—and is accordingly denoted p(A|X). If this probability distribution p(A|X) doesn't seem to match reality, all we can do is revise our prior X. Thus, as Jaynes points out (following Jeffreys), "induction is most valuable to a scientist just when it turns out to be wrong; only then do we get new fundamental knowledge."
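As a rough sketch of this definition, the following computes p(A|X) as a fraction of admissible scenarios and then revises the premises X. It assumes a two-dice world; the function `probability` and the particular question and premise are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import product

def probability(question, premises, scenarios):
    """p(A|X): share of the scenarios admitted by the premises X that answer "Yes"."""
    admissible = [s for s in scenarios if all(premise(s) for premise in premises)]
    if not admissible:
        raise ValueError("the premises admit no scenario")
    yes = sum(1 for s in admissible if question(s))
    return Fraction(yes, len(admissible))

two_dice = list(product(range(1, 7), repeat=2))

def at_least_ten(s):
    return s[0] + s[1] >= 10

def first_is_six(s):
    return s[0] == 6

# X: nothing is known beyond the set of admissible scenarios.
p_before = probability(at_least_ten, [], two_dice)

# Revised X: we have also learned that the first die shows a six.
p_after = probability(at_least_ten, [first_is_six], two_dice)

print(p_before, p_after)  # 1/6 1/2
```

The dice themselves are the same in both calls; only the state of knowledge encoded in the premises differs.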
The probability thus defined obeys by construction those quantitative rules that, as Jaynes argues (following Richard T. Cox and George Pólya), are the basic desiderata of plausible reasoning, namely the product rule and the sum rule. The product rule gives the probability that both A and B are true, and it holds whether or not the two are independent (when they are, p(B|AX) reduces to p(B|X)):
p(AB|X) = p(A|X)p(B|AX) = p(B|X)p(A|BX)
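The rule can be checked by counting scenarios. This sketch assumes X is "two cards drawn in order, without replacement, from a standard 52-card deck," with A meaning "the first card is an ace" and B "the second card is an ace"; the helper `p` and the card encoding are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

RANKS = "A23456789TJQK"
SUITS = "SHDC"
deck = [rank + suit for rank in RANKS for suit in SUITS]

# X: every ordered pair of distinct cards is an admissible scenario.
scenarios = list(permutations(deck, 2))

def p(event, given=lambda s: True):
    """Fraction of the scenarios satisfying `given` for which `event` holds."""
    admissible = [s for s in scenarios if given(s)]
    return Fraction(sum(1 for s in admissible if event(s)), len(admissible))

def A(s):
    return s[0][0] == "A"   # first card is an ace

def B(s):
    return s[1][0] == "A"   # second card is an ace

def AB(s):
    return A(s) and B(s)

p_AB = p(AB)
left = p(A) * p(B, given=A)    # p(A|X) p(B|AX)
right = p(B) * p(A, given=B)   # p(B|X) p(A|BX)

print(p_AB, left, right)       # 1/221 1/221 1/221
print(p(B), p(B, given=A))     # 1/13 1/17
```

Here A and B are not independent, since learning A changes the probability of B from 1/13 to 1/17, yet both factorizations agree with the direct count.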
The sum rule gives the probability that at least one of A and B is true:
p(A+B|X) = p(A|X) + p(B|X) - p(AB|X)
When the occurrence of one event precludes the occurrence of the other (when A and B are mutually exclusive), the last term vanishes and the individual probabilities can simply be added.
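A small counting check of the sum rule, assuming X is a single roll of a six-sided die; the events and the helper `p` are hypothetical choices for illustration.

```python
from fractions import Fraction

faces = range(1, 7)

def p(event):
    """Fraction of the six faces for which `event` holds."""
    return Fraction(sum(1 for k in faces if event(k)), 6)

def even(k):
    return k % 2 == 0

def high(k):
    return k > 3

def one(k):
    return k == 1

# Overlapping events: 4 and 6 are both even and high, so p(AB|X) must be subtracted.
p_even_or_high = p(lambda k: even(k) or high(k))
by_rule = p(even) + p(high) - p(lambda k: even(k) and high(k))
print(p_even_or_high, by_rule)       # 2/3 2/3

# Mutually exclusive events: p(AB|X) is zero and the probabilities simply add.
p_even_or_one = p(lambda k: even(k) or one(k))
print(p_even_or_one, p(even) + p(one))  # 2/3 2/3
```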
Of course, different priors may lead to different probability distributions. However, the great majority of variables in the whole world may be presumed to be roughly independent of the specific phenomena under investigation. To paraphrase Émile Borel, the microscopic scenarios that make up a bag marked, say, "coin turned up heads" are all likely to be replaced by different microscopic scenarios when the prior changes from "full moon" to "new moon," yet the numerical contents of each bag will hardly be affected, and thus the typical gambler in his den will never notice the difference.
This observation is conceivably at the root of the mind projection fallacy most often attacked by Jaynes—namely, the notion that probabilities (such as of a coin's turning up heads) represent intrinsic properties of physics rather than a description of one's knowledge (or lack of knowledge) of the situation. Jaynes invites us to consider the statement "When I toss a coin, the probability for heads is one-half," pointing out that there are two ways of interpreting it:
(A) "The available information gives me no reason to expect heads rather than tails, or vice versa—I am completely unable to predict which it will be." And
(B) "If I toss the coin a very large number of times, in the long run . . . the frequency of heads will approach 1/2."
Statement (A) describes only a state of knowledge, whereas in (B) the number 1/2 appears to be a property of the coin itself.
"The idea that probabilities are physically real things based ultimately on observed frequencies of random variables," Jaynes writes (referring to the frequentist view), "underlies most recent expositions of probability theory, which would seem to make it a branch of experimental science." To give examples of "physical considerations that show the fundamental difficulty with the notion of a 'random' experiment," Jaynes demonstrates how to cheat at coin and die tossing, discusses the probability of bridge hands and shows that frequentists will dump their cherished notion of probability as the "limit of an infinite sequence of identical random experiments" as soon as they are confronted with a questionable shuffler.
Are probabilities, then, just in our mind? "Our probabilities and the entropies based on them are indeed 'subjective' in the sense that they represent human information," Jaynes noted in his 1990 article "Probabilities in quantum theory" (in Complexity, Entropy, and the Physics of Information, edited by Wojciech H. Zurek [Addison-Wesley]).
"If they did not, they could not serve their purpose," he continued. "But they are completely 'objective' in the sense that they are determined by the information specified [i.e., the prior X, via the inference machine], independent of anyone's personality, opinions, or hopes."
To wean us from the subjective/objective dilemma, Jaynes encourages us to come up with inference rules that could be implemented by an imaginary robot. These rules are to be deduced from the following desirable qualities: (1) Degrees of plausibility are represented by real numbers. (2) Qualitatively, the robot's reasoning corresponds with common sense. (3) It always reasons consistently; specifically, it always takes into account all the relevant evidence, and if the robot has the same state of knowledge in two problems, then it assigns the same plausibilities in both. From these Jaynes derives all of the Bayesian machinery for statistical inference.
Jaynes has been accused of exaggerating the differences between the frequentist school and his own. But now that it no longer takes an act of courage to profess the Bayesian faith (younger students of statistics don't even realize that there ever was an issue), it's hard to imagine how different the world of statistical inference might have been, especially in image processing, data mining and bioengineering, without Jaynes's 30 years of preaching and rallying the flock. The raison d'être for the polemical aspects of Jaynes's gospel has been extinguished to some extent, and the conceptual and technical aspects, although they remain valid, may no longer be as revolutionary as they sound, simply because the revolution has been won.
But one facet of Jaynes's thinking that is particularly visible in this book and remains as fresh as ever is a relentless drive to demystify and democratize science. Science is especially successful when it turns difficult tasks into trivial ones. (How else could we be made free to tackle today's new difficult tasks?) But technical people who used to thrive on those very difficulties are understandably loath to lose their competitive advantage. Abstruseness, lack of universal rules ("ad-hoc-ness"), obfuscation and intimation of magic or dogma betray an attempt (unconscious, perhaps) to retain control. We can sense the indignation that Jaynes, who began life as a poor orphan from Iowa, must have felt toward those pooh-bahs (such as Sir Ronald Aylmer Fisher, Cambridge alumnus, Fellow of the Royal Society, Royal Medalist) who, instead of using their high position to help streamline scientific thought, kept adding clutter to it (and, to add insult to injury, had the gall to knowingly speak of Nature's "propensities" without being physicists themselves).
In this book as well as in his other writings, Jaynes displays an intensity of missionary zeal worthy of Saint Paul. What inspired it? Jaynes thought that "spectacular advances in the technology of experimentation, with increasingly detailed control over the initial state of individual atoms" were going to bring about a Bayesian revolution in quantum theory. "A century from now the true cause of microphenomena will be known to every schoolboy," he asserted. In the aforementioned 1990 article, Jaynes made this revelation:
I had intended originally to specialize in Quantum Electrodynamics, but this proved impossible. Whenever I look at any quantum-mechanical calculations, the basic craziness of what we are doing rises in my gorge and I have to find some different way of looking at the problem that makes physical sense. Gradually, I came to see that the foundation of probability theory and the role of human information have to be brought in, and I have spent many years trying to understand them in the greatest generality.
Was Jaynes's life's work then just a long detour on an unfulfilled quest for the Holy Grail of a "rational explanation" of quantum mechanics? Regardless, thanks to the boy from Iowa, today every schoolchild, and every scholar, can approach inference unhampered by absurd probability myths.
Most thick technical books are purchased with hope and then hardly ever touched. This book should be spared that fate: It is a pleasure to read. The subject index is rather skimpy; thus, often the best way to locate an item is to read a whole chapter through, whereby one encounters much that is rewarding. The bibliography is rich and well annotated. There are many exercises, and occasionally the editor has creatively turned a gap in the manuscript into an exercise for the reader.
I may disagree with Jaynes here and there, but like the principle of maximum entropy, he keeps us honest and makes us see much.