Logo IMG
HOME > PAST ISSUE > Article Detail


First Links in the Markov Chain

Probability and poetry were unlikely partners in the creation of a computational tool

Brian Hayes

The Drivel Generator

2013-03HayesFD.jpgClick to Enlarge ImageFor Markov, extending the law of large numbers to interdependent samples was the main point of his inquiry. He bequeathed us a proof that a Markov chain must eventually settle down to some definite, stable configuration corresponding to the long-term average behavior of the system.

In the 100 years since 1913, Markov chains have become a major mathematical industry, but the emphasis has shifted away from the questions that most interested Markov himself. In a practical computational setting, it’s not enough to know that a system will eventually converge to a stable value; one needs to know how long it will take. With the recent vogue for huge Markov systems, even estimating the convergence time is impractical; the best that can be expected is an estimate of the error introduced by prematurely terminating a simulation process.

I conclude this essay with a more personal story about my own introduction to Markov chains. In 1983 I wrote a “Computer Recreations” column for Scientific American subtitled “A progress report on the fine art of turning literature into drivel.” I was exploring algorithms that exploit the statistical structure of language to generate random text in the manner of a particular author. (Some specimens based on Eugene Onegin appear in the illustration above.)

One version of the drivel algorithm builds a transition matrix whose rows are labeled by sequences of k letters and whose columns define the probabilities of various letters that can follow each k-character sequence. Given an initial k-letter seed sequence, the program uses the matrix and a random number generator to choose the next character of the synthesized text. Then the leftmost letter of the seed is dropped, the newly chosen character is appended to the right side, and the whole procedure repeats. For values of k larger than 2 or 3 the matrix becomes impractically large, but there are tricks for solving this problem (one of which eliminates the matrix altogether).

Shortly after my article appeared, I met Sergei Kapitsa, the son of Nobel laureate Pyotr Kapitsa and the editor of the Russian-language edition of Scientific American. Kapitsa told me that my algorithms for generating random text all derived from the work of A. A. Markov, decades earlier. I expressed a certain skepticism: Maybe Markov invented the underlying mathematics, but did he apply those ideas to linguistic processes? Then Kapitsa told me about Markov’s Onegin paper.

In a later issue of the magazine I published a contrite addendum about Markov. I had to write it without ever having read a word of Markov’s work, and I went overboard a little, saying that Markov “asks to what extent Pushkin’s poem remains Pushkin’s when the letters are scrambled.” Thirty years later, I hope this column will restore the balance. Sadly, though, I am too late to share it with Kapitsa. He died last summer at age 84.


  • Basharin, G. P., A. N. Langville and V. A. Naumov. 2004. The life and work of A. A. Markov. Linear Algebra and its Applications 386:3–26.
  • Diaconis, P. 2009. The Markov chain Monte Carlo revolution. Bulletin of the American Mathematical Society 46:179–205.
  • Kemeny, J. G., J. L. Snell and A. W. Knapp. 1976. Denumerable Markov Chains. New York: Springer-Verlag.
  • Link, D. 2006. Chains to the West: Markov’s theory of connected events and its transmission to Western Europe. Science in Context 19(4):561–589.
  • Link, D. 2006. Traces of the mouth: Andrei Andreyevich Markov’s mathematization of writing. History of Science 44(145):321–348.
  • Markov, A. A. 1913. An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains. (In Russian.) Bulletin of the Imperial Academy of Sciences of St. Petersburg 7(3):153–162. Unpublished English translation by Morris Halle, 1955. English translation by Alexander Y. Nitussov, Lioudmila Voropai, Gloria Custance and David Link, 2006. Science in Context 19(4):591–600.
  • Ondar, Kh. O., ed. 1981. The Correspondence Between A. A. Markov and A. A. Chuprov on the Theory of Probability and Mathematical Statistics. New York: Springer-Verlag.
  • Seneta, E. 1996. Markov and the birth of chain dependence theory. International Statistical Review 64:255–263.
  • Seneta, E. 2003. Statistical regularity and free will: L. A. J. Quetelet and P. A. Nekrasov. International Statistical Review 71:319–334.
  • Shannon, C. E. 1948. A mathematical theory of communication. Bell System Technical Journal 27:379–423, 623–656.
  • Sheynin, O. B. 1989. A. A. Markov’s work on probability. Archive for History of Exact Sciences 39(4):337–377.
  • Vucinich, A. 1960. Mathematics in Russian culture. Journal of the History of Ideas 21(2):161–179.

comments powered by Disqus


Subscribe to American Scientist