American Scientist, July-August 2014


Belles lettres Meets Big Data

Quantitative analysis of poetry and prose has roots deep in the 19th century.

Brian Hayes

Books to Not Read

There is something undeniably risible about the earnest savant, engrossed in the counting of letters, words, and sentences, while the ordinary reader finds a better use for books. Willa Cather was not alone in poking fun at this figure. Jonathan Swift, in Gulliver’s Travels, wrote of the professor who “made the strictest Computation of the general Proportion there is in the Book between the Numbers of Particles, Nouns, and Verbs, and other Parts of Speech.”

To some extent Sherman deserves his obscurity; there is much in Analytics of Literature that now seems muddle-headed or misguided. And yet his basic observations about changes in sentence structure might reward further investigation, most likely as a linguistic rather than a literary phenomenon.

Sherman did the hard work of counting and compiling data, but he wasn’t able to formulate much of an explanation of why and how the changes in syntax came about. His interpretation was vaguely evolutionary, framed in terms of the progressivist and teleological ideas that then ruled evolutionary thought. According to this view, written English prose had been steadily gaining in “fitness,” and reached its culmination in Sherman’s own time. The sentences of Elizabethan writers “are prevailingly either crabbed or heavy,” he wrote. “Ordinary modern prose, on the other hand, is clear, and almost as effective to the understanding as oral speech.” There’s no sign that Sherman gave any thought to how this evolutionary process might continue into the future. What would he make of the English sentence in the age of texting? LOL.

Perhaps the digital humanists of the 21st century will rediscover and extend Sherman’s work on sentence structure. They are already re-implementing his idea of the literature lab. I recently came upon the syllabus for an English course at Northeastern University where the class assignment begins: “Choose a big Victorian novel to not read.”


  • Funda, E. I. 2005. “With scalpel and microscope in hand”: The influence of Professor Lucius Sherman’s 19th-century literary pedagogy on Willa Cather’s developing aesthetic. Prospects 29:289–324.
  • Mendenhall, T. C. 1887. The characteristic curves of composition. Science 9:237–249.
  • Mendenhall, T. C. 1901. A mechanical solution of a literary problem. Popular Science Monthly 60:97–105.
  • Moretti, F. 2013. Distant Reading. London: Verso.
  • Mosteller, F., and D. L. Wallace. 1964. Inference and Disputed Authorship: The Federalist. Reading, MA: Addison-Wesley.
  • Sherman, L. A. 1892. On certain facts and principles in the development of form in literature. University Studies of the University of Nebraska, Vol. 1, No. 4.
  • Sherman, L. A. 1893. Analytics of Literature: A Manual for the Objective Study of English Prose and Poetry. Boston: Ginn & Company.


