Gaul in the Family

Greg Ross

Ever since Darwin proposed an evolutionary tree to describe the descent of species, linguists have sought to apply the concept in their own field. But languages can behave in quirky ways, lending to one another, changing in parallel, sometimes even converging. Now historical linguists may stand to benefit by borrowing a second idea from evolutionary biology.

Peter Forster, a geneticist at the University of Cambridge, was attracted by the riddle of Celtic, which developed on the continent as Gaulish in ancient France and northern Italy. It's known that the language jumped to the British Isles, where it evolved into Scots Gaelic, Irish, Welsh and the Breton language of northern France. But did that historic jump occur in one wave or two? And when did it happen?

These are difficult questions to answer, because the Gaulish language and its records were largely eradicated during the Roman conquest. Historical linguists typically trace how a language's words have changed over time, so the paucity of Gaulish data left open many questions about the history of Celtic's modern descendants in the British Isles.

Forster and his colleague Alfred Toth of the University of Zurich realized that the problem might yield to network analysis, a technique used to trace the evolutionary relationships among genes. By examining the linguistic meanings and functions in Gaulish and comparing them systematically with their known counterparts in other languages, they could infer an evolutionary history for the whole family. Their results, published in the Proceedings of the National Academy of Sciences of the United States of America, provide some intriguing insights into the spread of language in Western Europe, and offer a new tool to linguistic historians.

[Figure: Roman baths]

This is not the first time that linguists have borrowed techniques from other disciplines. If a document has been copied repeatedly—an ancient manuscript, say, or a chain letter—researchers can trace its evolution by identifying "mutations" between generations. Last year, researchers at Rome's La Sapienza University compiled 52 translations of the Universal Declaration of Human Rights and computed their relative entropy to produce a fairly accurate family tree of Eurasian languages. Forster's phylogenetic technique was valuable because it could reflect "untreelike" reticulations—and could permit time estimates as well.

In order to minimize any bias arising from their own familiarity with the various languages, Forster and Toth began with Gaulish, for which a significant number of bilingual inscriptions have fortunately survived. These could provide valuable contemporary translations, forming a sort of Rosetta stone for understanding the origins of the Celtic languages spoken in Britain.

Starting with these inscriptions, the researchers compiled a list of 35 Indo-European items in 13 languages and applied the phylogenetic technique to analyze their characteristics systematically. Among the first things they noticed, for example, was that verbs precede subjects in the insular Celtic languages under study, but follow them in all the others. They amended the network to reflect this fact, and it grew in complexity as further observations were made.
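
As a rough illustration of how a single observation feeds such an analysis, the word-order feature can be treated as one binary character that splits the languages into two groups. The sketch below is only an assumption about the encoding, not the authors' actual data or software; the variable and function names are hypothetical, and only languages named in this article are included.

```python
from collections import defaultdict

# One binary character: does the verb precede the subject?
# States follow the observation in the text
# (insular Celtic: yes; all the others: no).
VERB_BEFORE_SUBJECT = {
    "Welsh": 1,
    "Breton": 1,
    "Irish": 1,
    "Scots Gaelic": 1,
    "Gaulish": 0,
    "Latin": 0,
    "Greek": 0,
    "English": 0,
}


def split_by_character(states):
    """Group languages by the state they show for one character."""
    groups = defaultdict(set)
    for language, state in states.items():
        groups[state].add(language)
    return dict(groups)


groups = split_by_character(VERB_BEFORE_SUBJECT)
print(sorted(groups[1]))  # languages where the verb precedes the subject
print(sorted(groups[0]))  # languages where it follows
```

Each further item on the list would add another such split, and a network method has to reconcile the splits that disagree with one another.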

[Figure: Unearthed in a French field in 1897]

The finished network displayed branches for English, Greek, Latin and the Romance languages, as expected, and it shed some interesting light on Gaulish and its descendants. Forster and Toth's results suggest that a common Celtic branch emerged early within Indo-European, and that Gaulish (continental Celtic) then divided from insular Celtic, which subsequently split up into Brythonic (Welsh and Breton) and Goidelic (Irish and Scots Gaelic). The "jump" to Britain had occurred in one wave, not two.

The researchers estimated dates for these events by comparing the accumulated differences between modern languages and their ancient ancestors, calibrating with known historical events. This produced an average rate of language change, in this case one lexeme "mutation" every 1,350 years. This would mean that ancestral Indo-European arose in 8100 b.c., plus or minus 1,900 years, considerably earlier than previous estimates of about 4000 b.c. It also suggests that Celtic arrived in Britain in 3200 b.c., plus or minus 1,500 years, before differentiating into the languages of the British Isles.
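
The dating arithmetic can be sketched in a few lines. The example assumes the simplest possible clock, in which the age of a branch point is the average number of accumulated "mutations" multiplied by the 1,350-year rate quoted above. The mutation counts and the function name are hypothetical illustrations, not figures or methods taken from the study.

```python
YEARS_PER_MUTATION = 1350  # average rate quoted in the article


def estimate_age_years(mutation_counts, years_per_mutation=YEARS_PER_MUTATION):
    """Return the estimated age of a branch point in years.

    mutation_counts: number of changes accumulated on each descendant
    lineage since the branch point (hypothetical inputs).
    """
    if not mutation_counts:
        raise ValueError("need at least one lineage")
    mean_mutations = sum(mutation_counts) / len(mutation_counts)
    return mean_mutations * years_per_mutation


# Hypothetical counts for three lineages, for illustration only.
age = estimate_age_years([2, 3, 1])
print(age)
```

Turning such an age into a calendar date also requires knowing when each sampled language was recorded, which this sketch leaves out.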

Forster cautions that future finds of bilingual texts could change the picture, but these results demonstrate the utility of his technique. Among other things, they lend support to the hypothesis that Indo-European languages were spread by early farmers, since agriculture is thought to have arrived in Europe around 6000 b.c. It's not yet clear, though, whether movements of language signify movements of people. "Did the Celtic languages spread by contact, or did the speakers themselves spread, or did both mechanisms happen?" Forster wonders. "This might be answered one day by genetics."—Greg Ross
