Australasian Science: Australia's authority on science since 1938

Thinking the unthinkable: tracing language back 15,000 years

By Michael Dunn

Linguists have identified a set of 23 frequent words to establish relationships between languages dating back to ancient times.

Just about everyone has a personal stake in language, and many people, expert and amateur, feel entitled to an opinion. But linguists care more than most people, and when linguistics hits the media, linguists can get very agitated indeed.

Published earlier this month in the Proceedings of the National Academy of Sciences (PNAS), the latest paper to upset linguists around the world uses methods from computational evolutionary science to look at questions about language prehistory.

So why exactly are its conclusions so very challenging to traditional historical linguistics?

Language families

Standard historical linguistic methods let us reconstruct languages from the past based on so-called “cognates”.

Words from a pair of languages are cognate if they are similar in form and meaning, and can be shown to have descended from a common word present in the ancestor of those languages. Finding words that are similar in form and meaning is easy, but showing that these cognate candidates are true cognates is trickier.


The key here is that language change includes mutations to the sound systems of different languages which can affect all words in the lexicon.

True cognates will be part of a larger group of cognate sets which all show the effects of the same mutation in the sound system.
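As a rough illustration of why regularity matters, the toy sketch below tallies how often each pair of initial letters lines up across some well-known Latin and English word pairs. It is only a sketch under loud assumptions: spelling stands in for sound, only the first letter is compared, and the function name `initial_correspondences` is made up for this example. The real comparative method works on full sound systems, not spellings.

```python
from collections import Counter

# Well-known Latin/English pairs. Only the first letter is compared,
# which is a crude stand-in for comparing actual sounds.
pairs = [
    ("pater", "father"), ("pes", "foot"), ("piscis", "fish"),
    ("decem", "ten"), ("duo", "two"), ("dens", "tooth"),
    ("dies", "day"),  # a look-alike usually cited as a chance resemblance
]

def initial_correspondences(word_pairs):
    """Count how often each (Latin initial, English initial) pairing occurs."""
    return Counter((latin[0], english[0]) for latin, english in word_pairs)

counts = initial_correspondences(pairs)
for (latin, english), n in counts.most_common():
    print(f"Latin {latin}- : English {english}-  seen {n} time(s)")
```

The recurring pairings (p with f, d with t) are the kind of pattern that supports a cognate claim. The one-off d with d pairing for dies and day is the kind of isolated resemblance that does not.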

Historical linguists can build a chain of inferences about such sound changes extending back into prehistory.

Through this method, linguists can show with the highest degree of certainty that English five, French cinq, Russian pyat' and Armenian hing are all cognate, descendants of a single word in their common ancestor (a reconstructed language known as Proto-Indo-European).


Likewise, linguists can show that the word dog in English and the word dog (meaning the same thing) in the Queensland language Mbabaram are chance resemblances produced by completely unrelated historical pathways.

Going deeper

While standard historical linguistic techniques are very powerful, they have natural limits. Beyond a certain time depth, so many sound changes have accumulated that it is no longer possible to identify cognates or prove that two languages belong to the same family.

But just because these methods no longer apply, must we assume it’s impossible to make any scientifically valid statements about language in the deeper past? The new PNAS paper, led by English evolutionary biologist Mark Pagel, sets out to challenge this assumption.

In previous work, members of Pagel’s team have shown statistically what many linguists intuitively believed all along: that more frequently used parts of the lexicon are more historically stable.


In quantifying the correlation between frequency of use and stability they were able to measure “lexical half-lives”, a measure of the stability of individual cognate sets within language families.

While most cognate sets stay around for a few hundred or a few thousand years, there is a hard core of terms that are stable over much longer periods.
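To make the half-life idea concrete, here is a minimal sketch that assumes simple exponential decay, by analogy with radioactive half-lives. The function name and the two half-life values are hypothetical and chosen only for illustration. They are not figures from the paper, whose statistical model is considerably more involved.

```python
def survival_probability(years: float, half_life: float) -> float:
    """Chance that a cognate set is still in use after `years`,
    assuming simple exponential decay with the given half-life."""
    return 0.5 ** (years / half_life)

# Hypothetical half-lives in years, for illustration only.
examples = [("fast-changing word", 1_000), ("very stable word", 10_000)]

for label, half_life in examples:
    p = survival_probability(15_000, half_life)
    print(f"{label} (half-life {half_life} years): {p:.4f}")
```

Under these assumed values, a word with a 1,000-year half-life has almost no chance of surviving 15,000 years, while one with a 10,000-year half-life survives roughly a third of the time.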

The new paper starts from the premise that if a cognate set is so stable that it is preserved for the 6,000 or 8,000 years of reconstructible history of a language family, chances are it was present in the ancestors of that language family for a very long time before that.

This, in turn, means that the languages of Eurasia could quite plausibly share terms which, deep down, are in fact cognate in the unreconstructible past.

Der Turmbau zu Babel (the Tower of Babel) by Meister der Weltenchronik. Wikimedia Commons

In a sophisticated (and, I admit, difficult to understand) analysis, the research team tested proposed cognates shared between the different language families of Eurasia.

The proposals themselves are necessarily controversial, since they are produced by a group committed to searching for long-distance connections between languages. But the study shows that the larger proposed cognate sets, and the ones showing more links between families, are precisely the ones that the team's frequency-based measures predict to be more stable.

They then use these cognate proposals to infer language history, weighting the proposals by the inferred half-lives of the words, and show that these languages can be grouped consistently into a super-family originating about 15,000 years before present.

A positive contribution

This is a new approach to deep-time historical linguistic inference.

A lot more work is required to test the validity of both the methods and their specific results, and it is quite possible that the historical signal they detect is an artifact. But the methodology appears sound, and has the potential to teach us a great deal.

It should not be dismissed out of hand just because it does not respect the limitations of traditional historical linguistics.

Michael Dunn is an evolutionary linguist working with computational phylogenetic methods at the Max Planck Institute for Psycholinguistics. He has coauthored with Quentin Atkinson, and has published papers using methods developed by Mark Pagel and Andrew Meade. This article was originally published at The Conversation.