NSF researcher Anthony Kroch of the University of Pennsylvania is trying to understand how language change spreads through populations. With collaborator Beatrice Santorini, he is compiling an electronic collection of Modern English texts covering the time period from 1700 to 1914 (the beginning of World War I). The completed “corpus,” as it is known, will complement three others created independently over the past decade by researchers from the University of Pennsylvania and the University of York, England. The existing works—which span 900 years of English history—contain more than 4.5 million words of text carefully tagged and annotated for linguistic features. The publicly available collection gives researchers a standardized, searchable document to track changes in the English language over time. It helps them explore language shifts in a historical context and examine the link between language learning and change. Mathematicians are using the data to create computer models that explain how changes diffuse through populations. Kroch is currently working with researchers in Canada and Brazil to create standardized, historical corpora of French and Portuguese.


During the Great Vowel Shift (GVS) of the 15th century, English speakers changed the way they pronounced certain vowels. As with many large-scale language changes, the GVS occurred gradually, over about a century. These shifts in vowel pronunciation mark the biggest differences between Middle English and contemporary English. The sound clips above trace vowel pronunciation from Middle to present-day English.

Credit: Melinda J. Menzer, Furman University

Linguist Donald Ringe of the University of Pennsylvania and computer scientist Tandy Warnow of the University of Texas at Austin teamed up in 1993 to build statistical models that help explain how languages evolve. In some ways, language change parallels biological evolution. Many of today’s languages descended from others—for example, Latin is the “ancestor” of French, Spanish and Italian. And languages, like species, diverge over time after communities separate. Biologists and linguists both use trees as models to represent evolutionary origins and branch points. But languages don’t always change in a tree-like manner. Ringe and Warnow are developing new algorithms and computational methods for tracing the history of languages in a more precise way. Today, their research group includes statistician Steve Evans, from the University of California at Berkeley, and computer scientist Luay Nakhleh from Rice University. The team’s software and tools will help researchers from a variety of fields, including anthropologists studying human migration patterns.

By Nicole Mahoney

Language and Linguistics: A Special Report