

Proto-Indo-European language

  • Proto-Indo-European (PIE) is the ultimate ancestor of many European and Indian languages.
  • Recognizing the relatedness of words in different languages can give you a boost in memorizing them.
  • My Indo-European Cognate Dictionary gathers words from 32 languages into cognate clusters that show related words.

The Proto-Indo-European language (PIE) is the ultimate ancestor of many European and Indian languages. The prefix "proto" indicates that it was spoken thousands of years ago and that we have no direct record of it. What we do have is clear evidence in its descendant languages, from the consistent patterns in the way their words vary, that there was such an ancestor. Following these patterns, scholars have deduced quite an extensive vocabulary, but these are reconstructed words, not ‘real’ ones: we can never know exactly how they were pronounced, or precisely how they were used. By convention, therefore, reconstructed words are written with a preceding asterisk.

Here is a list of the living branches of the Indo-European language tree (the languages covered in my Indo-European Cognate Dictionary are given in bold print):

Celtic: Breton, Welsh, Irish, Scottish Gaelic (in order of number of speakers)


West Germanic: English, Old English, Frisian, Dutch, German

North Germanic: Swedish, Danish, Norwegian, Icelandic, Norse (in order of number of speakers)

Italic: Latin and its descendants: French, Italian, Spanish, Portuguese, Romanian

Greek (Ancient, Modern)


Baltic: Lithuanian, Latvian


West Slavic: Polish, Czech, Slovak, Sorbian / Lusatian

South Slavic: Serbo-Croatian, Macedonian, Slovene, Bulgarian

East Slavic: Russian, Ukrainian, Belarusian



Iranian: Persian, Tajik, Pashto, Baluchi, Kurdish, Ossete

Indic / Indo-Aryan: Sanskrit, Hindi, Punjabi, Gujarati, Marathi, Bengali, Nepali, Sinhala, Urdu, Romani

Linguistic laws

Here are some of the patterns that scholars have observed. The full picture has become complicated over the years as linguists have accounted for exceptions, but the initial discovery was simple and easy to describe. So, bearing in mind that these ‘rules’ don’t apply all the time, and that further principles describe the exceptions, here are the main patterns relating to the Germanic languages. They were first set out by Jacob Grimm (yes, one of the Brothers Grimm, of fairy-tale fame) in 1822. Grimm spotted that a p at the beginning of a word in Sanskrit, Latin, or Greek consistently becomes f in the Germanic languages. He went on to describe nine such patterns, which are collectively known as Grimm’s law:

p → f

d → t

k → h

t → th

b → p

g → k

bh → b

dh → d

gh → g

Here are examples of these in action:

Latin pater is English father  (p → f; t → th)

French pied is English foot (p → f; d → t)

Latin caput is Old English hafud (k → h; p → f), meaning head

Latin tres is English three (t → th)

Lithuanian dubus (from PIE *dʰewb-) is English deep (b → p)

Latin genus (race) and English kin are cognate (g → k), as are Latin ager (field) and English acre

PIE *bʰeh₂go- becomes beech in English, but that same bh sound becomes f in Latin and ph in Greek, hence fāgus and φηγός (phēgós)

similarly, PIE *bʰréh₂tēr is brother in English, but frater in Latin and φράτηρ (phrátēr) in Greek

PIE *dʰugh₂tḗr becomes daughter; PIE *dʰwer- becomes door

PIE *ǵʰer- becomes garden and garth in English, but hortus and χόρτος (khórtos) in Latin and Greek
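For readers who like to see a rule spelled out mechanically, here is a minimal Python sketch of the nine correspondences listed above. It is only an illustration under heavy assumptions: the function name apply_grimm and the plain ASCII spellings (bh, dh, gh for the aspirated sounds) are my own shorthand, vowels are left untouched, and none of the known exceptions are modelled. The one point it does make is that all nine shifts have to be applied at once: if bh → b were applied first and b → p afterwards, every bh would wrongly end up as p.

```python
import re

# The nine Grimm's law correspondences from the list above
# (older consonant -> Germanic consonant), in simplified ASCII spelling.
GRIMM = {
    "bh": "b", "dh": "d", "gh": "g",
    "p": "f", "t": "th", "k": "h",
    "b": "p", "d": "t", "g": "k",
}

# Try two-letter sounds first so that "bh" is not read as "b" followed by "h".
_PATTERN = re.compile("|".join(sorted(GRIMM, key=len, reverse=True)))


def apply_grimm(form: str) -> str:
    """Shift each listed consonant exactly once, in a single pass.

    Hypothetical helper for illustration only: it ignores vowel changes
    and every exception to the basic rules.
    """
    return _PATTERN.sub(lambda match: GRIMM[match.group(0)], form)


if __name__ == "__main__":
    for word in ["pater", "dheub", "bhrater", "kaput"]:
        print(word, "->", apply_grimm(word))
```

Feeding in Latin pater gives father, and the simplified root dheub gives deup, close to deep. Other results only land in the neighbourhood of the English word (bhrater comes out as brather), which is a fair reflection of how much the vowels and later changes matter.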

It’s also worth noting that there’s a fundamental distinction between the ‘Western’ and ‘Eastern’ branches of the Indo-European tree, expressed as the kentum-satem divide. Most of the Western languages have a word for hundred that begins with a hard k sound, like Latin centum (in the Germanic languages, as Grimm’s law describes, this k becomes h, hence our hundred). In the Eastern languages, the word for hundred begins with a soft s sound, as in Avestan satəm (compare Sanskrit śatám). This distinction between a hard k and a soft s sound is thought to reflect a very early split in the Proto-Indo-European tribes, as some headed west and others east. Here is how that Western-Eastern divide plays out in the branches:

Western (kentum): Celtic, Germanic, Italic, Greek, Anatolian

Eastern (satem): Balto-Slavic, Armenian, Albanian, Indo-Iranian

Having talked about changes, I should note that, notwithstanding the thousands of years that have passed, the occasional PIE word has been retained almost unchanged to the present day (and personally, I find that quite exciting to see!). Daughter (*dʰugh₂tḗr) and new (*néwos) are excellent examples of this.

Relevance to language learning

It's said that you need around 2000 words to be usefully fluent in a language (this is a ballpark figure, and obviously depends on the language and what you need it for). Learning this number of words is the reason why most people fail at learning another language. But here's the thing — people often think it's all about memorization, and it isn't. Even though vocab learning would seem to be a prime example of information that just needs to be hammered into your brain using brute force, rather than building understanding, there is in fact a role for understanding, for natural connections with information you already know well.

How easy a language is to learn is partly down to the number of shared cognates. Spanish, for example, is one of the easiest languages for English speakers to learn, because a huge number of its words are very similar to their English counterparts. But not all cognates are obvious to the untutored eye. Some need a bit more knowledge before they become clear. If you can learn to see those natural connections, you won't need to apply more difficult strategies.

In my cognate dictionary, I have collected 40,000 words from 32 Indo-European languages into 430 cognate clusters, in order to help you increase the number of words you can learn through natural connections.

Indo-European Cognate Dictionary