Anatomy of Estonian declension, part I
2023-01-28 18:48:00 +0000 | Reading time: 17 mins
What is “declension”? As per Wikipedia:
In linguistics, declension (verb: to decline) is the changing of the form of a word, generally to express its syntactic function in the sentence, by way of some inflection. Declensions may apply to nouns, pronouns, adjectives, adverbs, and articles to indicate number (e.g. singular, dual, plural), case (e.g. nominative case, accusative case, genitive case, dative case), gender (e.g. masculine, neuter, feminine), and a number of other grammatical categories.
Estonian is one of the languages where declension is quite important: maybe not as much as in the Slavic languages, but far more than in English or Dutch. Estonian nominals (primarily nouns and adjectives) have 14 cases (15 by some definitions), but most of those are simple agglutinations, i.e. they are formed by appending a fixed suffix to some base form. However, some cases are not so simple, and getting used to them takes quite some time. In this blog post, we’ll try to identify some of the ways Estonian declension works.
Let’s go! I have a database of Estonian lexemes (described in another blog post), and we’ll be using it for our research. I will not be using Wikidata this time, although it should be entirely possible to reproduce the results with Wikidata Lexemes as well.
Let’s begin by formally identifying the “real” cases, i.e. the ones where forms are NOT (or not always) produced by simply appending a suffix to the base form. In order to do that, we’ll find the lexemes (words) where base form + case suffix is different from the actually observed form.
Let’s begin with the usual suspects: the “nina taga” group of cases (i.e. forms ending in -ni, -na, -ta and -ga):
```sql
-- Base forms: form type 2 is taken to be the singular genitive
-- ('-' is presumably a placeholder for a missing form, so it is skipped).
with base as (
    select r.representation, ef.paradigm_id as id
    from ekilex_forms ef
    join representations r on ef.word_representation_id = r.id
    where ef.form_type_combination_id = 2 and r.representation <> '-'
),
-- Observed forms: form type 14 is taken to be the singular comitative (-ga).
declined as (
    select r.representation, ef.paradigm_id as id
    from ekilex_forms ef
    join representations r on ef.word_representation_id = r.id
    where ef.form_type_combination_id = 14 and r.representation <> '-'
)
-- Keep only the pairs where the observed form differs from "base form + ga".
select base.id, base.representation, declined.representation as inflected,
       concat(base.representation, 'ga') as suffixed
from base
join declined on base.id = declined.id
where declined.representation <> concat(base.representation, 'ga');
```
This produces ~70 results, which is basically nothing. In all cases, these can be explained by multiple observed base forms (the base form in Estonian is omastav, i.e. the genitive case). So, obviously, where there is more than one omastav, we’ll see “discrepancies”.
If we throw away all the words with more than one singular genitive, we’ll get just these 4 results:
Word | Base form | Translation | Actual -ga form | Formula-based form |
---|---|---|---|---|
keegi | kellegi | someone | kellegagi | kellegiga |
kumbki | kummagi | either (of two) | kummagagi | kummagiga |
miski | millegi | something / nothing | millegagi | millegiga |
ükski | ühegi | no one / anyone | ühegagi | ühegiga |
And here we see a very interesting phenomenon, a remnant of a system that is still very much present in Finnish: relative positioning of suffixes. -gi/-ki is the so-called emphatic suffix, i.e. you add it to the end of a word when you want to put special emphasis on it, to highlight this word in a phrase. And, as we can see, it has to be placed in the last position, after the case suffix -ga. By the way, words with -gi/-ki normally never make it into the dictionary because, being emphatic, the suffix can easily be applied to any form of any word. But these particular words, initially just emphatic forms of other words, ended up being very important and deserving of their own place in the vocabulary.
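The suffix ordering seen in the table can be sketched in a few lines of Python. This is only an illustration, not the method used to build the table: the set `CLITIC_GENITIVES` and the function `comitative` are hypothetical names, and the set lists just the four genitives from the table above, since an ordinary genitive may happen to end in -gi without containing the emphatic suffix.

```python
# Genitives from the table above that end in the emphatic suffix -gi.
# The list is deliberately explicit: a word whose genitive merely happens
# to end in "gi" must not be split.
CLITIC_GENITIVES = {"kellegi", "kummagi", "millegi", "ühegi"}


def comitative(genitive):
    """Attach the comitative suffix -ga, keeping the emphatic suffix last."""
    if genitive in CLITIC_GENITIVES:
        stem, clitic = genitive[:-2], genitive[-2:]
        return stem + "ga" + clitic
    # Regular case: plain agglutination.
    return genitive + "ga"


for genitive in ("kellegi", "kodu"):
    print(genitive, "->", comitative(genitive))
```

For the four pronouns this reproduces the "Actual -ga form" column, while any other genitive simply gets -ga appended.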
Now that we know of these exceptions, it’s safe to exclude them as well and to drop the -ga case (which is called comitative, and means “with, or including, this word”) from consideration as a real inflectional case.
Let’s now look at the -ta case (abessive, which means “without this word”). Same thing! In all of the cases (except those we’ve already excluded), the observed form (as provided by our reference dictionary) matches the formula-based one. Same results for the -na (essive) and -ni (terminative) cases.
We get some interesting results with the translative -ks case:
Word | Base form | Translation | Actual case form | Formula-based form |
---|---|---|---|---|
see | selle | this | seks | selleks |
seesama | sellesama | this same | sekssamaks | sellesamaks |
seesama | sellesama | this same | sellekssamaks | sellesamaks |
These observed discrepancies are in fact again consequences of duplication, but in this case not of the base form but of the “cased” form. The word see has 2 forms for the singular translative, seks/selleks, so we get 1 result where the dictionary form is different from the formula of “base form + -ks”. It’s even more “interesting” with seesama, which according to the dictionary has 2 translative case forms, sellekssamaks and sekssamaks, neither of which matches the formula. So this word can be considered the one and only true deviation from the case formula for the translative case in the whole of the Estonian language (in the singular only; the plural is an entirely different story).
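To make the difference between "a duplicate form shows up as a discrepancy" and "no observed form follows the formula" concrete, here is a minimal Python sketch. It mimics the pairwise comparison of the SQL join on plain lists; the function names `naive_discrepancies` and `follows_formula` are made up for this illustration and are not part of the original analysis.

```python
from itertools import product


def naive_discrepancies(bases, observed, suffix):
    """Mimic the SQL join: every (base, observed) pair that does not match."""
    return [(b, o) for b, o in product(bases, observed) if o != b + suffix]


def follows_formula(bases, observed, suffix):
    """True if at least one observed form equals some base form + suffix."""
    expected = {b + suffix for b in bases}
    return any(o in expected for o in observed)


# "see": one of the two translative forms is the regular one.
print(naive_discrepancies(["selle"], ["seks", "selleks"], "ks"))
print(follows_formula(["selle"], ["seks", "selleks"], "ks"))

# "seesama": neither observed form is "sellesama" + "ks".
print(follows_formula(["sellesama"], ["sekssamaks", "sellekssamaks"], "ks"))
```

With the forms from the table, see yields one pairwise mismatch but still has a formula-based form, whereas seesama has none.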
Now, let’s look at the so-called locative cases, which are used (for the most part, but not only) to express the position (placement) of an object or the direction of an action. Let’s begin with the ablative case, which has the -lt case suffix.
Word | Base form | Translation | Actual case form | Formula-based form |
---|---|---|---|---|
see | selle | this | selt | sellelt |
seesama | sellesama | this same | selleltsamalt | sellesamalt |
seesama | sellesama | this same | seltsamalt | sellesamalt |
too | tolle | that | tolt | tollelt |
Again, the usual suspects, which we should probably exclude from further analysis. These words and these forms apparently occur so often in everyday speech that they have developed multiple realizations, selt/sellelt and, by analogy, tolt/tollelt, all of which have gained enough traction to be included in the dictionary.
Next one: the adessive case, with an -l suffix. Having excluded the words from the tables above, we see these new forms popping up in the list of naughty forms:
Word | Base form | Translation | Actual case form | Formula-based form |
---|---|---|---|---|
kes | kelle | who | kel | kellel |
mis | mille | what | mil | millel |
Same story as above, but there’s also a lesson lurking here: we observe that a form in -llel or -llelt is very frequently reduced to just -l or -lt: millel/mil, sellelt/selt, kellel/kel, tollelt/tolt, and so on. This is probably our first hint at the existence of two parallel paradigms (spoiler: a longer one and a shorter one) for many pronouns, about which we will probably discover more going forward.
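The long/short pattern in these pairs can be written down as a tiny Python sketch. It assumes, based only on the four pairs in the tables above, that the short form is obtained by dropping the final -lle of the genitive before the case suffix; the function name `long_and_short` is hypothetical, and nothing here claims that every pronoun or every case has both variants.

```python
def long_and_short(genitive, suffix):
    """Return (long form, short form) for a pronoun genitive ending in -lle.

    Long form:  genitive + suffix             (selle + lt -> sellelt)
    Short form: genitive minus "lle" + suffix (se    + lt -> selt)
    """
    if not genitive.endswith("lle"):
        raise ValueError("this sketch only covers genitives ending in -lle")
    return genitive + suffix, genitive[:-3] + suffix


# The four pairs from the ablative (-lt) and adessive (-l) tables.
for genitive, suffix in (("selle", "lt"), ("tolle", "lt"), ("kelle", "l"), ("mille", "l")):
    print(genitive, suffix, long_and_short(genitive, suffix))
```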
Next one: the allative case, with the -le ending.
Word | Base form | Translation | Actual case form | Formula-based form |
---|---|---|---|---|
ma | mu | I | mulle | mule |
sa | su | you (informal) | sulle | sule |
ta | ta | he/she | talle | tale |
These are, again, three instances of the shortened paradigm for personal pronouns, and we can observe that -le in these words is applied as -lle. Thus far I have no idea why. I guess it just sounds more natural this way to a native speaker!
With this, we have covered so-called “exterior” group of locative cases (the ones that might be translated using “on” or “onto”).
Next, we’ll cover “interior” ones (that might be translated with “in” or “into”).
First, the elative case, with the -st formula. And here we observe one exception that occurs in many words, but all of these words are just compounds with kodu as the base component:
Word | Base form | Translation | Actual case form | Formula-based form |
---|---|---|---|---|
isakodu | isakodu | father’s home | isakodunt | isakodust |
kodu | kodu | home | kodunt | kodust |
koolkodu | koolkodu | school house | koolkodunt | koolkodust |
So, one more exception in our collection! Not sure how it is explained, though.
Next, inessive, with the -s formula. No exceptions! Phew…
And the final locative case: illative, with the -sse formula.
No exception again! Wait a minute, this is… fishy. I know for a fact that in this case there’s tons of words that do not comply with a formula.
Let’s check out some paradigm, for example that very same kodu, on Sõnaveeb.
Aha! Here we see what has happened: Sõnaveeb (a website maintained by Eesti Keele Instituut, i.e. the Institute of the Estonian Language) simply considers all such “exceptions” to be instances of a separate case, the so-called “lühike sisseütlev”, i.e. “short illative”. This is where it diverges from the English Wikipedia on the subject (or, rather, Wikipedia diverges a bit).
Anyway, this “short illative case” does not have a formula. This way, we have established that Estonian language has 4 “real” cases, at least in singular: nominative, genitive, partitive, and “short illative”. All the rest can be, by and large, produced with a simple formula (minus some exceptions).
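The "simple formula" for the remaining singular cases fits in a single lookup table. The Python sketch below is an illustration only: the names `SINGULAR_SUFFIXES` and `decline_singular` are made up here, and the exceptions found above (the pronouns, the -gi/-ki words, the kodu compounds) are deliberately not handled.

```python
# Case suffixes discussed above; each is appended to the singular genitive.
SINGULAR_SUFFIXES = {
    "illative": "sse",
    "inessive": "s",
    "elative": "st",
    "allative": "le",
    "adessive": "l",
    "ablative": "lt",
    "translative": "ks",
    "terminative": "ni",
    "essive": "na",
    "abessive": "ta",
    "comitative": "ga",
}


def decline_singular(genitive):
    """Build the formula-based singular forms from the singular genitive."""
    return {case: genitive + suffix for case, suffix in SINGULAR_SUFFIXES.items()}


# The genitive of kodu is kodu, as in the table above.
for case, form in decline_singular("kodu").items():
    print(f"{case:12} {form}")
```

Nominative, genitive, partitive and short illative are absent from the table on purpose: those are the forms that cannot be derived this way.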
If we look at the plurals, it is for the most part the same situation:
Case | Suffix | Number of exceptions |
---|---|---|
Comitative | -ga | 0 |
Abessive | -ta | 95 |
Essive | -na | 0 |
Terminative | -ni | 148 |
Translative | -ks | 74 556 |
Wow, there clearly is something going on there! Let’s have a look at those massive discrepancies for a plural translative case:
Word | Base form | Translation | Actual case form | Formula-based form |
---|---|---|---|---|
aabits | aabitsate | alphabet | aabitsaiks | aabitsateks |
äädikas | äädikate | vinegar | äädikaiks | äädikateks |
aadel | aadlite | nobility | aadleiks | aadliteks |
and so on…
So, what we’re dealing with here is the so-called ghost form, aabitsai / äädikai / aadlei, which is not given in dictionaries, but is used as a base form by some words for forming some of the plural cases.
Now, how prevalent is it? Well, in my database there are 85,261 nominal paradigms, and 74,556 of those seem to use this plural root shadow form (for the translative at least), which is ~87%.
Let’s see how many discrepancies other cases have:
Case | Suffix | Number of exceptions |
---|---|---|
Comitative | -ga | 0 |
Abessive | -ta | 95 |
Essive | -na | 0 |
Terminative | -ni | 148 |
Translative | -ks | 74 556 |
Ablative | -lt | 74 556 |
Adessive | -l | 74 556 |
Allative | -le | 74 556 |
Elative | -st | 74 556 |
Inessive | -s | 74 556 |
Illative | -sse | 74 556 |
Surprising, even suspicious, uniformity. Let’s have a look at, say, translative case forms and try to extract a plural root
form out of those.
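One possible way to extract such a plural root is to strip the -ks suffix from every plural translative form that does not match "plural genitive + ks". The Python sketch below shows that idea on the three examples above; it is an assumption about how the extraction could be done, and the function name `root_plural` is hypothetical.

```python
def root_plural(plural_genitive, plural_translative):
    """Return the stem left after removing -ks, or None for a regular form."""
    if plural_translative == plural_genitive + "ks":
        # The form follows the formula, so it tells us nothing about a root plural.
        return None
    if not plural_translative.endswith("ks"):
        raise ValueError("not a translative form: " + plural_translative)
    return plural_translative[:-2]


examples = [
    ("aabitsate", "aabitsaiks"),
    ("äädikate", "äädikaiks"),
    ("aadlite", "aadleiks"),
]
for genitive, translative in examples:
    print(translative, "->", root_plural(genitive, translative))
```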
N days later…
Now… once we have identified the mysterious, shadow “root plural” form, let’s build a list of discrepancies while taking also these forms into account.
What do we have for the translative case? 0 discrepancies. Cool!
Let’s have a look at the (plural) ablative case now. 0 discrepancies again.
Ok, let’s compose a table now, with shadow “root-plural” form included as a base for producing suffixed forms:
Case | Suffix | Number of exceptions |
---|---|---|
Translative | -ks | 0 |
Ablative | -lt | 0 |
Adessive | -l | 0 |
Allative | -le | 0 |
Elative | -st | 0 |
Inessive | -s | 0 |
Illative | -sse | 0 |
So, basically, (almost) all the discrepancies we’ve seen above, for the plural forms, can be explained by “root plural” form, which is defined on ~87% of all lexemes.
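The revised check, where a plural form counts as regular if it can be built from either the plural genitive or the root plural, could look roughly like this in Python. The function name `unexplained` is made up for this sketch, and the root plural is passed in as an optional value because it is not defined for every lexeme.

```python
def unexplained(observed, plural_genitive, root_plural, suffix):
    """Return the observed forms that neither base form can produce."""
    expected = {plural_genitive + suffix}
    if root_plural is not None:
        expected.add(root_plural + suffix)
    return [form for form in observed if form not in expected]


# With only the plural genitive as a base, aabitsaiks is a discrepancy...
print(unexplained(["aabitsaiks"], "aabitsate", None, "ks"))
# ...and with the root plural added, it is explained.
print(unexplained(["aabitsaiks"], "aabitsate", "aabitsai", "ks"))
```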
Now let’s get back to the small number of discrepancies for abessive and terminative cases that we have seen above.
For abessive case these are:
nominal | base | case | inflected | suffixed |
---|---|---|---|---|
ebajalg | ebajalge | genitive | ebajaluta | ebajalgeta |
ebajalg | ebajalgade | genitive | ebajaluta | ebajalgadeta |
eesjalg | eesjalge | genitive | eesjaluta | eesjalgeta |
eesjalg | eesjalgade | genitive | eesjaluta | eesjalgadeta |
esijalg | esijalge | genitive | esijaluta | esijalgeta |
esijalg | esijalgade | genitive | esijaluta | esijalgadeta |
… and so on, but all of these are lexemes that have jalg (leg) as their root. And we can see that *jalu is the shadow “root plural” form for these words. So, the exception to memorize here is that Xjalg also uses the shadow “root plural” form for this case, unlike other words, which only use the plural genitive form for this case.
And we’ve also seen discrepancies for the terminative case, let’s look at those:
nominal | meaning | base | case | inflected | suffixed |
---|---|---|---|---|---|
kõrv | ear | kõrvade | genitive | kõrvuni | kõrvadeni |
põlv | knee | põlvede | genitive | põlvini | põlvedeni |
rind | breast | rindade | genitive | rinnuni | rindadeni |
rind | breast | rinde | genitive | rinnuni | rindeni |
silm | eye | silmade | genitive | silmini | silmadeni |
silm | eye | silme | genitive | silmini | silmeni |
And a lot of other words, all based on one of these roots. So, another exception to remember is that words based on one of these roots also use their shadow “root plural” form to build the terminative case.
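For these four roots, the set of plural terminative candidates can be sketched as follows. The root plural forms in `ROOT_PLURAL_TERMINATIVE` are inferred here by stripping -ni from the "inflected" column of the table, and both that name and `plural_terminative` are hypothetical. Compounds built on these roots would need the same treatment, which this sketch does not attempt.

```python
# Root plural forms inferred from the observed forms in the table above
# (kõrvuni -> kõrvu, põlvini -> põlvi, rinnuni -> rinnu, silmini -> silmi).
ROOT_PLURAL_TERMINATIVE = {
    "kõrv": "kõrvu",
    "põlv": "põlvi",
    "rind": "rinnu",
    "silm": "silmi",
}


def plural_terminative(word, plural_genitives):
    """Formula-based forms, plus the root-plural form for the listed roots."""
    forms = [genitive + "ni" for genitive in plural_genitives]
    root = ROOT_PLURAL_TERMINATIVE.get(word)
    if root is not None:
        forms.append(root + "ni")
    return forms


print(plural_terminative("silm", ["silmade", "silme"]))
print(plural_terminative("rind", ["rindade", "rinde"]))
```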
Let’s also quickly confirm probably the first rule one ever learns about Estonian morphology:
nominative plural = genitive singular + -d
My teacher even used to say:
Don’t memorize the second form (a.k.a the genitive), memorize the plural instead! You’ll get two for the price of one.
Oh wow, there are some exceptions indeed. But all of them are pronouns, which are irregular anyway, and declension of pronouns just has to be remembered.
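The rule and its check are short enough to sketch in Python. The function names `nominative_plural` and `is_regular_plural` are made up for this illustration. The pronoun example uses see, whose genitive is selle and whose nominative plural is need, so the rule does not apply to it.

```python
def nominative_plural(genitive_singular):
    """The textbook rule: nominative plural = genitive singular + -d."""
    return genitive_singular + "d"


def is_regular_plural(genitive_singular, observed_plural):
    """True if the observed nominative plural follows the rule."""
    return observed_plural == nominative_plural(genitive_singular)


print(nominative_plural("kodu"))
print(is_regular_plural("kodu", "kodud"))
# A pronoun: see (genitive selle) has the nominative plural need.
print(is_regular_plural("selle", "need"))
```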
And finally, let’s summarize what we have learned about oh-so-scary Estonian case system:
Declension of nominals (pronouns excluded)
Case (en.) | Case (est.) | Singular | Plural |
---|---|---|---|
Nominative | Nimetav | the word as in dictionary | singular genitive + -d |
Genitive | Omastav | has to be memorized | has to be memorized |
Partitive | Osastav | has to be memorized | has to be memorized |
Short illative | Lühike sisseütlev | has to be memorized | has to be memorized |
Illative | Sisseütlev | singular genitive + -sse | plural genitive + -sse or root plural + -sse |
Inessive | Seesütlev | singular genitive + -s | plural genitive + -s or root plural + -s |
Elative | Seestütlev | singular genitive + -st (except for words ending in kodu, which will have the -nt ending) | plural genitive + -st or root plural + -st |
Allative | Alaleütlev | singular genitive + -le | plural genitive + -le or root plural + -le |
Adessive | Alalütlev | singular genitive + -l | plural genitive + -l or root plural + -l |
Ablative | Alaltütlev | singular genitive + -lt | plural genitive + -lt or root plural + -lt |
Translative | Saav | singular genitive + -ks | plural genitive + -ks or root plural + -ks |
Terminative | Rajav | singular genitive + -ni | plural genitive + -ni (except for words based on silm, põlv, kõrv, rind, which will also have root plural + -ni) |
Essive | Olev | singular genitive + -na | plural genitive + -na |
Abessive | Ilmaütlev | singular genitive + -ta | plural genitive + -ta (except for words based on the root jalg, which will also have root plural + -ta) |
Comitative | Kaasütlev | singular genitive + -ga | plural genitive + -ga |
So, as we can see: out of 30 possible case forms, only 8 have to be memorized, all the rest can be easily produced by simply adding a well known suffix to one of three basic forms. And there’s just 6 roots, that produce certain exceptions.
I hope this post helps de-mystify the subject and bring down your fear of learning Estonian!!
Palju õnne, ja ruttu nägemiseni! (which means “Good luck, and see you soon!”)