How Do Latvian Words Emerge? LU Researchers Create a Unique Database

Andra Kalnača

July 3, 2026

research

Researchers from the University of Latvia’s Faculty of Humanities have developed a unique new digital resource, the “Database of Latvian Morphemes and Word‑Formation Models (LVMVMD)”, which systematically compiles data on the structure and formation of Latvian words. The database is built on the analysis of more than 75,000 lemmas extracted from the Balanced Corpus of Modern Latvian (LVK2018), a comprehensive digital collection of contemporary Latvian texts.

The new resource is useful well beyond linguistics: it can help analyse the development of the language, build corpora and dictionaries, improve machine translation, and develop artificial‑intelligence tools tailored to Latvian. It makes it possible to study how Latvian words are formed, how they are interconnected, and how the language changes over time.

What are morphemes?

Every Latvian word we use in everyday life consists of smaller units called morphemes (the root, prefix, suffix and ending), each of which carries its own meaning. Metaphorically, they resemble the bones that make up the skeleton of a living being.

The most important morpheme is the root, because it carries the core meaning of the word.

Other morphemes may be combined with the root in various ways, and a word may also have more than one root. For example, the word “saule” consists of two morphemes, the root saul- and the ending -e, whereas in the word “saulīte” the suffix -īt- is added after the root, so the word contains three morphemes: saul-īt-e.
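The segmentation described above can be pictured as a small data structure. The following Python sketch is only an illustration: the `Morpheme` class and the `segment` and `surface` helpers are hypothetical names invented here, and nothing in it reflects the actual format of the LVMVMD database.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Morpheme:
    form: str  # the morpheme as written, without hyphens
    kind: str  # one of "root", "prefix", "suffix", "ending"


def segment(hyphenated: str, kinds: List[str]) -> List[Morpheme]:
    """Pair each hyphen-separated piece with its morpheme type."""
    forms = hyphenated.split("-")
    if len(forms) != len(kinds):
        raise ValueError("need exactly one kind per morpheme")
    return [Morpheme(form, kind) for form, kind in zip(forms, kinds)]


def surface(morphemes: List[Morpheme]) -> str:
    """Join the morphemes back into the written word."""
    return "".join(m.form for m in morphemes)


# The two examples from the text, segmented by hand.
saule = segment("saul-e", ["root", "ending"])
saulite = segment("saul-īt-e", ["root", "suffix", "ending"])

print(surface(saule), len(saule))      # saule 2
print(surface(saulite), len(saulite))  # saulīte 3
```

Both words share the root saul-, which is what allows them to be recognised as related.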

When learning a language and its structure, we also acquire all morphemes, their combinations and the meanings they contain. Without this skill we would not be able to communicate, because it allows us to form the necessary expressions in each situation – to combine morphemes into words, arrange words into sentences, and create texts according to the purpose of communication.

By exploring morphemes, one can better understand not only the principles by which words are formed, but also how human thinking and language as a whole operate, and what associations, metaphors and metonymies underlie our linguistic and extralinguistic perception. For example, many Latvian speakers may be surprised that some berry and mushroom names with the suffix -en- are derived from animal and bird names, such as avene–avs, kazene–kaza, lācene–lācis, as well as cūcene–cūka, gailene–gailis.

The database reveals word relatedness

Each word in the newly created database is divided into morphemes and classified according to word‑formation models, making it possible to determine which word is primary, which is derived or compound, as well as the methods by which words are formed in Latvian.

The database also distinguishes homonyms (words that are pronounced and written the same but have different meanings, for example, dumpis meaning “waterfowl” and dumpis meaning “uprising”) and homographs (words that are written the same but pronounced differently and have different meanings, for example, zāle meaning “grass”, “herbs” and zāle meaning “large hall”), because they have different word‑formation models in the language.

Borrowed words are also specially marked in the database, because their division into morphemes and their word‑formation models usually differ from those of inherited Latvian words. Moreover, borrowed words often cannot be divided further, because the components of the source language differ from the elements of Latvian. For example, in borrowed words such as “kupols”, “ingvers” and “panelis”, only the ending -s or -is can be identified from the perspective of Latvian, but not a root, prefix or suffix.

In the database, words are grouped into nests by common root, making it easy to trace their origin, follow the formation of derivatives and compounds, and notice recurring word‑formation models and their meanings. When a nest is based on a primary verb, about one hundred derivatives and compounds often cluster around it.

For example, the historically related inherited roots ved-, ves-, ve-, vez-, vež-, vad-, vaz- and važ-, all variants of the same original root, appear in Latvian words such as vest, vedējs, pavediens, vedekla, vezums, vadīt, vadība, vadītājs, vads, novads, vazāt, važa, barvedis, tiesvedība, apvedceļš, asinsvads, vadlīnija, etc.
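Grouping words into nests by root can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the database's own method: the `build_nests` function, the nest label "ved" and the root assigned to each word are all chosen by hand for this example. Only the list of root variants and the words themselves come from the text.

```python
from collections import defaultdict

# Root variants listed in the article, all mapped to one nest label.
# The label "ved" is an arbitrary choice made for this sketch.
VARIANTS = ["ved", "ves", "ve", "vez", "vež", "vad", "vaz", "važ"]
NEST_OF_ROOT = {variant: "ved" for variant in VARIANTS}

# (word, root) pairs; the root labels are hand-assigned assumptions.
ENTRIES = [
    ("vest", "ves"), ("vedējs", "ved"), ("vezums", "vez"),
    ("vadīt", "vad"), ("vadība", "vad"), ("vazāt", "vaz"), ("važa", "važ"),
    ("saule", "saul"), ("saulīte", "saul"),
]


def build_nests(entries):
    """Group words under a shared nest label derived from their root."""
    nests = defaultdict(list)
    for word, root in entries:
        # Roots without a known variant mapping form their own nest.
        nests[NEST_OF_ROOT.get(root, root)].append(word)
    return dict(nests)


nests = build_nests(ENTRIES)
print(nests["ved"])
print(nests["saul"])
```

Mapping every variant to one label is what lets words as different on the surface as vest, vezums and važa end up in the same nest.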

Useful not only for linguists

At present, anyone interested can access the new database on GitHub, where it is also possible to learn about the principles behind its creation, the Latvian language material it includes, and how that material is classified. During 2026, a user manual for the database will be developed in Latvian and English. The resource has been created in accordance with current international standards for digital language resources.

The database will be useful not only for linguists, but also for computational linguists, translators, information‑technology specialists, developers of artificial‑intelligence tools, compilers of corpora, databases and dictionaries, and Latvian language teachers and learners.

The new resource provides an important foundation for further data‑driven research on Latvian grammar, word formation and related topics, as well as for the development of language‑learning materials and usage manuals, since digital language resources in this field are currently lacking.

Without comprehensive research into the word‑formation system, it is not possible to fully understand other subsystems of the language – grammar, vocabulary, pragmatics, semantics and their use.

The article was prepared within the project “Database of Latvian Morphemes and Word‑Formation Models (LVMVMD)” (No. lzp‑2022/1‑0013) of the Fundamental and Applied Research Programme of the Latvian Council of Science. More information is available at https://www.dlmdm.lu.lv/

