CLARIN-LV: key developments in 2025

Author
University of Latvia

January 29, 2026

The year 2025 marked an important milestone in the activities of CLARIN Latvia (CLARIN-LV), as it continued to expand and enhance its repository of language resources and tools. Throughout the year, CLARIN-LV actively introduced the CLARIN research infrastructure to students, academic staff, and researchers, highlighting its value for research and innovation. CLARIN Latvia also strengthened national and international collaboration and fostered knowledge exchange within the Latvian research community and the CLARIN ERIC consortium.

To promote access to high-quality data for researchers in the humanities and social sciences, the CLARIN-LV repository was enriched with new digital language resources, including speech corpora, lexical databases, and dictionaries. The most viewed language resources in the repository were Tēzaurs.lv (more than 1000 views per month), the Balanced Corpus of Modern Latvian (around 250 views per month), and the LATE Dev&Test Set for ASR (around 220 views per month). Significant contributions to the repository’s content were made by the DHELI and Language Technology Initiative projects. Although most language resources are open access, more than 120 users have registered in the CLARIN-LV repository, not only from Latvia but also from the Netherlands, Iceland, Poland, Sweden, and other countries.

In cooperation with other members of the CLARIN ERIC consortium, the CLARIN Flagship Project PressMint was launched to compile a multilingual, comparable, annotated, translated and interoperable set of corpora of European historical newspapers from around the start of the 20th century. Two CLARIN-LV consortium members, the National Library of Latvia and the Institute of Mathematics and Computer Science of the University of Latvia, participate in this project. CLARIN-LV also became a member of the CLARIN Knowledge Centre on Large Language Models for the Humanities and Social Sciences (LLMs4SSH), established in 2025.

The CLARIN infrastructure and its language resources were introduced to computer science students in the course “Fundamentals of Language Technologies” and to linguistics students in the course “Introduction to Computational Linguistics.” In December, CLARIN-LV organized a practical workshop for university teachers on the Digital Humanities course registry, where participants learned how to register courses.

CLARIN-LV consortium members actively participated in several events organized by CLARIN ERIC, including the CLARIN Annual Conference in Vienna, where they presented Latvian research and tools, and the CLARIN Café “SSH Research with CLARIN K-Centres: Expertise for Multimodal Data, Large Language Models and Discourse Analysis.”

CLARIN is a distributed digital research infrastructure for language resources and tools, with participating centres all over Europe. Latvia joined CLARIN in 2016, and since 2018 CLARIN-LV has been maintained through the State Research Programmes (e.g. DHELI, LATE), projects of the Latvian Council of Science, and other financial instruments. Since the autumn of 2025, CLARIN-LV has also been supported by the ERDF project “University of Latvia and Institutes in the European Research Area – Excellence in Research and Cooperation.”