The first open text recognition model for Latvian handwritten manuscripts has been created on the Transkribus platform to support their research and digitisation. It will be useful to anyone working with 19th‑century Latvian handwritten manuscripts.
The model was trained on materials from the collection of the Knowledge Commission of the Riga Latvian Society, held in the Latvian Folklore Repository (LFK) of the Institute of Literature, Folklore and Art (LFMI) of the University of Latvia (UL). This is the oldest and most extensive collection in the LFK, containing unique manuscripts that cover various folklore genres, ethnographic information, language materials, records of place names, explanations of words, and other evidence of traditional culture, dialects, and the cultural history of Latvia.
Training relied on artificial intelligence technologies and on manuscript transcriptions previously prepared by the LFK's community of volunteer contributors. The model published on Transkribus has a character error rate (CER) of 4.83%. It was trained on 2,671 pages of text, comprising more than 367,000 words and 132,000 lines.
The model was developed within the UL‑funded project ȬPEN, "Open Knowledge Ecosystems for the Development of Citizen Science" (ZDA‑LIP 2025/2). The work is a collaboration among the Digital Humanities Centre of the Faculty of Humanities (HZF) of the University of Latvia, the University of Latvia Library, Transkribus, and the Latvian Folklore Repository of the UL Institute of Literature, Folklore and Art (LFMI).
“This model is an important step in expanding the accessibility of Latvian handwritten heritage. It not only accelerates the transcription of manuscripts but also opens up new opportunities for research, the creation of digital collections, and public engagement in exploring cultural heritage. It is particularly important that the model is relatively open: any registered Transkribus user can use it in their projects and continue to improve it,” emphasises the project leader and Head of the Digital Humanities Centre at the UL Faculty of Humanities, Associate Professor Sanita Reinsone.
Work on the project continues with the development of a text recognition model for 20th‑century Latvian handwritten manuscripts, which will further expand the possibilities for automated recognition of Latvian handwriting.
The model is available on the Transkribus platform under the name “Latvian 19th century”.