The first open text recognition model for Latvian handwriting has been created on the Transkribus platform. Developed to support the research and digitisation of Latvian manuscripts, the model will be useful to anyone working with 19th-century Latvian handwritten materials.
The model was trained using materials from the collection of the Scientific Commission of the Riga Latvian Society, preserved at the Archives of Latvian Folklore (ALF, Institute of Literature, Folklore and Art, University of Latvia). This is the oldest and most extensive collection at ALF, containing unique manuscripts that cover various folklore genres, ethnographic records, linguistic materials, place-name documentation, songs, riddles, word explanations, and other testimonies of traditional culture, dialects, and Latvian cultural history.
The model was trained using artificial intelligence technologies and previously prepared manuscript transcriptions produced by the ALF volunteer community. The character error rate (CER) of the model published in Transkribus is 4.83%; it was trained on 2,671 pages, covering more than 367,000 words and 132,000 lines of text.
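For readers unfamiliar with the metric, CER is commonly defined as the number of character insertions, deletions, and substitutions needed to turn the recognised text into the reference transcription, divided by the length of the reference. The sketch below illustrates that common definition only; the function names are hypothetical, and Transkribus may apply its own text normalisation before computing the figure reported above.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Illustrative strings only, not taken from the training data.
    reference = "tautasdziesma"
    hypothesis = "tautasdziesna"
    print(f"CER: {character_error_rate(reference, hypothesis):.2%}")
```

Under this definition, a CER of 4.83% corresponds to roughly one character error in every twenty characters of reference text.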
The model was developed as part of the University of Latvia-funded project ȬPEN: Open Knowledge Ecosystems for the Advancement of Citizen Science (ZDA-LIP 2025/2). Its development brings together the University of Latvia Digital Humanities Centre, the University of Latvia Library, Transkribus, and the Archives of Latvian Folklore of the Institute of Literature, Folklore and Art, University of Latvia.
“This model is an important step towards expanding access to Latvia’s handwritten heritage. It not only accelerates manuscript transcription, but also opens up new opportunities for research, the creation of digital collections, and public participation in exploring cultural heritage. It is especially important that the model is relatively open. All registered Transkribus users can use it in their own projects and continue improving it,” says project leader and head of the University of Latvia Digital Humanities Centre Sanita Reinsone.
Work on the project is continuing with the development of a text recognition model for 20th-century Latvian handwriting, which will further expand the possibilities for automated recognition of Latvian handwritten materials. In addition, a new tool is currently being developed to make it easy for anyone to try handwritten text transcription and image description using the best-known large language models.
The model is available on the Transkribus platform under the name “Latvian 19th century”: