Russian Dialect Corpus: Ustja River Basin | The Center for East European and Russian/Eurasian Studies

Nina Dobrushina, Michael Daniel and Ruprecht von Waldenfels

HSE

The Ustya River Basin Corpus contains over 350,000 tokens of informant speech, and new data are added continuously. It is based on interviews recorded during joint Russian-Swiss field trips in 2013 and 2014 with inhabitants of all age groups in the village of Mixalevskaya in the Ustya region of Arxangelskaya Oblast and in the neighboring villages of Bestuževo, Plosskoe, Akičkin Počinok and Glubokoe.

The corpus interface allows users to search for word forms, lexemes, and grammatical categories, and it provides speaker metadata such as age, place of birth and education. After registration, users also have access to full texts.

Rather than being phonetically transcribed, the corpus is annotated with a representation in adapted standard Russian. This makes it possible to search the corpus, listen to the parts of the texts that are of interest, and, if necessary, produce a more detailed transcription of them.
