I know that before becoming a linguist you studied engineering in Cambridge and actually got a degree in engineering. How did you switch to studying language after that?
During the engineering course I took lessons in Greek dancing. And the tutor who taught the Greek dancing was a PhD in phonetics. This is how I came across phonetics…
So it was pure chance.
Yes, pure chance. I googled his name, I ‘stalked’ him a bit, read some of his papers. And I thought ‘oh, this thing is surprisingly scientific’! Coming from engineering, I had never even heard of phonetics. It is not a common subject in the UK. People don’t know what linguistics is; if you asked, they’d say it is about learning languages.
Same in Russia, actually. So, what happened next?
Reading these papers sparked my initial interest. During my degree, just as a hobby, I’d go to the library, take out books on linguistics and study them myself. And by the end of my engineering degree I thought that I actually preferred linguistics to engineering. I still did a master’s in engineering. And then I was advised to go to either Edinburgh or University College London to do a one-year ‘conversion master’s course’, which allows you to start from the basics. They offer it at only a few universities. That was about 7 years ago.
So where did you go?
I decided to go to UCL and did the master’s there. It was to see whether I had enough interest to pursue linguistics as a career. You never know until you do it. Then I met my professor, who would become my linguistics PhD mentor. He told me that he could try to get me on board with the PhD program, and if I were good enough, the government would fund me to do the program, and this would indicate that I could pursue this as a career. So the initial spark was Greek dancing, then self-study of phonetics, and then UCL.
So you stayed at UCL to get your PhD?
Yes. I got my PhD at UCL. And at the moment I am doing a postdoc at Yale.
Studying Mayan languages?
I wanted to do something on languages that are not so commonly studied. If you are studying Spanish, well, everybody else is studying Spanish! Studying rare languages is good for your career and for your own language development.
I was just going to ask about that. In your lecture you said that we should actually strive for typological diversity in our studies. Could you elaborate a bit? Why is it important not to study just one particular language family?
Obviously typology is the strength of Vyshka, of the School of Linguistics here. I guess the advantage of it is that you won’t overfit your linguistic model. Let’s say you’re doing the entire set of theoretical development on a handful of languages… Which is actually what’s happening. In most of the more advanced experimental areas in Western Europe they work on 6 languages maximum. Among them are English, Spanish, French, and the most ‘exotic’ one is Japanese. Because they have the money to do the research. What happens next is you build some theoretical model and you think your model can fit all languages. But you are only fitting several languages.
Most of them being descendants of Latin.
Exactly. So it’s a typical overfitting issue, just like in machine learning. If you overfit your model, it makes bad predictions when it encounters new data that it hasn’t seen before. So by tackling typologically different languages (and I mean very, very different), and especially understudied languages, you hopefully avoid that. And your model has more explanatory power.
And as we know, understudied languages tend to die pretty quickly. So the more they die out, the fewer data points we have to develop our models. So our model is more likely to overfit, as each language dies out.
Coming back to your talk, how did you like the feedback from our students? There was a pretty long discussion afterwards, as far as I remember.
Oh, yes, I was quite impressed by the questions. Students seem to have a good sense of linguistic diversity, probably because of their study program and fieldwork. They provided examples from lots of rare, understudied languages. Like ‘Oh, in this language there is this special pattern…’. I can’t even remember half of the names of the languages they mentioned. That was quite unusual, because I taught at UCL before and then at Yale, and students there do not tend to use languages beyond their immediate reach as linguistic examples and counterexamples. I think it is very healthy when they try to fit whatever linguistic theory or model you throw at them to language X that they know.
Some of the questions were related more broadly to the language processing side. For instance, there were questions that raised the issue of the difference in processing function words and actual content words. The students seem to have been exposed to the processing account of linguistics, as opposed to the theoretical account. I thought the focus was mainly computational or theoretical, but apparently they have experimental stuff as well.