"Students at HSE have a good sense of linguistic diversity"

Yale postdoc Kevin Tang recently gave a talk at HSE on his research in experimental phonology. We talked to Kevin about his conversion from an engineer to a linguist and asked him how he liked the feedback he received from HSE students.

I know that before becoming a linguist you studied engineering at Cambridge and actually got a degree in engineering. How did you switch to studying language after that?

During the engineering course I took lessons in Greek dancing. And the tutor who taught the Greek dancing was a PhD in phonetics. This is how I came across phonetics…

So it was pure chance.

Yes, pure chance. I googled his name, I ‘stalked’ him a bit, read some of his papers. And I thought ‘oh, this thing is surprisingly scientific’! Coming as an engineer, I had never even heard of phonetics. It is not a common subject in the UK. People don’t know what linguistics is; if you asked, they’d say it is about learning languages.

Same in Russia, actually. So, what happened next?

Reading these papers sparked my initial interest. During my degree, just as a hobby, I’d go to the library, take out books on linguistics and study them myself. And by the end of my engineering degree I thought that I actually preferred linguistics to engineering. I still did a master’s in engineering. And then I was advised to go to either Edinburgh or University College London to do a one-year ‘conversion master’s course’, which allows you to start from the basics. Only a few universities offer it. That was about 7 years ago.

So where did you go?

I decided to go to UCL and did the master’s there. It was to see whether I had enough interest to pursue linguistics as a career. You never know until you do it. Then I met my professor, who would become my linguistics PhD mentor. He told me that he could try to get me on board with the PhD program, and if I were good enough, the government would fund me to do the program, and this would indicate that I could pursue this as a career. So the initial spark was Greek dancing, then self-study of phonetics, and then UCL.

So you stayed at UCL to get your PhD?

Yes. I got my PhD at UCL. And at the moment I am doing a postdoc at Yale.

Studying Mayan languages?

I wanted to do something on languages that are not so commonly studied. If you are studying Spanish, well, everybody else is studying Spanish! Studying rare languages is good for your career and for your own language development.

I was just going to ask about that. In your lecture you said that we should actually strive for typological diversity in our studies. Could you elaborate a bit? Why is it important not to study just one particular language family?

Obviously typology is the strength of Vyshka, of the school of linguistics here. I guess the advantage of it is that you won’t overfit your linguistic model. Let’s say you’re doing the entire set of theoretical development on a handful of languages… Which is actually what’s happening. In most of the more advanced experimental areas in Western Europe they work on 6 languages maximum. Among them are English, Spanish, French, and the most ‘exotic’ one is Japanese. Because they have the money to do the research. What happens next is you build some theoretical model and you think your model can fit all languages. But you are only fitting several languages.

Most of them being descendants of Latin.

Exactly. So it’s a typical overfitting issue, just like in machine learning. If you overfit your model, it makes bad predictions when it encounters new data that it hasn’t seen before. So by tackling typologically different languages (and I mean very, very different), and especially understudied languages, you hopefully avoid that. And your model has more explanatory power.

And as we know, understudied languages tend to die pretty quickly. So the more of them die out, the fewer data points we have to develop our models. And our model becomes more likely to overfit with each language that dies out.

Coming back to your talk, how did you like the feedback from our students? There was a pretty long discussion afterwards, as far as I remember.

Oh, yes, I was quite impressed by the questions. Students seem to have a good sense of linguistic diversity. Probably because of their study program and fieldwork. They provided examples from lots of rare, understudied languages. Like ‘Oh, in this language there is this special pattern…’. I can’t even remember half of the names of the languages they mentioned. That was quite unusual, because I taught at UCL before and then at Yale, and students there do not tend to use languages beyond their immediate reach as linguistic examples and counterexamples. I think it is very healthy when they try to fit whatever linguistic theory or model you throw at them into language X that they know.

Some of the questions were related more broadly to the language processing side. For instance, there were questions that raised the issue of the difference in processing function words and actual content words. The students seem to have been exposed to the processing account of linguistics, as opposed to the theoretical account. I thought the focus was mainly computational or theoretical, but apparently they have experimental stuff as well.