This paper surveys relative clause constructions in West Circassian (Adyghe) and Kabardian.
Head/dependent marking is a typological parameter based on whether syntactic relations, or dependencies, are marked on the head of the relation, on the non-head, on both, on neither, or elsewhere in the constituent. It has figured in descriptive and theoretical work of various kinds for some thirty years and has proven quite useful as far as it goes, but advances in the analysis of phrase structure and descriptions of previously unnoticed patterns during that time have revealed some gaps, imprecisions, and inconsistencies in the original formulation. These can be removed by allowing markers to be assigned not to words but to entire phrases, a move that also allows detached and neutral marking to be more comfortably accommodated in locus theory.
This article presents a survey of the morphology of highly polysynthetic Northwest Caucasian languages.
Northwest Caucasian languages display a high degree of polysynthesis (manifested in complex words which bear much information on arguments and the characteristics of a situation), both prefixes and suffixes, with some morphemes capable of appearing as either, ergative-based cross-reference of core arguments and of indirect objects introduced by applicatives, highly developed means of expressing locational semantics within the predicate, and intricate tense-modality-aspect systems. Although classical noun-to-verb incorporation does not occur, there are constructions akin to incorporation, especially in the nominal domain. Nouns constitute a subclass of a broad class of predicates (both morphologically and syntactically) and form word-like nominal complexes with their attributes. Morphemes demonstrate features which are not typical of morphemes in Standard Average European languages, including considerable autonomy reflected in affix order variation and the ability to attach to complex syntactic constituents.
The volume is devoted to the typology of the category of number in the world's languages.
This paper describes the range of patterns used for the expression of ‘other’ in East Caucasian (Nakh-Daghestanian) languages, an indigenous language family of the Eastern Caucasus mainly spoken in the Republics of Daghestan, Chechnya and Ingushetia (Russian Federation), as well as in northern regions of Azerbaijan and eastern parts of Georgia.
Vossian Antonomasia is a prolific stylistic device, in use since antiquity. It can compress the introduction or description of a person or another named entity into a terse, poignant formulation and can best be explained by an example: when Norwegian world champion Magnus Carlsen is described as "the Mozart of chess", it is Vossian Antonomasia we are dealing with. The pattern is simple: a source (Mozart) is used to describe a target (Magnus Carlsen), and the transfer of meaning is achieved via a modifier ("of chess"). This phenomenon has been discussed before (as 'metaphorical antonomasia' or, with special focus on the source object, as 'paragons'), but no corpus-based approach has yet been undertaken to explore its breadth and variety. We examine a full-text newspaper corpus (The New York Times, 1987–2007) and describe a new method for the automatic extraction of Vossian Antonomasia based on Wikidata entities. Our analysis offers new insights into the occurrence of popular paragons and their distribution.
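The source-modifier pattern described above can be illustrated with a minimal Python sketch. This is not the extraction method of the paper: the regular expression, the function name `find_va_candidates`, and the small `known_sources` set (a hypothetical stand-in for a list of person names drawn from Wikidata) are all assumptions made for illustration.

```python
import re

# Hypothetical surface pattern: "the <Capitalized source> of <modifier>".
# The source may span several capitalized words; the modifier is one word.
VA_PATTERN = re.compile(
    r"\b[Tt]he\s+"
    r"(?P<source>[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*)"
    r"\s+of\s+"
    r"(?P<modifier>[\w-]+)"
)


def find_va_candidates(text, known_sources):
    """Return (source, modifier) pairs whose source is a known entity.

    `known_sources` is an assumed stand-in for an entity list; filtering
    against it discards look-alikes such as "the University of Chicago".
    """
    candidates = []
    for match in VA_PATTERN.finditer(text):
        source = match.group("source")
        if source in known_sources:
            candidates.append((source, match.group("modifier")))
    return candidates


if __name__ == "__main__":
    sample = (
        'Carlsen has been called "the Mozart of chess", '
        "not the University of Chicago."
    )
    print(find_va_candidates(sample, {"Mozart"}))
```

A surface pattern like this would presumably over-generate on real newspaper text, which is why the sketch filters candidates against an entity list rather than trusting the pattern alone.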
This chapter presents an overview of the Northwest Caucasian (West Caucasian, Abkhaz-Adyghe) family.
The study examines tense variation in complement and subject clauses subordinate to and co-temporal with matrix past tense verbs in Russian. The semantics of the matrix verb is commonly named as one of the major factors that govern tense choice in complement and subject clauses: verbs of speech are said to exclusively license present tense in the embedded clause; existential verbs, on the other hand, are said to block present tense; whereas verbs of perception are said to allow both past and present tense, cf. Vanja skazal, čto Maša xorošo vygljadit ‘Vanya said that Masha looked [pres] great’ vs. Slučalos’, čto Maša vygljadela xorošo ‘It happened (there were times) that Masha looked [past] great’ vs. Vanja videl, čto Maša xorošo vygljadit/vygljadela ‘Vanya saw that Masha looked [pres/past] great’. However, despite a considerable body of research on the topic, a comprehensive investigation of tense distribution across various semantic classes of matrix verbs has not yet been undertaken. This paper presents a corpus-based analysis of tense distribution in complement and subject clauses across five semantic classes of the matrix verb: speech, mental, emotion, perception, and existential. Statistical analysis revealed the following probabilistic hierarchy of licensing past tense: [existential verbs (97%) > verbs of perception + complementizer kak (70%) > verbs of perception + complementizer čto (41%) > mental verbs (9%) > verbs of speech (1%)]. This hierarchy refines our understanding of tense choice in complement and subject clauses in Russian. It is also notable for its high correspondence with the interclausal bondedness hierarchy maintained in typological studies. The suggested isomorphism of the two hierarchies implies that tense is a probabilistic marker of interclausal bondedness, with absolute tense encoding closer relations and relative tense looser ones.
Kabardian (Northwest Caucasian) displays two associative plural constructions. The first pattern exploits the suffix which is also used for the expression of additive plural: it is added to proper names and normally provides a reference to the family of the focal referent. Within the second pattern, a specific associative plural marker follows a syntactically autonomous nominal. The latter pattern possesses several specific properties: the associative plural marker governs the case of the focal nominal, which can be represented even by inanimate, non-specific and coordinate NPs. To describe the Kabardian associative plural system, we suggest using not only a simplified version of the Animacy Hierarchy (as is often done in the typological literature) but also several other hierarchies, including those of definiteness/referentiality, number individuation, and morphosyntactic autonomy.
Attributing a particular property to a person by naming another person, who is typically well-known for the respective property, is called a Vossian Antonomasia (VA). This subtype of metonymy, which overlaps with metaphor, has a specific syntax and is especially frequent in journalistic texts. While identifying Vossian Antonomasia is of particular interest in the study of stylistics, it is also a source of errors in relation and fact extraction, as an explicitly mentioned entity occurs only metaphorically and should not be associated with the respective contexts. Despite its rather simple syntactic variations, the automatic extraction of VA has not yet been addressed, since it requires a deeper semantic understanding of the mentioned entities and underlying relations. In this paper, we propose a first method for the extraction of VAs that works completely automatically. Our approaches use named entity recognition, distant supervision based on Wikidata, and a bi-directional LSTM for postprocessing. The evaluation on 1.8 million articles of the New York Times corpus shows that our approach significantly outperforms the only existing semi-automatic approach for VA identification by more than 30 percentage points in precision.
Poetic texts pose a challenge to full morphological tagging and lemmatization since the authors seek to extend the vocabulary, employ morphologically and semantically deficient forms, go beyond standard syntactic templates, and use non-projective constructions and non-standard word order, among other techniques of the creative language game. In this paper we evaluate a number of probabilistic taggers based on decision trees, CRF and neural network algorithms as well as a state-of-the-art dictionary-based tagger. The taggers were trained on prose texts and tested on three poetic samples of different complexity. Firstly, we suggest a method to compile gold standard datasets for Russian poetry. Secondly, we focus on the taggers’ performance in the identification of part-of-speech tags and lemmas. We reveal which POS classes, paradigm classes and syntactic patterns most affect the quality of processing.
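Comparing tagger output with a gold standard on part-of-speech tags and lemmas can be sketched as token-level accuracy. The abstract does not specify which metrics the paper reports, so the function `tagging_accuracy`, the `(pos, lemma)` token representation, and the sample tokens below are assumptions chosen for illustration only.

```python
def tagging_accuracy(gold, predicted):
    """Return token-level accuracy for POS tags and for lemmas.

    Both arguments are equal-length lists of (pos, lemma) tuples,
    aligned token by token (an assumed representation).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be aligned token by token")
    if not gold:
        return {"pos": 0.0, "lemma": 0.0}

    # Count tokens where the predicted value matches the gold value.
    pos_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    lemma_hits = sum(g[1] == p[1] for g, p in zip(gold, predicted))
    total = len(gold)
    return {"pos": pos_hits / total, "lemma": lemma_hits / total}


if __name__ == "__main__":
    # Illustrative tokens only; not taken from the paper's datasets.
    gold = [("NOUN", "дом"), ("VERB", "идти"), ("ADJ", "новый"), ("NOUN", "мир")]
    pred = [("NOUN", "дом"), ("VERB", "идти"), ("NOUN", "новый"), ("NOUN", "мир")]
    print(tagging_accuracy(gold, pred))
```

Keeping the two scores separate makes it visible when a tagger recovers the right lemma while assigning the wrong part of speech, as in the third token of the sample.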
This paper discusses novel facts regarding adpositional agreement in Avar in light of recent theories of feature valuation. I show that the traditional notion of downward Agree/upward valuation is sufficient to account for the observed facts, rendering the competing mechanism of upward Agree/downward valuation superfluous.
The 2019 Shared Task on Automatic Gapping Resolution for Russian (AGRR2019) aims to tackle a non-trivial linguistic phenomenon, gapping, which occurs in coordinated structures and elides a repeated predicate, typically from the second clause. In this paper we define the task and evaluation metrics, provide detailed information on data preparation, annotation schemes and methodology, analyze the results and describe the different approaches taken by the participating solutions.
The way researchers in the arts and humanities disciplines work has changed significantly. Research can no longer be done in isolation as an increasing number of digital tools and certain types of knowledge are required to deal with research material. Research questions are scaled up and we see the emergence of new infrastructures to address this change. The DigitAl Research Infrastructure for the Arts and Humanities (DARIAH) is an open international network of researchers within the arts and humanities community, which revolves around the exchange of experiences and the sharing of expertise and resources. These resources comprise not only digitised material, but also a wide variety of born-digital data, services and software, tools, and learning and teaching materials. The sustaining, sharing and reuse of resources involves many different parties and stakeholders and is influenced by a multitude of factors in which research infrastructures play a pivotal role. This article describes how DARIAH tries to meet the requirements of researchers from a broad range of disciplines within the arts and humanities who work with (born-)digital research data. It details approaches situated in specific national contexts in an otherwise large heterogeneous international scenario and gives an overview of ongoing efforts towards a convergence of social and technical aspects.