Formal approaches to information structure: syntax, semantics, discourse

Priority areas of development: humanities
Department: Laboratory of Formal Models in Linguistics
The project has been carried out as part of the HSE Program of Fundamental Studies.

Goal of research

The goal of the project is to develop formal approaches to information structure and to analyze how it interacts with various grammatical phenomena; to describe the phenomena of interest using contemporary formal theories of syntax, semantics and pragmatics.

The specific tasks of the project include broadening the empirical base of existing formal theories in the field, which may, among other things, lead to their revision. In particular, this involves: (a) considering evidence from Russian, a language with "free" word order, to inform modern theories of scrambling; (b) considering evidence from typologically and genetically diverse languages for which systematic paradigm data on the grammaticality of certain syntactic configurations, crucial for modern linguistic theories, have so far been lacking; (c) developing experimental designs for studying the interaction of information structure with word order and prosody; (d) developing a methodology for corpus studies that makes it possible to test hypotheses already formulated in formal frameworks; (e) proposing analyses of newly obtained data that highlight the advantages and disadvantages of existing theoretical approaches. The topics our research focuses on include: word order and intonation; grammatical encoding of arguments; the syntax and semantics of focus-sensitive items; the interaction of information structure with the scope of scope-taking elements; and the connections between broadly understood pragmatic factors and grammatical encoding.


The project uses the following methods to work with linguistic material:

  • elicitation, including its field linguistics variety, i.e. asking a representative sample of native speakers to make grammaticality judgements;
  • corpus methods (searching for relevant utterances in a representative text corpus);
  • other experimental techniques to test different aspects of sentence production and comprehension.

Empirical base of research

The empirical base of the research is the linguistic data obtained by means of elicitation (speakers’ grammaticality judgements), including data collected in the field, as well as the results of linguistic experiments, and corpus data.

Experimental studies: experimental research on the production and comprehension of various syntactic constructions in a particular discourse context. The main method used to study production was a questionnaire: participants were asked to read mini-texts, complete them with sentences built from the words given in brackets below each text, and then read the resulting text aloud. This shows which word order and prosodic characteristics are preferred depending on the information structure that the context imposes on the sentence. To study comprehension, participants were asked to evaluate sentences with different word orders and prosodic properties in context (i.e., to rate how natural a given mini-text sounds as a whole on a scale of 1 to 5 or 1 to 10). Another method was to ask participants to choose a question to which a given sentence could be an appropriate answer. This shows which information structures are acceptable for a given sentence.

Field data: the field data were collected using specially designed paradigms of stimuli; the examples were either presented to native speakers in an intermediary language or constructed by the researcher in the target language (and evaluated by the speaker). Field studies were carried out in the areas where the following languages are spoken: Hill Mari, Khanty, Beserman Udmurt (Finno-Ugric), Balkar, Chuvash (Turkic), Buryat (Mongolic) and Chukchi (Chukotko-Kamchatkan). We collected corpora of examples on the topics of interest, relying on typological questionnaires and theoretically oriented works.

Corpus data: to collect the data, the Russian National Corpus as well as text corpora of several minority languages were used.

Results of research

The main theoretical results obtained so far are listed below.

It is known that in languages with so-called “free” constituent order, word order variation is used to express information structure. However, information structure is not the only factor that determines such variation. We studied a number of word order alternations in Russian (OV / VO, SVO / VSO). We identified the information-structural properties of direct objects that occupy a non-standard preverbal position, as well as the main factors that trigger this marked word order. These include the pronominalization of the direct object and factors causing a difference in the communicative status of the verb and the direct object. We demonstrated that the formal analyses previously proposed for the SOV and VSO orders are empirically inadequate. Instead, we proposed a feature-based analysis of the movements that generate the VSO order, and a novel perspective on the SOV order. In addition, for a construction with an adjunct moved to the left periphery of the sentence, we showed that this configuration strongly correlates with the type of prosodic accent. We also conducted experiments aimed at studying the syntactic and prosodic means of expressing information structure in Russian.

The phenomenon of indirect control, which had not previously been studied in Russian, was investigated. The results have profound implications for theories of syntactic control. The mechanisms of “association with focus” in Russian were also analyzed. In previously studied languages, focus particles most often occupy a fixed position in the clause. In Russian, by contrast, such items are almost always located next to the focused constituent.

We also studied a number of phenomena related to variation in grammatical encoding and to information structure in languages with various typological characteristics. The variable encoding of the subject in relative clauses in Khanty was considered; some of the restrictions identified have an information-structural explanation. In addition, the interaction of linear position and information structure in prospective sentences in English was studied, and a new theoretical explanation of the difference between the ‘going to’ and ‘about to’ constructions was proposed.

Studies were carried out on the scope of quantitative nouns, as well as on the interaction of the scope of negation and the quantifier ‘all’ in Russian. It was found that the scope of quantitative nouns in Russian is associated with number agreement between the verb and a quantitative phrase in subject position. Factors affecting the interaction of the scope of negation and universal quantifier expressions were also identified: the presence of negative polarity items, the information structure of the sentence, the case of the quantifier in object position, the use of certain adverbs, and the referential status of the noun phrase containing the quantifier.

The semantics and syntax of a number of elements that are sensitive to information-structural status (focus-sensitive elements) were investigated. In particular, restrictive focus particles in Indonesian were studied: their use was described and their compatibility with various types of constituents was examined. The semantics of the Russian scalar additive particle ‘esche’ was investigated, and a compositional analysis was proposed within the framework of alternative semantics. An analysis of the semantics of numerical expressions in Buryat, also based on alternative semantics, was proposed. Finally, the distribution of the focus particle ‘da’ in Karachay-Balkar was investigated.

As part of the study of the referential properties of linguistic expressions, the behavior of the discourse-related possessive marker in Khanty was investigated. We argued that this marker has two meanings: possession in an extended sense, and definiteness.

A number of other studies were conducted on data from genetically diverse languages. In particular, the syntax and semantics of interrogative constructions in Chukchi were investigated, and a number of non-trivial phenomena in this area were described. We studied variation in the interpretation of Russian temporal adverbs and found that it depends on the tense of the verb in the main and dependent clauses. A typological study of expressions combining comparative and attenuative semantics was carried out, and a number of typological generalizations were put forward. In the study of the coordination mechanism in the Turkic and Mongolic languages, it was shown that the lack of syntactic coordination does not always mean the absence of semantic coordination. Finally, complex verb constructions in Hill Mari and Chuvash were investigated: the semantics of such constructions was described in detail for the first time.

In the field of methodology development, the following results were obtained.

Firstly, in the study of the SOV order (the preverbal position of the direct object), corpus analysis was applied. Data for the analysis were obtained from the recently developed Taiga corpus (https://tatianashavrina.github.io/taiga_site/), which offers new analytic tools and new types of texts for research. The material was annotated for a number of factors: the type of linguistic expression (for example, pronominal or not), the type of rheme, the referential status of the noun phrase, etc. Using various statistical tests, we found that the choice of this word order is related mainly to the pronominalization of the direct object.
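The report does not say which statistical tests were used. As a purely illustrative sketch, one common way to check whether word order (OV vs. VO) is associated with the pronominalization of the direct object is a Pearson chi-square test on a 2x2 contingency table. The function name `chi_square_2x2` and the counts below are hypothetical placeholders, not data or code from the project.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table

                    pronominal DO   full-NP DO
        OV order          a              b
        VO order          c              d

    Returns the chi-square statistic and its p-value (1 degree of freedom).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts for illustration only (NOT taken from the Taiga corpus).
chi2, p_value = chi_square_2x2(30, 10, 20, 40)
print(f"chi2 = {chi2:.2f}, p = {p_value:.5f}")
```

A small p-value would indicate only that the two annotated factors are not independent in the sample; it would not by itself show which factor drives the choice of word order, and other annotated factors (type of rheme, referential status) would need to be examined in the same way or in a joint model.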

Secondly, in the study of the verb-initial order (VSO), the grammaticality judgment approach was applied. The main method here was to test different theoretical approaches to word order against Russian data.

Thirdly, an experimental approach was applied in the study of adverbial modifiers (adverbs and prepositional phrases) moved to the beginning of the sentence.

The design of the experiments was worked out in detail so that we could study the interaction of information structure with intonation. In particular, a design based on mini-dialogues was developed. As stimuli, participants were presented with pairs of sentences taken either from the same dialogue or from two different dialogues (which presuppose different information structures).

To carry out the tasks of the project, data were collected during field research from structurally different languages (Indonesian, Chukchi, Hill Mari, Khanty, Beserman Udmurt, Balkar). In addition to collecting spontaneous texts, we administered questionnaires based on the translation of specially selected examples. The questionnaires included paradigms of examples; testing their grammaticality allows us to formulate and verify hypotheses about the formal structure of the phenomenon under study.


