Text Complexity Calculator for Low-Resource Languages

This tool makes it quick and easy to evaluate the complexity of a text in a low-resource language across several parameters: word length and frequency, lexical diversity, distribution of parts of speech, and, most importantly, the Flesch Reading Ease score, adapted specifically for each language.

What

Our tool shows the length of a text in characters, words, and sentences; the average sentence length in words; and the average word length in letters. It also reports word frequency in the text (how many times each word occurs) and word frequency in the language (ipm, items per million), based on language corpora, as well as the percentage of the text’s vocabulary that falls within the language’s list of the 5,000 most frequent words.
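The basic counts described above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual implementation: the function names (basic_stats, ipm, coverage) are hypothetical, sentences are split naively on ".", "!" and "?", and any run of word characters counts as a word, which a real tokenizer for a given language would handle more carefully.

```python
import re
from collections import Counter


def basic_stats(text):
    """Length and average-length statistics for a text (naive tokenization)."""
    # Assumption: sentences end with '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is any run of Unicode word characters.
    words = re.findall(r"\w+", text.lower())
    return {
        "characters": len(text),
        "words": len(words),
        "sentences": len(sentences),
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "word_counts": Counter(words),  # frequency of each word in the text
    }


def ipm(corpus_count, corpus_size):
    """Items per million: occurrences of a word per million corpus tokens."""
    return corpus_count * 1_000_000 / corpus_size


def coverage(words, frequent_words):
    """Percentage of the text's vocabulary found in a frequent-word list."""
    vocabulary = set(words)
    if not vocabulary:
        return 0.0
    return 100 * len(vocabulary & set(frequent_words)) / len(vocabulary)
```

For example, a word that occurs 50 times in a corpus of 2,000,000 tokens has a frequency of 25 ipm.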

The calculator estimates the distribution of parts of speech and, using morphological and lexical information, provides more comprehensive text characteristics (e.g., text narrativity, text descriptiveness, lexical density, and lexical diversity). Finally, its key feature is the calculation of the Flesch Reading Ease score, with coefficients adjusted individually for each language. A detailed description of the parameters is given below.

Why and for whom

This tool can be useful both in research and in education. For example, when creating stimulus materials for linguistic experiments, it is crucial to consider their complexity and comparability. For the first time, an accessible tool has been created for languages that are underrepresented in linguistics. The first version of the tool supports several minority languages of Russia, helping teachers of these languages select materials appropriate to the proficiency level of their students.

Parameters

The Flesch Reading Ease score indicates the expected reading difficulty of a text. It is based on the original Flesch formula, which takes into account the number of words, sentences, and syllables, but in our tool the coefficients have been adapted for each language. For details on how the coefficients were adapted, see Petrunina & Zdorova (2025).
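As a rough illustration of the formula's shape, the sketch below computes a Flesch-style score from word, sentence, and syllable counts. The default coefficients (206.835, 1.015, 84.6) are those of the original Flesch formula for English; the language-specific values used by the tool are not listed here, so they appear only as parameters to be filled in from the cited paper. The function name and parameter names are hypothetical.

```python
def flesch_reading_ease(n_words, n_sentences, n_syllables,
                        base=206.835, sentence_coef=1.015, syllable_coef=84.6):
    """Flesch-style readability score.

    Defaults are the original English coefficients (an assumption for
    illustration); pass adapted values for another language.
    """
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("text must contain at least one word and one sentence")
    avg_sentence_length = n_words / n_sentences      # words per sentence
    avg_syllables_per_word = n_syllables / n_words   # syllables per word
    return (base
            - sentence_coef * avg_sentence_length
            - syllable_coef * avg_syllables_per_word)
```

Higher scores correspond to easier texts: longer sentences and longer words (in syllables) both lower the score.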

Lexical Diversity measures the proportion of unique words in a text. It is calculated as the ratio of the number of unique words (lemmas) to the total number of word forms in the text (N unique lemmas / N all word forms), giving a value from 0 to 1. A score of 1 means that all words are unique.
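The ratio above can be sketched as follows, assuming the text has already been lemmatized so that the input holds one lemma per word form. The function name is hypothetical.

```python
def lexical_diversity(lemmas):
    """Unique lemmas divided by total word forms (one lemma per word form)."""
    if not lemmas:
        return 0.0
    return len(set(lemmas)) / len(lemmas)
```

For the lemma sequence cat, sit, cat, run there are 3 unique lemmas among 4 word forms, giving 0.75.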

Lexical Density is calculated as the ratio of content words to function words on a scale from 0 to 10. Higher lexical density implies greater text complexity.

Text Narrativity is calculated as the ratio of verbs to nouns in a sentence, on a scale from 0 to 10. Higher values indicate more narration in the text (which makes it more “dynamic”).

Text Descriptiveness is measured as the number of adjectives and participles per sentence, on a scale from 0 to 10. Higher values indicate a more descriptive text.
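The three part-of-speech measures above (lexical density, narrativity, and descriptiveness) can be sketched from POS-tagged sentences. Several points here are assumptions rather than the tool's documented behavior: which tags count as content words, the hypothetical "PTCP" tag for participles, computing narrativity over the whole text rather than sentence by sentence, and returning 0.0 when a denominator is zero. The sketch returns raw ratios; how the tool maps them onto its 0 to 10 scale is not described here. Punctuation tags are assumed to have been removed from the input.

```python
from collections import Counter

# Assumption: open-class tags treated as content words.
CONTENT_TAGS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV"}
# Assumption: "PTCP" is a hypothetical tag marking participles.
DESCRIPTIVE_TAGS = {"ADJ", "PTCP"}


def safe_ratio(numerator, denominator):
    """Division that returns 0.0 instead of failing on a zero denominator."""
    return numerator / denominator if denominator else 0.0


def pos_measures(sentences):
    """Raw POS-based measures; `sentences` is a list of lists of POS tags."""
    tags = [tag for sentence in sentences for tag in sentence]
    counts = Counter(tags)
    content = sum(n for tag, n in counts.items() if tag in CONTENT_TAGS)
    function = len(tags) - content
    descriptive = sum(counts[tag] for tag in DESCRIPTIVE_TAGS)
    return {
        "lexical_density": safe_ratio(content, function),      # content / function words
        "narrativity": safe_ratio(counts["VERB"], counts["NOUN"]),  # verbs / nouns
        "descriptiveness": safe_ratio(descriptive, len(sentences)),  # per sentence
    }
```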

Distribution of Parts of Speech is calculated with external morphological parsers (a separate one for each language). Because languages differ in their linguistic features and in how parts of speech are identified, the number of parts of speech and the level of detail in the part-of-speech tags vary. Our tool shows the parts of speech specified in the original language parsers; however, the tag names have been standardized according to the Universal POS tags notation.

For example, in Udmurt and Adyghe, unlike other languages, pronouns (PRON) are subdivided into adjectival and nominal types, marked as PRON-Adj and PRON-Noun.

The correspondence of the POS tags from each parser to the tags in our tool can be found here.
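The standardization step can be pictured as a lookup table from parser-specific tags to the tool's tag names, followed by a percentage count. In the sketch below the parser-side tags ("N", "V", "A", "PRO_A", "PRO_N") are invented for illustration and do not come from any real parser; only the target names follow the text above. Unknown tags fall back to "X", the Universal POS tag for "other".

```python
from collections import Counter

# Hypothetical parser tags on the left; standardized tag names on the right.
TAG_MAP = {
    "N": "NOUN",
    "V": "VERB",
    "A": "ADJ",
    "PRO_A": "PRON-Adj",
    "PRO_N": "PRON-Noun",
}


def standardize(tag):
    """Map a parser-specific tag to its standardized name ('X' if unknown)."""
    return TAG_MAP.get(tag, "X")


def pos_distribution(parser_tags):
    """Percentage of tokens per standardized part-of-speech tag."""
    counts = Counter(standardize(tag) for tag in parser_tags)
    total = sum(counts.values())
    return {tag: 100 * n / total for tag, n in counts.items()}
```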

Rare Words are words not covered by the lexical minimum and frequency dictionary.
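One way to read this definition is that a word is rare when it appears in neither list. The sketch below follows that reading, which is an assumption; the function and parameter names are hypothetical, and the input is assumed to be lemmatized.

```python
def rare_words(lemmas, lexical_minimum, frequency_dictionary):
    """Lemmas found in neither reference list, in order of first appearance."""
    known = set(lexical_minimum) | set(frequency_dictionary)
    seen = set()
    rare = []
    for lemma in lemmas:
        if lemma not in known and lemma not in seen:
            seen.add(lemma)  # report each rare lemma only once
            rare.append(lemma)
    return rare
```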


 

Citation

For citation please use:

Petrunina, U., & Zdorova, N. (2025, April). Readability assessment of written Adyghe using a baseline approach. In Proceedings of the International Conference “Dialogue” (Vol. 2025).


 
