
HSE Researchers Develop Novel Approach to Evaluating AI Applications in Education

Researchers at HSE University have proposed a novel approach to assessing AI's competency in educational settings. The approach is grounded in psychometric principles and has been empirically tested using the GPT-4 model. This marks the first step in evaluating how ready generative models truly are to serve as assistants for teachers or students. The results have been published on arXiv.

Each year, artificial intelligence plays a progressively larger role in education, prompting developers to address crucial questions about how to assess AI's capabilities in teaching and learning. Researchers at HSE University have introduced a novel psychometrics-based approach to creating effective benchmarks for evaluating the professional competencies of large language models (LLMs), such as GPT. The approach is based on Bloom's taxonomy. Although numerous benchmarks (tests for language models) already exist, the taxonomy is not widely used specifically for result verification.

A distinctive feature of the proposed methodology is that it compares tasks across different levels of complexity, ranging from basic (knowledge) to advanced (application of knowledge), and takes these levels into account when evaluating performance. This is essential for assessing the quality of the model's recommendations across diverse situations and determining the extent to which it can be trusted in the educational context. As part of the study, the researchers developed and tested over 3,900 unique assignments, categorised into 16 content areas, including teaching methods, educational psychology, and classroom management. The experiment was conducted using the Russian-language version of the GPT-4 model.
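To illustrate what comparing results across complexity levels and content areas can look like in practice, here is a minimal sketch. It computes plain accuracy per group and is not the authors' psychometric analysis. The record layout, the function name `accuracy_by_group`, and the toy records are all assumptions made for illustration and are not data from the study.

```python
from collections import defaultdict


def accuracy_by_group(results, key):
    """Return {group: share of correct answers} for the given grouping key."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for item in results:
        group = item[key]
        totals[group] += 1
        correct[group] += int(item["correct"])
    return {group: correct[group] / totals[group] for group in totals}


# Hypothetical toy records, NOT results from the study.
toy_results = [
    {"area": "teaching methods", "level": "knowledge", "correct": True},
    {"area": "teaching methods", "level": "application", "correct": False},
    {"area": "educational psychology", "level": "knowledge", "correct": True},
    {"area": "educational psychology", "level": "application", "correct": True},
    {"area": "classroom management", "level": "knowledge", "correct": True},
    {"area": "classroom management", "level": "application", "correct": False},
]

# Accuracy split by complexity level and by content area.
by_level = accuracy_by_group(toy_results, "level")
by_area = accuracy_by_group(toy_results, "area")

for level, share in by_level.items():
    print(f"{level}: {share:.2f}")
```

Grouping the same records by level and by area makes it easy to see whether errors cluster in application tasks or in a particular content area, which is the kind of comparison the methodology is built around.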

'We have developed a new approach that goes beyond conventional testing,' explains Elena Kardanova, lead author of the project and Academic Supervisor at the Centre for Psychometrics and Measurement in Education of the HSE Institute of Education. 'Our approach is demonstrated through a comprehensive new benchmark (the term for language model tests) designed for AI in pedagogy. This benchmark is grounded in psychometric principles and emphasises key competencies essential for teaching.'

Today's AI models, such as ChatGPT, possess an impressive ability to process and generate text quickly, making them potential assistants in educational settings. However, our results indicate that the model struggles with more complex tasks that require a deeper understanding and the ability to think adaptively. For example, AI excels at retrieving known facts but demonstrates lower proficiency in applying this information to address real-world pedagogical challenges. In particular, ChatGPT is not always successful in solving theoretical problems, which can sometimes appear basic even to average students. 

'The approach we have developed clearly highlights a key issue with AI today: you never know where an error will occur. A model can make mistakes even in the simplest tasks, which are considered the core of an academic discipline. Our test highlights key issues both in the area of knowledge and in the application of that knowledge, thereby paving the way to address these core challenges. Addressing these issues is crucial if we want to rely on such models as assistants for teachers, and even more so for students. An assistant that requires everything to be rechecked, which is currently the case, is unlikely to inspire a desire to use it,' says Yaroslav Kuzminov, Academic Supervisor of HSE University.

Among the potential scenarios for AI use in education, scientists worldwide cite assisting teachers in creating educational materials, automating the assessment of student responses, developing adaptive curricula, and quickly generating analytics on student academic performance. According to the authors, AI can be a powerful tool for teachers, especially in the face of increasing workloads. However, there is still a need to improve the models and approaches used for their training and evaluation.

'The test we conducted helped us understand not only how to train large generative models but also, and more importantly, why concerns about teachers being replaced with artificial intelligence are, at the very least, premature. Indeed, it is impossible to overlook the breakthrough of generative models serving as teacher assistants: they can already attempt to develop curricula, compile reading lists for lessons, and, in some cases, grade assignments. Nevertheless, we still encounter the model's hallucinations, where it invents answers to questions when it lacks information about a phenomenon, or misunderstands the context. In general, if we want tools based on generative models to be used in pedagogical practice and earn epistemic trust, there is still much work to be done,' says Taras Pashchenko, Head of the HSE Laboratory for Curriculum Design, commenting on the test results.

In the future, the research team plans to continue refining the benchmark by incorporating more complex tasks that can assess AI abilities such as information analysis and evaluation.

'Our upcoming papers will focus on both introducing new types of benchmarks and discussing academic techniques. Such techniques will be developed to further train models and mitigate the risks of hallucinations, loss of context, and errors in core knowledge. The main goal we aim to achieve is to ensure models are stable in their knowledge and to develop methods for testing this stability with even greater accuracy. Otherwise, they will remain merely tools that facilitate copying and imitation of knowledge,' notes Ekaterina Kruchinskaya, Senior Lecturer at the HSE Department of Higher Mathematics.
