
‘Bots Are Simply Imitators, not Artists’: How to Distinguish Artificial Intellect from a Real Author

Today, text bots like ChatGPT perform many tasks that were originally human work. In our place, they can rewrite ‘War and Peace’ in a Shakespearean style, write a thesis on Ancient Mesopotamia, or create a Valentine’s Day card. But is there any way to identify an AI-generated text and distinguish it from work done by a human being? Can we catch out a robot? Vasilii Gromov, Deputy Head of the HSE School of Data Analysis and Artificial Intelligence and Professor at the HSE Faculty of Computer Science, addressed these questions in his lecture ‘Catch out a Bot, or the Large-Scale Structure of Natural Intelligence’ for the Znanie intellectual society.

‘Why are modern texts created and who writes them?’ asked Vasilii Gromov. His generation and the generation of his listeners grew up on works written by people for people. The authors of such texts put a certain meaning into their works and had a certain goal, whether the book was ‘Sleeping Beauty,’ ‘War and Peace,’ or a textbook of mathematical analysis, the professor notes. Nowadays, however, children are surrounded from a very early age by texts written by an unknown author with an unclear purpose for an undefined audience. Vasilii Gromov and his colleagues wondered whether such a child would grow up the same way previous generations did.

The ongoing change is neither good nor bad, because the world is transforming. Humankind is now experiencing the process of ‘co-evolution of artificial intelligence and humans.’ As it develops rapidly, AI is adapting to humans, but humans are also beginning to adapt to artificial intelligence. To secure our future, or at least for ‘basic information hygiene,’ we need to learn to distinguish texts generated by bots (artificial intelligence systems that generate texts in natural languages such as Russian or Chinese) from those written by people.

Given a number of existing generated texts, it would not be difficult to identify whether a new text was written by a specific bot or a human: we simply need to load a large number of similarly generated texts into a neural network, and the mission is accomplished. However, after this, no one would continue using that particular bot, and it would simply be replaced by another artificial intelligence. Therefore, scientists need to develop a mechanism capable of distinguishing any bot from any human. To do this, we need to look at the structure of language itself, which brings us to research that explains natural languages from a mathematical point of view. Now, let’s take a look at the necessary steps.

The scientific field of natural language processing works, in particular, with the representation of words and sequences of words (n-grams, where n is the number of words) as vectors, that is, ordered lists of numbers. Together, these vectors form a vector space.
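To make the idea of n-grams as vectors concrete, here is a minimal sketch. The two-dimensional word vectors are invented for illustration, and concatenating word vectors is only one simple way to build an n-gram vector; the lecture does not say which representation the researchers used.

```python
from typing import Dict, List, Sequence, Tuple

def ngrams(words: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return every run of n consecutive words."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

# Hypothetical toy word vectors (assumption: real systems learn
# vectors with hundreds of dimensions from large text collections).
TOY_VECTORS: Dict[str, List[float]] = {
    "bots": [0.9, 0.1],
    "imitate": [0.7, 0.3],
    "artists": [0.1, 0.8],
}

def ngram_vector(gram: Sequence[str],
                 vectors: Dict[str, List[float]]) -> List[float]:
    """Concatenate the word vectors of an n-gram into one longer vector."""
    out: List[float] = []
    for word in gram:
        out.extend(vectors[word])
    return out

words = "bots imitate artists".split()
bigrams = ngrams(words, 2)
points = [ngram_vector(gram, TOY_VECTORS) for gram in bigrams]
```

Each bigram becomes one point in a four-dimensional space, and the collection of all such points is the object that the later steps study.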

Working with the representation of individual words reveals that the vocabulary of bots is no different from the vocabulary of an ordinary person. However, as soon as it comes to sequences of two or three words, it turns out that the sequences generated by bots are significantly more predictable and much poorer in linguistic terms than those that even the most poorly educated person can produce (for example, a bot is more likely to repeat patterns). The difference between the n-gram sequences of bots and those of people is statistically significant even for large bots such as ChatGPT, and this is what helps catch them.
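One simple way to put a number on how predictable a word sequence is, offered here only as an illustration, is the conditional entropy of the next word given the current one. This is a stand-in measure chosen for the sketch, not the statistic used in the research described in the lecture, and the two sample texts are invented.

```python
import math
from collections import Counter
from typing import Sequence

def bigram_conditional_entropy(words: Sequence[str]) -> float:
    """Entropy in bits of the next word given the current word.

    Lower values mean the second word of each bigram is easier to
    guess from the first, i.e. the sequence is more predictable.
    """
    pairs = list(zip(words, words[1:]))
    pair_counts = Counter(pairs)
    first_counts = Counter(first for first, _ in pairs)
    total = len(pairs)
    entropy = 0.0
    for (first, _), count in pair_counts.items():
        p_pair = count / total                       # P(first, next)
        p_next_given_first = count / first_counts[first]  # P(next | first)
        entropy -= p_pair * math.log2(p_next_given_first)
    return entropy

# Invented sample texts: one repeats a pattern, one varies it.
repetitive = "the cat sat the cat sat the cat sat".split()
varied = "the cat sat the dog ran the bird flew".split()

repetitive_score = bigram_conditional_entropy(repetitive)
varied_score = bigram_conditional_entropy(varied)
```

On these toy inputs the repeating text scores lower than the varied one, which mirrors the direction of the effect described above. Real comparisons would need large corpora and a proper significance test.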

Further mathematical study of natural language leads scholars to conclusions about where such word vectors are located in space. There are regions of vector space (especially when it comes to sequences of words) that only bots visit, and others that only people visit. Most of the space (90–95%) is used by both, but there are separate bot-only areas, which is another way to catch them out.

Clustering is a mathematical operation that combines sets of similar elements into one group, a cluster. If we cluster the word sequences produced by bots, the clusters turn out to be more rigid and compact, without any discrepancies. When the word sequences of people of different genders and ages, with different education and backgrounds, are clustered, the result is blurrier, less distinct clusters. Humans think significantly less clearly than bots, and this is another way to catch the bots.
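The contrast between compact and blurry clusters can be illustrated with a simple spread measure: the average distance from each point to the centre of its cluster. The points below are made up for the sketch, and this measure is an assumption chosen for clarity rather than the one used in the study.

```python
import math
from typing import List, Sequence

Point = Sequence[float]

def centroid(points: Sequence[Point]) -> List[float]:
    """Coordinate-wise mean of a set of points."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]

def euclidean(a: Point, b: Point) -> float:
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_spread(points: Sequence[Point]) -> float:
    """Average distance from each point to the cluster centroid."""
    centre = centroid(points)
    return sum(euclidean(p, centre) for p in points) / len(points)

# Invented 2-D points standing in for n-gram vectors in one cluster.
tight_cluster = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.0)]
loose_cluster = [(1.0, 1.0), (2.0, 0.0), (0.0, 2.0), (1.0, 3.0)]

tight_spread = mean_spread(tight_cluster)
loose_spread = mean_spread(loose_cluster)
```

A smaller spread corresponds to the compact clusters attributed to bots, and a larger one to the blurrier clusters attributed to people.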

If we represent each word or each n-gram as a vector, then their entire collection can be represented as a geometric object, a certain surface in a multidimensional space. Then, for example, if we take all possible word sequences in Russian, we may find that they do not fill the entire semantic space, but only part of it. Scientists can study and measure this collection as a surface, and even compare it with other surfaces (for example, with the surface of the English language). Every surface in space has a dimension, i.e., the number of independent parameters necessary to describe the object (for points on a sphere, for example, these are two values: longitude and latitude).
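The sphere example can be checked directly. In the sketch below, two parameters (longitude and latitude) produce three coordinates, but those coordinates always satisfy one constraint, so the surface is two-dimensional even though it sits in three-dimensional space. The function name is chosen for this illustration.

```python
import math
from typing import Tuple

def sphere_point(longitude_deg: float,
                 latitude_deg: float) -> Tuple[float, float, float]:
    """Map (longitude, latitude) in degrees to a point on the unit sphere."""
    lon = math.radians(longitude_deg)
    lat = math.radians(latitude_deg)
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    return (x, y, z)

# Every point has three coordinates, yet x^2 + y^2 + z^2 is always 1,
# so only two of the three numbers can vary independently.
sample = sphere_point(37.6, 55.75)
squared_radius = sum(coordinate ** 2 for coordinate in sample)
```

The dimension of language is the analogous count: how many independent numbers are needed to locate a point on the surface formed by all word sequences.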

While studying the dimension of natural language, Vasilii Gromov expected to find an infinite value. In the end, however, the analysts concluded that language has a dimension of about 9–10, a figure that varies slightly from language to language. One thing is certain: human language occupies a space of higher dimension than bot language.

Finally, the results of a 2023 study showed that this surface has ‘holes’ in it, like Swiss cheese. The holes are those areas of semantic space that our language has not yet reached. Although analysts cannot yet say clearly what is hidden behind them, they can detect them. Different languages have different holes, also referred to as ‘blind spots.’ When catching bots, it is important to remember that people are drawn to the boundaries of such holes, because they use language to create new meanings and ideas. Bots, being trained programs, move away from these holes, which makes the task of catching them easier for now. Surprisingly, it is humour that most often appears at the boundaries of such holes.

‘Bots are simply imitators, not artists. Technology does not stand still, so we must try to solve this “bot-catching” problem and understand what a language is from a mathematical point of view,’ summarised Vasilii Gromov.
