
Video Semantic Scenes Detection

Student: Glazkova Ekaterina

Supervisor: Stanislav N. Fedotov

Faculty: Faculty of Computer Science

Educational Programme: Data Science (Master)

Year of Graduation: 2021

The paper considers the application of the Transformer architecture to the problem of splitting video into semantic scenes. Semantic segmentation aims to separate scenes based on the storyline for further top-level analysis. The following problem statement is considered: short scenes (shots), each taken sequentially by a single camera, are extracted with existing algorithms, and the method considered in this paper groups the resulting shots into final semantic scenes. Shots within a semantic scene are united by a storyline, characters, and place of action, and form a separate part of the narrative. The grouping problem is solved as a binary classification of sequence elements: for each shot, the model predicts whether it is the end of a semantic scene. Adapting the Transformer architecture allows the use of a longer video context than existing approaches. In addition, it is possible to use pre-training on unlabeled data (similar to BERT and VideoBERT) and to utilize weights of Transformer-encoder-based architectures pre-trained for other tasks. The paper presents experiments on adapting the Transformer encoder for video semantic scene segmentation: varying the architecture size and the history length, increasing training stability, pre-training with a masked loss function, using the weights of the existing pre-trained COOT model for video analysis, and adding a task-specific BNet block from the LGSS model. The resulting model is comparable in quality to existing methods of video segmentation into semantic scenes and surpasses methods that use only place features, as measured by Average Precision and mean Intersection over Union on the open MovieNet-SSeg feature-film dataset.
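The formulation described above, a Transformer encoder over a sequence of shot features with a per-shot binary head predicting scene boundaries, can be sketched as follows. This is a minimal illustrative sketch, not the thesis's actual implementation; the class name, feature dimension, and hyperparameters are assumptions for the example.

```python
import torch
import torch.nn as nn

class ShotBoundaryTransformer(nn.Module):
    """Hypothetical sketch: classify each shot as scene-ending or not."""

    def __init__(self, feat_dim=512, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        # Project precomputed shot features into the model dimension.
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # One logit per shot: "is this shot the end of a semantic scene?"
        self.head = nn.Linear(d_model, 1)

    def forward(self, shot_feats):
        # shot_feats: (batch, num_shots, feat_dim)
        h = self.encoder(self.proj(shot_feats))
        return self.head(h).squeeze(-1)  # (batch, num_shots) logits

model = ShotBoundaryTransformer()
feats = torch.randn(2, 30, 512)  # 2 videos, 30 shots each, 512-dim features
logits = model(feats)
labels = torch.randint(0, 2, (2, 30)).float()  # 1 = shot ends a scene
loss = nn.BCEWithLogitsLoss()(logits, labels)  # binary boundary loss
```

Because every shot attends to every other shot in the sequence, the context length available to the classifier is bounded only by the encoder's input length, which is the property the abstract contrasts with earlier local methods.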

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
