Computer Vision

Bachelor's programme 2025/2026

Status: Elective course (Applied Data Analysis)

Delivered by: Department of Big Data and Information Retrieval

Where: Faculty of Computer Science

When: 4th year, 3rd module

Open to: students of the home campus

Instructors: Kopylov Ivan Stanislavovich

Language: English

Credits: 4

Contact hours: 40


Abstract

This course provides a comprehensive, engineering-oriented introduction to modern computer vision. Students will learn the complete pipeline from classical image processing to state-of-the-art transformer-based architectures. The course covers CNNs, Vision Transformers, object detection (YOLO, DETR), segmentation (U-Net, Mask R-CNN, SAM), multimodal models (CLIP), self-supervised learning, generative models (VAE, GAN), and video understanding. Practical sessions include hands-on implementation, training, and deployment of CV models.

Learning Objectives

The course aims to give students both a theoretical understanding of modern computer vision and the practical skills to apply it. Students will trace the evolution from classical filters to deep learning architectures, understand the mathematical foundations of CNNs and Transformers, and gain engineering competence in deploying CV systems.

Expected Learning Outcomes

Implement and train CNN and Vision Transformer architectures
Build object detection and segmentation pipelines using modern frameworks
Apply multimodal models (CLIP) for zero-shot tasks
Understand and implement self-supervised learning methods
Design and train generative models (VAE, GAN)
Process video data with spatio-temporal models
Deploy CV models with ONNX/TensorRT for production
Evaluate and debug CV systems systematically

Course Contents

Image as a Signal and Spatial Structure
Classification: CNNs and Vision Transformers
Object Detection: Anchor-Based vs Transformer-Based
Segmentation: U-Net, Mask R-CNN, Mask2Former, SAM
Multimodal Vision and Open-Vocabulary Models
Representation Learning and Self-Supervision
Generative Models I: Autoencoders, VAE, VQ-VAE, VQ-GAN
Generative Models II: GAN Evolution
Overview of Diffusion Models + Video Representations
Production Engineering and Final Integration

Assessment Elements

HW_1
HW_2
Project
Exam

Interim Assessment

2025/2026 3rd module
0.4 * Exam + 0.1 * HW_1 + 0.1 * HW_2 + 0.4 * Project
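
As an illustration of how the formula above combines the four assessment elements, here is a minimal Python sketch. The weights come directly from the formula; the function name `interim_grade`, the example scores, and the 10-point scale are assumptions made for illustration only, and no rounding rule is applied because the syllabus does not state one.

```python
# Weights copied from the interim assessment formula.
WEIGHTS = {"Exam": 0.4, "HW_1": 0.1, "HW_2": 0.1, "Project": 0.4}


def interim_grade(scores: dict[str, float]) -> float:
    """Return the weighted sum of the four assessment elements."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores on an assumed 10-point scale.
example = {"Exam": 7, "HW_1": 9, "HW_2": 8, "Project": 8}
print(round(interim_grade(example), 2))  # 0.4*7 + 0.1*9 + 0.1*8 + 0.4*8 = 7.7
```

Under this formula the exam and the project together carry 80% of the grade, so each homework assignment shifts the result by at most one tenth of its own score.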

Bibliography

Recommended Core Bibliography

Prince, S. J. D. (2012). Computer Vision: Models, Learning, and Inference.
Huang, K., Hussain, A., Wang, Q.-F., & Zhang, R. (2019). Deep Learning: Fundamentals, Theory and Applications. Cham, Switzerland: Springer. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=2029631
Szeliski, R. (2010). Computer Vision: Algorithms and Applications. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.E8FCD1BD

Recommended Additional Bibliography

Goodfellow, I. (2016). Deep Learning.

Authors

Kopylov Ivan Stanislavovich
Kononova Elizaveta Dmitrievna
