Bachelor's programme 2023/2024

Introduction to Text Mining with R

Field of study: 45.03.03. Fundamental and Applied Linguistics
When: 4th year, module 3
Mode of study: with an online course
Online hours: 20
Open to: all HSE University campuses
Language: English
Credits: 3
Contact hours: 6

Course Syllabus

Abstract

In this online course, you will learn about the next big thing in applied analytics: text analysis. The course is self-contained: you will learn everything from basic programming skills to advanced natural language modelling for topic discovery. It is designed around a problem-oriented approach, meaning that we will not spend much time on theoretical concepts but will instead focus on applying them to practical problems.

The goal of the course is to equip students with the knowledge and skills needed to analyse text data with the R programming language. We do not assume any specific prerequisites. However, some knowledge of natural language processing or R programming might ease the dive into the course materials. The software used is R and RStudio.

The course is heavily tilted toward practical skills. Students will dive into the basics of R for text analysis, the tidy text approach, regular expressions, different algorithms for topic modelling, text classification with machine learning and deep learning approaches, and more. Various synthetic and real-world datasets will help participants see how to apply these techniques to extract insights from user reviews, social media posts, and short product descriptions. This distance learning opportunity is brought to you by HSE University, one of the top think tanks in Russia, and by instructors experienced in using text analysis for business-oriented projects.

The course consists of short pre-recorded lectures, 5 to 15 minutes in length. Each week is accompanied by a graded test with 10 to 15 questions, graded and ungraded programming assignments, and links to additional material for those who want to dig deeper. At the end of the last week, students will have to complete a project utilising the skills learned in the course, and then review and grade the projects of their peers. The course gives students an opportunity to learn the methods of natural language processing (NLP) and then apply them to problems in their own areas of interest.
Learning Objectives

  • The goal of this online course is to equip students with the knowledge and skills needed to analyse text data with the R programming language.
Expected Learning Outcomes

  • The student has the knowledge and skills needed to analyse text data with the R programming language.
  • The student is familiar with the basics of R for text analysis, the tidy text approach, regular expressions, and different algorithms for topic modelling and for text classification with machine learning and deep learning approaches.
Course Contents

  • R and RStudio Basics
  • Working with Tidyverse
  • Supervised machine learning with the bag-of-words approach
  • Unsupervised machine learning
Assessment Elements

  • non-blocking Test
    Each week of the course is accompanied by tests and by graded and ungraded programming assignments.
  • non-blocking Final Project
    You will apply all the knowledge you've gained in this course to do a real analysis of real texts all on your own. You will have to download data from the Project Gutenberg database, explore it, and then apply both supervised and unsupervised machine learning techniques. You will then have to review and grade the work of your peers.
Interim Assessment

  • 2023/2024 3rd module
    The final grade is the grade for the online course.
Bibliography

Recommended Core Bibliography

  • Derryberry, D. R. (2014). Basic Data Analysis for Time Series with R. Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=817454

Recommended Additional Bibliography

  • Bivand, R., Pebesma, E. J., & Gómez-Rubio, V. (2013). Applied Spatial Data Analysis with R (2nd ed.). New York, NY: Springer. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=601853