Master's Programme 2021/2022

Linguistic Data Analysis: Quantitative Methods and Visualization

Best by the criterion "Usefulness of the course for your future career"
Best by the criterion "Usefulness of the course for broadening your horizons and all-round development"
Status: Compulsory course (Linguistic Theory and Language Description)
Field of study: 45.04.03. Fundamental and Applied Linguistics
When taught: 2nd year, modules 1 and 2
Mode of study: with an online course
Online hours: 2
Audience: own campus
Instructors: Popova Darya Pavlovna
Degree programme: Linguistic Theory and Language Description
Language: English
Credits: 3
Contact hours: 32

Course Syllabus

Abstract

Preprocessing of linguistic data in Python is designed to further the students' knowledge of natural language processing and to polish their programming skills. The course aims to provide the students with the programming and natural language processing knowledge and competencies necessary to plan and conduct their own research projects, leading to their M.Sc. dissertations and scientific publications.
Learning Objectives

  • Within this course you will:
    • learn about the principal steps of quantitative research in linguistics;
    • learn about the possibilities and limitations of quantitative approaches as applied to different research questions;
    • learn to formulate research questions and develop them into testable hypotheses;
    • explore the possibilities of data collection and different approaches to sampling;
    • learn to evaluate the quality of a quantitative approach;
    • study the most common corpus, experimental, and mixed designs of linguistic studies, and learn to evaluate research plans and to discover and prevent the associated threats to data validity;
    • practise preparing your quantitative data for analysis, evaluating its quality, and treating missing data;
    • learn about the possibilities and limitations of conventional statistical techniques and criteria, as well as some popular contemporary multivariate statistical methods;
    • learn to choose and apply in practice a set of statistical tests appropriate to your research question.
  • to further the students’ programming skills
  • to provide the students with the necessary skills to write programs for experiments and corpus studies
  • to teach the students how to re-format data
  • to teach the students how to retrieve data from the Internet
  • to teach the students how to write their code so that it is readable by other linguists
  • to teach the students how to present research that involves coding in written and oral form
  • to provide an overview of some of the most exciting current computational projects
  • to teach the students how to read and critically assess linguistic research that uses computational methods
  • to teach the students how to formulate linguistic questions in a way that can be addressed computationally
  • to teach the students to conduct independent computational studies
Expected Learning Outcomes

  • conducts independent natural language processing studies
  • formulates linguistic questions in a way that can be addressed computationally
  • is able to reformat data
  • presents their research that involves coding in written and oral form
  • critically reads and assesses linguistic research that uses computational methods
  • retrieves data from the Internet
  • writes programs (code) for experiments and corpus studies
  • writes code that is readable by other linguists and programmers
  • is able to account for the basic types of data used in linguistic research
  • is able to apply basic quantitative methods to analyse linguistic data
  • is able to apply different techniques for presenting both qualitative and quantitative linguistic data in scholarly writing
  • is able to critically discuss the limitations of commonly used methods for answering research questions about language
  • is able to critically evaluate linguistic data presented in previous research
  • is able to reason about how to interpret linguistic results, including how to evaluate what kind of information a given method can offer and how to estimate the range of variables that may affect the results of linguistic research
Course Contents

  • Topic 1. Datatypes and variables
  • Topic 2. Control structures
  • Topic 3. Input and output
  • Topic 4. Subroutines and modules
  • Topic 5. Regular expressions
  • Topic 6. Text manipulation
  • Topic 7. Internet data
  • Topic 8. Retrieving webpages, HTML, parsing HTML, webcrawlers
  • Topic 9. Different data formats: CSV, databases, JSON
  • Topic 10. Basics of web design: creating a website for a linguistic experiment
  • Topic 11. Word2vec
  • Topic 12. Graphs in Python
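
Several of the topics above (regular expressions, text manipulation, and data formats such as CSV and JSON) can be combined in one small exercise: tokenize a text, count word frequencies, and export the table. The sketch below is not taken from the course materials; it assumes Python 3.9 or later and uses only the standard library (`re`, `collections.Counter`, `csv`, `json`). The sample sentence, the word pattern, and the helper names `tokenize`, `frequency_table`, `to_csv`, and `to_json` are hypothetical choices made for illustration.

```python
import csv
import io
import json
import re
from collections import Counter

# Hypothetical sample text and helper names, for illustration only.
SAMPLE = "The cat sat on the mat. The dog sat, too!"

# One or more letters (Latin, Cyrillic, etc.), optionally joined by an
# apostrophe or hyphen, so "don't" and "well-known" stay single tokens.
# [^\W\d_] means "a word character that is neither a digit nor an underscore".
WORD_RE = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def tokenize(text: str) -> list[str]:
    """Lower-case the text and return its word tokens; digits and punctuation are dropped."""
    return WORD_RE.findall(text.lower())


def frequency_table(tokens: list[str]) -> list[tuple[str, int]]:
    """Return (word, count) pairs, most frequent first; ties keep first-seen order."""
    return Counter(tokens).most_common()


def to_csv(rows: list[tuple[str, int]]) -> str:
    """Serialize the frequency table as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["word", "count"])
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows: list[tuple[str, int]]) -> str:
    """Serialize the frequency table as a JSON object mapping word to count."""
    return json.dumps(dict(rows), ensure_ascii=False)


if __name__ == "__main__":
    table = frequency_table(tokenize(SAMPLE))
    print(to_csv(table))
    print(to_json(table))
```

Running the script prints a CSV table whose first data row is `the,3`, followed by the same counts as a JSON object. The hand-written pattern is a deliberate simplification; for real corpus work a dedicated tokenizer, such as those in NLTK (see Perkins, 2014), is usually the better choice.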
Assessment Elements

  • non-blocking homeworks
    Written assignments include theoretical tests and practical problem-solving. The assignments are published online and should be submitted via an electronic form.
  • non-blocking exam
  • non-blocking homework assignment 1
  • non-blocking homework assignment 2
  • non-blocking in-class presentation
  • blocking exam (final project)
Interim Assessment

  • 2020/2021 4th module
    0.6 * homeworks + 0.4 * exam
  • 2021/2022 2nd module
    0.4 * exam (final project) + 0.1 * in-class presentation + 0.3 * homework assignment 2 + 0.2 * homework assignment 1
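
Each interim grade is a weighted sum of the assessment elements. A minimal sketch of that arithmetic follows, assuming that every component is scored on the same scale (for example, 0 to 10) and that no rounding rule is applied, since the syllabus does not state one. The dictionary keys and the function name `interim_grade` are hypothetical labels for the elements listed above, not official identifiers.

```python
# Weights copied from the interim assessment formulas; the keys are
# hypothetical labels, not official identifiers.
WEIGHTS_2020_21 = {"homeworks": 0.6, "exam": 0.4}
WEIGHTS_2021_22 = {
    "exam_final_project": 0.4,
    "in_class_presentation": 0.1,
    "homework_2": 0.3,
    "homework_1": 0.2,
}


def interim_grade(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Return the weighted sum of component scores (no rounding applied).

    Assumes every score is on the same scale, e.g. 0 to 10.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must list the same components")
    # Tolerance guards against floating-point error when summing the weights.
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in weights)


if __name__ == "__main__":
    # Hypothetical scores for one student in 2021/2022.
    scores = {
        "exam_final_project": 8,
        "in_class_presentation": 10,
        "homework_2": 7,
        "homework_1": 9,
    }
    grade = interim_grade(scores, WEIGHTS_2021_22)
    print(round(grade, 2))  # 8.1 (rounded here only for display)
```

For these hypothetical scores the 2021/2022 formula gives 0.4 * 8 + 0.1 * 10 + 0.3 * 7 + 0.2 * 9 = 3.2 + 1.0 + 2.1 + 1.8 = 8.1. The same function covers the 2020/2021 formula by passing `WEIGHTS_2020_21`.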
Bibliography

Recommended Core Bibliography

  • Perkins, J. (2014). Python 3 Text Processing with NLTK 3 Cookbook. Birmingham: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=836632
  • Romano, F. (2015). Learning Python. Birmingham: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1133614
  • Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis (2nd ed.). Cham: Springer.

Recommended Additional Bibliography

  • Stowell, S. (2014). Using R for Statistics. Apress. Retrieved from https://link.springer.com/book/10.1007%2F978-1-4842-0139-8

Authors

  • SCHUROV ILYA VALEREVICH
  • RYZHOVA DARYA ALEKSANDROVNA
  • POZDNYAKOV IVAN SERGEEVICH
  • POPOVA DARYA PAVLOVNA
  • DANIEL MIKHAIL ALEKSANDROVICH