Master's Programme 2019/2020

Preprocessing and Analysis of Linguistic Data in Python

Status: Compulsory course (Linguistic Theory and Language Description)
Field of study: 45.04.03. Fundamental and Applied Linguistics
Delivered: 2nd year, modules 1 and 2
Mode of study: with an online course
Instructors: Попова Дарья Павловна
Programme: Linguistic Theory and Language Description
Language: English
Credits: 3

Course Syllabus

Abstract

Preprocessing of linguistic data in Python is designed to further the students’ knowledge of natural language processing and to polish their programming skills. The course aims to provide the students with the programming and natural language processing knowledge and competencies necessary to plan and conduct research projects of their own leading to the M.Sc. dissertation and scientific publications.

Learning Objectives

  • to further the students’ programming skills
  • to provide the students with the necessary skills to write programs for experiments and corpus studies
  • to teach the students how to re-format data
  • to teach the students how to retrieve data from the Internet
  • to teach the students how to write their code so that it is readable by other linguists
  • to teach the students how to present research that involves coding, both in writing and orally
  • to provide an overview of some of the most exciting current computational projects
  • to teach the students how to read and critically assess linguistic research that uses computational methods
  • to teach the students how to formulate linguistic questions in a way that can be addressed computationally
  • to teach the students to conduct independent computational studies

Expected Learning Outcomes

  • re-formats data
  • writes programs (code) for experiments and corpus studies
  • writes their code so that it is readable by other linguists and programmers
  • presents research that involves coding, both in writing and orally
  • reads and critically assesses linguistic research that uses computational methods
  • retrieves data from the Internet
  • formulates linguistic questions in a way that can be addressed computationally
  • conducts independent natural language processing studies

Course Contents

  • Datatypes and variables
    Variable assignment, basic datatypes, mutability.
  • Control structures
    Grouping and indentation. If, for, while, break and continue.
  • Input and output
    Command-line input, keyboard input, file input and output.
  • Subroutines and modules
    Simple functions, functions that return values, functions that take arguments, recursive functions, modules, writing modules. Classes.
  • Regular expressions
    Pattern syntax, matching, searching for patterns.
  • Text manipulation
    Tokenization, stemming, parsing different data formats.
  • Internet data
    Retrieving webpages, HTML, parsing HTML, webcrawlers.
  • Different data formats: csv, databases, json
  • Basics of web design: creating a web site for a linguistic experiment
  • Word2vec
  • Graphs in Python

Assessment Elements

  • non-blocking homework assignment 1
  • non-blocking homework assignment 2
  • non-blocking in-class presentation
  • non-blocking exam (final project)

Interim Assessment

  • Interim assessment (module 2)
    0.2 * homework assignment 1 + 0.25 * homework assignment 2 + 0.25 * in-class presentation + 0.3 * exam (final project)
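
As an illustration of how the weighting works, the formula can be written as a short Python function. This is only a sketch: the weights come from the formula above, while the dictionary keys, the function name and the example scores are hypothetical, and no rounding rule or grading scale is assumed.

```python
# Weights copied from the interim assessment formula above.
# The key names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "homework_1": 0.2,
    "homework_2": 0.25,
    "presentation": 0.25,
    "final_project": 0.3,
}


def interim_grade(scores):
    """Return the weighted sum of the four assessment components."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up scores, used only to show the arithmetic:
# 0.2 * 8 + 0.25 * 6 + 0.25 * 10 + 0.3 * 7
example = {
    "homework_1": 8,
    "homework_2": 6,
    "presentation": 10,
    "final_project": 7,
}
print(round(interim_grade(example), 2))
```

The four weights sum to 1, so the result stays on the same scale as the component scores.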

Bibliography

Recommended Core Bibliography

  • Perkins, J. (2014). Python 3 Text Processing with NLTK 3 Cookbook. Birmingham: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=836632

Recommended Additional Bibliography

  • Romano, F. (2015). Learning Python. Birmingham: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1133614