Introduction to Scientific Computing

Academic year: 2020/2021
Language of instruction: English
Credits: 4
Status: compulsory course
When taught: year 1, modules 3 and 4

Instructor

Nicholas Lerner Howell

Course Syllabus

Abstract

The course is designed to further the students’ knowledge of natural language processing and to polish their programming skills. The course aims to provide the students with the programming and natural language processing knowledge and competencies necessary to plan and conduct research projects of their own leading to the M.Sc. dissertation and scientific publications.
Learning Objectives

The course aims:
  • to further the students’ programming skills;
  • to provide them with the skills needed to write programs for experiments and corpus studies;
  • to teach them how to re-format data;
  • to teach them how to retrieve data from the Internet;
  • to teach them how to write code that is readable by other linguists;
  • to teach them how to present research that involves coding, in written and oral form;
  • to provide an overview of some of the most exciting current computational projects;
  • to teach them how to read and critically assess linguistic research that uses computational methods;
  • to teach them how to formulate linguistic questions in a way that can be addressed computationally;
  • to teach them to conduct independent computational studies.
Expected Learning Outcomes

  • Students are able to write programs (code) for experiments and corpus studies.
  • Students are able to conduct independent natural language processing studies.
  • Students are able to formulate linguistic questions in a way that can be addressed computationally.
  • Students are able to read and critically assess linguistic research that uses computational methods.
  • Students are able to present research that involves coding, in written and oral form.
  • Students are able to write their code so that it is readable by other linguists and programmers.
  • Students are able to retrieve data from the Internet.
  • Students are able to re-format data.
Course Contents

  • Computer architecture I.
    Ideas of electronics, physical components of the computer. Software, and software layering in modern multi-user computers (kernel, system, user software). Contrast against unikernel (less layering) and against highly virtualised (more layering) systems. Networking of various reliability/bandwidth/latency levels.
  • Data representation I.
    Text encodings, including the various Unicode encodings. Numerical representations. Simple data structures: lists, trees, graphs. Elements of graph theory. Cryptographic ideas: hashing, symmetric and asymmetric ciphers.
  • Interfaces I.
    Command-line and text interfaces. Historical perspective. Teletypes, shells, Read-Evaluate-Print Loops. Basic *nix-style conventions for interaction. Basics of shell scripting. Processes and job control. Quick intro to SSH, text editors, GnuPG, Git.
  • Collaboration I.
    Communication channels from the perspectives of latency, flexibility, control, and trust. Version control. Quality control: bug reporting, continuous integration, documentation.
  • Licensing.
    Clarity, conformity. Interactions between business, academia, government, and community. Copyleft, open, and proprietary licenses.
  • Software design patterns.
    Monolithic vs modular vs microservice design. Examples in kernel, system, and user.
  • Software creation I.
    Compilers and assembly. Libraries and portability, linking and build. Interpreters.
  • Software packaging I.
    User-local / system-global installation. Managed vs unmanaged installs. Package management: distributions, languages, user. Image-based: VMs, containers.
  • Collaboration II.
    Patch review. Rebasing, merging, reverting. Filters and hooks.
  • Computer architecture II.
    Heterogeneous computing, cluster computing, distributed computing. Hard vs easy parallelism.
  • Software creation II.
    JIT compilation. Cross-compile and heterogeneous computing. Optimisation.
  • Data representation/Interfaces II.
    Distributed data structures. Databases. Schedulers and cluster management.
  • Graphical and web interfaces.
    Accessibility, scriptability, portability.
Assessment Elements

  • non-blocking homework 1
    Students are required to submit two homework assignments; they are given a week to complete each assignment.
  • non-blocking homework 2
  • non-blocking In-class presentation
  • non-blocking exam
Interim Assessment

  • Interim assessment (module 4)
    0.3 * exam + 0.2 * homework 1 + 0.25 * homework 2 + 0.25 * In-class presentation
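
As an illustration of how the weights combine, the formula above can be sketched in Python. The weights are taken directly from the formula; the function name `interim_grade`, the sample scores, and the use of a 10-point scale are assumptions made for this example only, and the syllabus does not state a rounding rule, so none is applied to the grade itself.

```python
# Weights copied from the interim assessment formula.
WEIGHTS = {
    "exam": 0.3,
    "homework 1": 0.2,
    "homework 2": 0.25,
    "in-class presentation": 0.25,
}


def interim_grade(scores):
    """Return the weighted sum of the four assessment elements.

    `scores` maps each element name to its score. The function name and
    the input format are hypothetical, chosen for this illustration.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores on an assumed 10-point scale.
example = {
    "exam": 8,
    "homework 1": 6,
    "homework 2": 10,
    "in-class presentation": 7,
}

# 0.3*8 + 0.2*6 + 0.25*10 + 0.25*7 = 2.4 + 1.2 + 2.5 + 1.75
print(round(interim_grade(example), 2))  # prints 7.85
```

Because the four weights sum to 1, the result stays on the same scale as the individual scores.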
Bibliography

Recommended Core Bibliography

  • Perkins, J. Python Text Processing with NLTK 2.0 Cookbook: Use Python NLTK Suite of Libraries to Maximize Your Natural Language Processing Capabilities [Electronic resource] / Jacob Perkins; DB ebrary. – Birmingham: Packt Publishing Ltd, 2010. – 336 p.

Recommended Additional Bibliography

  • Sarkar, D. Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from your Data [Electronic resource] / Dipanjan Sarkar; DB Books 24x7. – Chicago: Apress, 2016. – 412 p. – ISBN 978-1-4842-2387-1