Introduction to Scientific Computing
- The course aims:
  - to further the students’ programming skills;
  - to provide them with the necessary skills to write programs for experiments and corpus studies;
  - to teach them how to re-format data;
  - to teach them how to retrieve data from the Internet;
  - to teach the students how to write their code so that it is readable by other linguists;
  - to teach them how to present their research that involves coding in the written and in the oral form;
  - to provide an overview of some of the most exciting current computational projects;
  - to teach the students how to read and to assess critically linguistic research that uses computational methods;
  - to teach them how to formulate linguistic questions in a way that can be addressed computationally;
  - to teach them to conduct independent computational studies.
- Students are able to write programs (code) for experiments and corpus studies.
- Students are able to conduct independent natural language processing studies.
- Students are able to formulate linguistic questions in a way that can be addressed computationally.
- Students are able to read and critically assess linguistic research that uses computational methods.
- Students are able to present research that involves coding in both written and oral form.
- Students are able to write their code so that it is readable by other linguists and programmers.
- Students are able to retrieve data from the Internet.
- Students are able to re-format data.
- Computer architecture I. Ideas of electronics, physical components of the computer. Software, and software layering in modern multi-user computers (kernel, system, user software). Contrast against unikernel (less layering) and against highly virtualised (more layering) systems. Networking of various reliability/bandwidth/latency levels.
- Data representation I. Text encodings, including the Unicode encodings. Numerical representations. Simple data structures: lists, trees, graphs. Elements of graph theory. Cryptographic ideas: hashing, symmetric and asymmetric ciphers.
- Interfaces I. Command-line and text interfaces. Historical perspective. Teletypes, shells, Read-Evaluate-Print Loops. Basic *nix-style conventions for interaction. Basics of shell scripting. Processes and job control. Quick intro to SSH, text editors, GnuPG, Git.
- Collaboration I. Communications channels from latency, flexibility, control, trust perspectives. Version control. Quality control: bug reporting, continuous integration, documentation.
- Licensing. Clarity, conformity. Interactions between business, academia, government, and community. Copy-left, open, proprietary.
- Software design patterns. Monolithic vs modular vs microservice design. Examples in kernel, system, and user.
- Software creation I. Compilers and assembly. Libraries and portability, linking and build. Interpreters.
- Software packaging I. User-local / system-global installation. Managed vs unmanaged installs. Package management: distributions, languages, user. Image-based: VMs, containers.
- Collaboration II. Patch review. Rebasing, merging, reverting. Filters and hooks.
- Computer architecture II. Heterogeneous computing, cluster computing, distributed computing. Hard vs easy parallelism.
- Software creation II. JIT compilation. Cross-compile and heterogeneous computing. Optimisation.
- Data representation/Interfaces II. Distributed data structures. Databases. Schedulers and cluster management.
- Graphical and web interfaces. Accessibility, scriptability, portability.
- Homework 1. Students are required to submit two homework assignments; they are given a week to complete each assignment.
- Homework 2
- In-class presentation
- Interim assessment (module 4): 0.3 * exam + 0.2 * homework 1 + 0.25 * homework 2 + 0.25 * in-class presentation
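As a worked illustration of the interim assessment formula above, the following minimal Python sketch computes the weighted grade. The weights are taken from the formula; the function name `interim_grade`, the component keys, the example scores, and the 10-point scale are assumptions made for illustration only.

```python
# Weights copied from the interim assessment formula; they sum to 1.0.
WEIGHTS = {
    "exam": 0.30,
    "homework_1": 0.20,
    "homework_2": 0.25,
    "presentation": 0.25,
}


def interim_grade(scores):
    """Return the weighted interim grade for a dict of component scores."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores on an assumed 10-point scale.
example = {"exam": 8, "homework_1": 6, "homework_2": 10, "presentation": 7}
print(round(interim_grade(example), 2))
```

Because the weights sum to 1.0, the result stays on the same scale as the component scores, whichever scale the course actually uses.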
- Perkins, J. Python Text Processing with NLTK 2.0 Cookbook: Use Python NLTK Suite of Libraries to Maximize Your Natural Language Processing Capabilities [Electronic resource] / Jacob Perkins; DB ebrary. – Birmingham: Packt Publishing Ltd, 2010. – 336 p.
- Sarkar, D. Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from your Data [Electronic resource] / Dipanjan Sarkar; DB Books 24x7. – Chicago: Apress, 2016. – 412 p. – ISBN 978-1-4842-2387-1.