• A
  • A
  • A
  • АБB
  • АБB
  • АБB
  • А
  • А
  • А
  • А
  • А
Обычная версия сайта
2021/2022

SQL и автоматизация текущей отчетности

Статус: Дисциплина общефакультетского пула
Когда читается: 1, 2 модуль
Охват аудитории: для своего кампуса
Преподаватели: Шершуков Данила Вадимович
Язык: английский
Кредиты: 3

Course Syllabus

Abstract

Students are expected to be familiar with:•basic set operations (union, intersection, subtraction, cartesian product),•descriptive statistics,•Python (recommended) or ExcelDuring enrollment, a test may be held to check that students are familiar with basic set operations and basic descriptive statistics. The course develops practical and theoretical skills for using SQL in analytics and automating reports using BI systems. The course covers using SQL for extracting and analysing data and using SQL and Redash for report automation. The course consists of 7 classes and an exam. Each class consits of a lecture and a seminar. SQL is an important skill for analysts who work with large volumes of data and thus a competetive advantage for students during job search and at work. This course is designed to provide a lot of practice for students in writing queries, so that they could use SQL fluently. In this course, students will work with two SQL dialects: PostgreSQL and ClickHouse. PostgreSQL is an all-purpose DBMS, well-suited both for production (including highload) and analytics. ClickHouse is column-oriented and targets online analytical processing (OLAP). It excels at working with huge volumes of denormalized data, at the cost of lower flexibility and fewer features supported. While PostgreSQL is a perfect dialect to start learning SQL due to its systematic design, wide range of features and excelent documentation, ClickHouse is a good dialect to become aquainted with column-oriented databases and their use in data marts. The way reporting automation is done in a company greatly depends on the available infrastructure. The best case scenario is when a BI system is already deployed and analysts can focus on making reports. Without a BI system, creating infrastructure for running and viewing reports is required. It can be done in many ways, from Excel workbooks to creating simple BI systems with Python. In this course, we will assume a BI system (Redash) is available. Students who opt in for a challenge, will be provided with materials on doing similar reports and some basic infrastructure in Python.
Learning Objectives

Learning Objectives

  • To teach students how to use SQL for doing analytical tasks.
  • To improve students' skills in report automation
Expected Learning Outcomes

Expected Learning Outcomes

  • use a database client with a GUI
  • become fluent in SELECT SQL queries
  • use basic data definition and data manipulation SQL queries
  • apply SQL to solve real-life analytical problems
  • improve report automation skills by learning to automate reports in Excel or in Python, depending on their starting programming experience
Course Contents

Course Contents

  • Introduction to databases and SQL
    Databases and servers. Overview of contemporary SQL servers. Connecting to a database using DBeaver. Relational model. SQL overview. Basic data types and functions that work with them. Type casting. Using the query editor in DBeaver. Structure of SELECT queries with more details ontable expressions and select lists. Connecting to databases from Excel (ODBC driver) and Python (psycopg2).
  • Table expressions in detail
    Joins, set operations and subqueries. Using subqueries in table references, conditions and common table expressions. Set-returning functions. Query planner, EXPLAIN ANALYZE, overview of algorythms*. Visualizing query results (Excel or Python).
  • Select list and domain-specific functions
    CASEs. Regular expressions: complex matches and data extraction. Dates and time: working with timezones, common mistakes when working with dates. Advanced logical functions. Logging in Python.
  • Window functions
    Using aggregate functions as window functions. Lag, rank. Making window functions cumulative. Jinja2 (Python).
  • Complex computing and ETLs
    Common mistakes when computing statistics in SQL and how to avoid them. Handling multistage computations while keeping them comprehensible. Working with event chains: restoring snapshots, computing statistics. DDL and DML. Making ETL processes in SQL and Python.
  • Working with geospatial data
    Postgres extensions. Data types and functions for working with geospatial data.
  • SQL in interviews
    Common techniques used in test assignment. Revision.
Assessment Elements

Assessment Elements

  • non-blocking home assignment 1
  • non-blocking exam
  • non-blocking home assignment 2
  • non-blocking home assignment 3
  • non-blocking home assignment 4
  • non-blocking home assignment 5
  • non-blocking home assignment 6
  • non-blocking Project
Interim Assessment

Interim Assessment

  • Interim assessment (2 module)
    0.3 * exam + 0.066 * home assignment 1 + 0.066 * home assignment 2 + 0.067 * home assignment 3 + 0.067 * home assignment 4 + 0.067 * home assignment 5 + 0.067 * home assignment 6 + 0.3 * Project
Bibliography

Bibliography

Recommended Core Bibliography

  • Beaulieu, A. (2009). Learning SQL : Master SQL Fundamentals: Vol. 2nd ed. O’Reilly Media.
  • Beighley, L. (2007). Head First SQL : Your Brain on SQL —— A Learner’s Guide. Reilly - O’Reilly Media.
  • Malik, U., Goldwasser, M., & Johnston, B. (2019). SQL for Data Analytics : Perform Fast and Efficient Data Analysis with the Power of SQL. Packt Publishing.

Recommended Additional Bibliography

  • Linoff, G. (2016). Data Analysis Using SQL and Excel: Vol. Second edition. Wiley.