
Programming for Urban Analytics

2020/2021
Academic Year
ENG
Instruction in English
7
ECTS credits
Delivered at:
Vysokovsky Graduate School of Urbanism
Course type:
Elective course
When:
Year 2, modules 1 and 2

Instructors


Kotov, Egor

Course Syllabus

Abstract

Contemporary urban planners and researchers should be aware of the processes that can be observed with new data sources and analysis tools. In the modern urbanised world, enormous amounts of data are generated daily, ranging from citizen complaints and reports to search queries, daily movements and electricity meter readings. Analysing this data creates new opportunities for studying urban phenomena and enables new scientific approaches in urban planning and management. The volume and multidimensionality of urban data require learning new tools and methods for collecting the data, shaping it into a form appropriate for analysis, and performing the analysis. The course introduces students to the types of data (especially spatial data) relevant to urban research, advanced tools for working with such data, and the full process of data analysis, from data collection and exploratory visualisation to inferences, conclusions and presentation of results. Specific topics include data acquisition, data manipulation and preparation, exploratory analysis, statistical analysis (basic regression and an introduction to spatial autocorrelation and regression), data visualisation and reproducible reporting. Students will use the R statistical programming language and the RStudio IDE (integrated development environment) during the course, but the concepts and skills acquired can be applied in Python, Julia or any other programming language with data analysis libraries.
Learning Objectives

  • Familiarise students with different types of urban data sources, file and database types used for storage of such data.
  • Discuss the origins and associated limitations of various urban data sources.
  • Showcase the practices of explanatory data visualisation in urban planning and research.
  • Explain the importance of time and space dimensions of urban data.
  • Explain how the data is stored and structured.
  • Develop basic skills of applying statistical analysis to large and small data sets.
  • Teach basic principles of exploratory data analysis.
  • Show how to communicate urban data analysis results through explanatory data visualisation.
Expected Learning Outcomes

  • Acquire spatial urban data from files, remote servers and databases using R packages, APIs and web scraping.
  • Write readable and error-free data analysis code in R that allows a third party to reproduce and interpret the analysis.
  • Apply exploratory data analysis (EDA) to reveal time and space variations and patterns in urban data.
  • Clean and transform spatial urban data to prepare it for exploratory and statistical analysis.
  • Apply linear and spatial regression models to interpret space-time variations and patterns of urban processes.
  • Perform geoprocessing and spatial data manipulation and visualisation.
Course Contents

  • Introduction to Smart Cities and Urban Data
    - Smart city as a concept, as hype, as a marketing phenomenon, and as one of the key causes of the emergence of urban data. City as a corporation vs. city as a living organism. Adaptation of the city to new technologies.
    - Automated data generation and collection. Urban data ubiquity. The origins of urban data. Urban data sources. Traditional urban data (state urban statistics) vs. new data sources.
    - Urban data analysis as part of the daily routines of urban dwellers, geo-marketing specialists and tech companies. Outcomes of data ubiquity for urban researchers, planners and managers.
    - Required skill sets for urban data analysts.
  • Introduction to Scripted Data Analysis and Reproducible Research
    - Introduction to scripted data analysis. Point-and-click analysis vs. scripted analysis: head-to-head comparison. Using GUI (graphical user interface) dialog windows vs. calling functions. Importance of reproducible research with motivating examples.
    - R language as a statistical command line analysis tool. R language as a programming language. Why R. Comparison of R, Python, Julia, and a few other tools.
    - Basics of RStudio IDE (Integrated Development Environment). Working with RStudio projects.
    - Reproducible research using R, R Markdown, R Markdown Notebooks, flexdashboard.
    - Basic plotting in R. Basic functions and routines applied to classic datasets (mtcars, cars, iris, etc.). Basic data import.
  • Data Visualisation and Exploratory Data Analysis
    - Storytelling with data. Exploratory vs explanatory analysis. Choosing effective visuals for explanatory analysis. Gestalt principles of visual perception. Spotting bad graphs and maps.
    - Exploratory data analysis (EDA) process and tools. Plots vs summary statistics.
    - Rorschach protocol. Line-up protocol.
    - R tools for exploratory data analysis. Advanced plotting using ggplot2 and associated tools.
    - Interactive plots in R, the simple way.
    - Plot design layer by layer. Plot customisation according to Gestalt principles of visual perception. Plot optimisation for colour blind accessibility.
  • Urban Data Types and Sources. Getting Access to Data
    - Data sources. Open data. Code books. Means of accessing the data. Working with multiple data sources. Data storage file formats. Databases. Getting data from databases. Intro to getting data from web sources using APIs.
    - Basic types of data, operators, commands, functions. Approaches to working with data using R. Basic data structures. Objects.
    - R object types: vectors, matrices, “data.frames”, “tibbles” and “data.tables”. Lists. Differences between object types and use cases.
    - Exporting data to various formats. Choice of storage file format depending on storage goals.
    - Basic data manipulation using “data.table” and “dplyr”.
  • Tidy Data. Data Cleaning and Transformation
    - Wide vs long data. Data reshaping and manipulation. Shaping data into an analysis-appropriate form.
    - Tidy data concept. Data cleaning. String and date manipulation.
    - Regular expressions and their applications for data cleaning. Common pitfalls of regular expressions.
    - Feature creation. Data type conversion.
    - Building algorithms for data processing.
    - Creating custom functions, conditional statements and loops for data processing and visualisation.
  • Statistical Modelling
    - Correlation. Simple linear regression. Model fit and interpretation.
    - Multiple regression. Simple feature selection. Parallel slopes models.
    - A unified framework for application of statistical models in R. Visualisation of model results and performance.
  • Spatial Data Analysis and Statistics
    - Basics of working with spatial data in R. Spatial data storage formats and object types. Importing spatial data from various sources.
    - Visualising spatial data in R. Static plotting of spatial data. Interactive maps.
    - Merging and joining spatial data. Spatial data analysis. Geometric operations.
    - Introduction to spatial statistics. Spatial autocorrelation. Spatial segregation. Spatial generalised linear models.
  • Working with APIs and Web Scraping
    - Advanced work with APIs. Reading API documentation.
    - Constructing API requests. Processing API responses. Data manipulation for converting API responses into analysis-appropriate form.
    - Building algorithms for automated data retrieval using APIs.
    - Web scraping and related copyright and ethical issues.
    - Simple web scraping techniques. Reshaping of scraped data into analysis-appropriate form.
Assessment Elements

  • non-blocking Lab 01 - Introduction to Scripted Data Analysis and Reproducible Research
  • non-blocking Lab 02 - R vs Excel Basic Operations and GIS Basic Operations
  • non-blocking Lab 03 - Urban Data Types and Sources. Getting Access to Data
  • non-blocking Lab 04 - Data Vis (Basic)
  • non-blocking Lab 05 (graded) - Data Vis Advanced (+Spatial)
  • non-blocking Lab 06 - Tidy Data
  • non-blocking Lab 07 - Regression Models
  • non-blocking Lab 08 (graded) - Regression Models
  • non-blocking Lab 09 - Spatial Regression
  • non-blocking Lab 10 (graded) - Spatial Regression
  • non-blocking Exam
    The exam is conducted with asynchronous proctoring.
  • non-blocking Lab 11 - Clustering
  • non-blocking Lab 12 (graded) - Working with APIs
  • non-blocking Lab 13 - Web Scraping
Interim Assessment

  • Interim assessment (2 module)
    0.3 * Exam + 0.175 * Lab 05 (graded) - Data Vis Advanced (+Spatial) + 0.175 * Lab 08 (graded) - Regression Models + 0.175 * Lab 10 (graded) - Spatial Regression + 0.175 * Lab 12 (graded) - Working with APIs
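
    As a worked illustration of the formula above, here is a minimal sketch in Python (the course itself uses R). The weights come from the formula. The function name, the 10-point scale and the sample scores are assumptions made for the example and are not part of the syllabus.

```python
# Weights taken from the interim assessment formula above.
WEIGHTS = {
    "Exam": 0.3,
    "Lab 05": 0.175,
    "Lab 08": 0.175,
    "Lab 10": 0.175,
    "Lab 12": 0.175,
}


def interim_grade(scores):
    """Return the weighted interim grade for a dict of element scores."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores on an assumed 10-point scale.
example_scores = {"Exam": 8, "Lab 05": 7, "Lab 08": 9, "Lab 10": 6, "Lab 12": 10}
print(round(interim_grade(example_scores), 2))
```

    With these sample scores the grade is 0.3 * 8 + 0.175 * (7 + 9 + 6 + 10) = 2.4 + 5.6 = 8.0. The five weights sum to 1, so only the exam and the four graded labs affect the result.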
Bibliography

Recommended Core Bibliography

  • Arbia G. A Primer for Spatial Econometrics: With Applications in R. Basingstoke: Palgrave Macmillan, 2014.
  • Munzert S. Automated data collection with R: a practical guide to Web scraping and text mining. Chichester, West Sussex, United Kingdom: Wiley, 2014. 1 p.
  • Pace L., Hlynka M. Beginning R: an introduction to statistical programming. New York: Apress, 2012.
  • Peng R.D., Dominici F. Statistical methods for environmental epidemiology with R: a case study in air pollution and health. New York ; London: Springer, 2008. 144 p.
  • Wickham H. ggplot2: elegant graphics for data analysis. Second edition. Cham: Springer, 2016. 260 p.

Recommended Additional Bibliography

  • Arbia G. Spatial Econometrics: Statistical Foundations and Applications to Regional Convergence. Springer Science & Business Media, 2006. 220 p.
  • Knaflic C.N. Storytelling with data: a data visualization guide for business professionals. New Jersey: Wiley, 2015.
  • Offenhuber D., Ratti C. Decoding the city: Urbanism in the age of big data. Birkhäuser, 2014.