
Exploratory Data Analysis

Academic year: 2019/2020
Language of instruction: English
ECTS credits: 4
Course type: Elective course
When: year 1, modules 1 and 2

Instructor: Vladimir Batagelj

Course Syllabus

Abstract

This course is dedicated to numerical and graphical techniques for summarizing and displaying data. Special attention is paid to the distinction between exploration and confirmation. Connections with conventional statistical analysis and data mining are explored, with implications for the social sciences. The course also emphasizes applications to large data sets.
Learning Objectives

  • The course gives students a foundation for designing and conducting their own research and for evaluating the research of others.
Expected Learning Outcomes

  • Know the theoretical foundations of working with data.
  • Understand the basic principles of exploratory analysis and have a foundation for further learning in the area.
  • Know modern extensions to data exploration, including working with “problem data”.
  • Know the basic principles of working with different types of data when building different types of models.
  • Be able to work with major data analysis programs, especially R, and interpret their output.
  • Have the skills to work with the statistical software required to analyze the data.
  • Be able to develop a model appropriate to the research question.
  • Be able to critically review published empirical research that uses applied statistical methods.
  • Be able to constructively criticize applied linear models in published work and identify their problems.
  • Be able to assess the advantages and disadvantages of various approaches to exploratory analysis and show how they relate to other methods of analysis.
Course Contents

  • Introduction to EDA
    The first session will look at the very basics of exploratory analysis, starting with record keeping. It will also consider what statistics and data science are, and review measurement scales, properties of data, and resources available for working with data.
  • Data on files
    The session will look at data stored in a variety of formats and will discuss files, codes, and formats. It will also distinguish between the XML and JSON formats and look at snowball sampling as a method of data collection.
  • Visualization
    The session will show different ways to visualize data, with examples and software support.
  • Cleaning the data
    This session will go step by step through cleaning and exploring the data, preparing it for basic regression and cluster analysis, and will also look into solving the clustering problem.
  • Symbolic data analysis
    This session covers the foundations of symbolic data analysis, clustering and optimization, the leaders method, and the agglomerative method, and provides examples and references.
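The leaders method listed above can be read as a k-means-style alternating procedure: assign each unit to its nearest leader, replace each leader by the best representative of its cluster, and repeat until the leaders stop changing. The sketch below illustrates only that reading, under assumptions that are not in the syllabus: units are numeric vectors, dissimilarity is squared Euclidean distance, and a cluster's representative is its mean. The symbolic-data version covered in the course works with more general units and dissimilarities. The names `leaders`, `k`, `max_iter`, and `seed` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (an assumed choice of dissimilarity)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points: List[Point]) -> Point:
    """Component-wise mean, used here as the cluster representative (leader)."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def leaders(units: List[Point], k: int, max_iter: int = 100,
            seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Hypothetical k-means-style leaders procedure; returns (labels, leaders)."""
    rng = random.Random(seed)
    # Initial leaders: k distinct units drawn at random.
    current = [tuple(map(float, u)) for u in rng.sample(units, k)]
    labels = [0] * len(units)
    for _ in range(max_iter):
        # Assignment step: each unit joins the cluster of its nearest leader.
        labels = [min(range(k), key=lambda j: sq_dist(u, current[j]))
                  for u in units]
        # Representation step: each leader becomes the mean of its cluster;
        # an empty cluster keeps its previous leader.
        updated = []
        for j in range(k):
            members = [u for u, lab in zip(units, labels) if lab == j]
            updated.append(mean_point(members) if members else current[j])
        if updated == current:  # leaders unchanged, so the labels are stable
            break
        current = updated
    return labels, current


# Small illustrative data set (hypothetical): two well-separated groups.
units = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = leaders(units, k=2)
print("labels:", labels)
print("leaders:", centers)
```

On this toy data the procedure should put the three points near the origin in one cluster and the three near (10, 10) in the other. As with k-means, the result can depend on the initial leaders, so the procedure is typically run from several starting points and the best solution kept.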
Assessment Elements

  • Project “Appropriate clean-up of the data” (non-blocking)
  • Project “Basic analysis” (non-blocking)
  • Project “Basic inferences about the data” (non-blocking)
Interim Assessment

  • Interim assessment (module 2)
    0.3 * Project “Appropriate clean-up of the data” + 0.5 * Project “Basic analysis” + 0.2 * Project “Basic inferences about the data”
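The interim grade is a weighted sum of the three project scores. Because the weights 0.3, 0.5, and 0.2 add up to 1, the result stays on the same scale as the individual scores. The sketch below is a minimal illustration, assuming all three projects are graded on one common scale; the syllabus does not state the scale or any rounding rule, and the score keys and sample scores are hypothetical.

```python
# Weights from the interim assessment formula (module 2).
# The dictionary keys are hypothetical labels for the three projects.
WEIGHTS = {
    "clean_up": 0.3,        # Project: appropriate clean-up of the data
    "basic_analysis": 0.5,  # Project: basic analysis
    "inferences": 0.2,      # Project: basic inferences about the data
}


def interim_grade(scores: dict) -> float:
    """Weighted sum of the three project scores (assumes a common scale)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores: {sorted(missing)}")
    return sum(w * scores[name] for name, w in WEIGHTS.items())


# Hypothetical scores; no rounding is applied here.
grade = interim_grade({"clean_up": 8, "basic_analysis": 7, "inferences": 9})
# grade is approximately 7.7 = 0.3*8 + 0.5*7 + 0.2*9
print(grade)
```

The check for missing scores makes an incomplete record fail loudly instead of silently contributing zero to the weighted sum.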
Bibliography

Recommended Core Bibliography

  • Fox, J., Jr, & Weisberg, H. S. (2010). An R Companion to Applied Regression. Thousand Oaks: SAGE Publications, Inc. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1236075
  • Montgomery, D. C., Vining, G. G., & Peck, E. A. (2012). Introduction to Linear Regression Analysis (5th ed.). Hoboken, NJ: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1021709
  • Yan, X., & Su, X. (2009). Linear Regression Analysis: Theory and Computing. Singapore: World Scientific. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=305216

Recommended Additional Bibliography

  • Elliott, A. C., & Woodward, W. A. (2016). SAS Essentials: Mastering SAS for Data Analytics (2nd ed.). Hoboken, NJ: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1051725
  • Hocking, R. R. (2013). Methods and Applications of Linear Models: Regression and the Analysis of Variance (3rd ed.). Hoboken, NJ: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=603362