Data and Analytics in Finance

Master 2019/2020

Category 'Best Course for Career Development'

Category 'Best Course for Broadening Horizons and Diversity of Knowledge and Skills'

Category 'Best Course for New Knowledge and Skills'

Type: Compulsory course (Finance)

Area of studies: Finance and Credit

Delivered by: School of Economics and Finance

Where: Faculty of Economics, Management, and Business Informatics

When: 1st year, modules 3 and 4

Mode of studies: offline

Instructors: Petr Parshakov, Evgeniya Shenkman

Master’s programme: Finance

Language: English

ECTS credits: 6

Contact hours: 72


Abstract

The course aims to give students a basic understanding of data analytics and machine learning concepts as applied to finance, together with practical skills in implementing these concepts in programming software, so that they can provide organizations with data-driven solutions. The course begins with the essentials of data collection and wrangling; this part teaches students how to find, parse, import, manipulate and visualize financial data. The next part develops research and analytical skills and covers methods such as principal component analysis, clustering, various curve-fitting techniques and LASSO regression. The final part shows how machine learning methods can be applied to finance, using fraud detection as the example. The course is based on real data: data from open sources, data on Russian and European public companies collected by the International Laboratory of Intangible-driven Economy (NRU HSE), and sales and customer analytics data provided by the GAMES laboratory (NRU HSE). After completing the course, students will be able to apply data management techniques, optimise asset portfolios, carry out customer analytics and detect fraud. The course is delivered on the Zoom/MS Teams platforms and is supported by the LMS. Personal consultations are available via Skype and e-mail on request.

Learning Objectives

Work confidently in R: import data, perform basic manipulations to prepare it for calculations, and export the results.
Apply methods of data analysis and understand their objectives.
Understand the limitations and relevance of the methods.

Expected Learning Outcomes

Apply skills in data cleaning.
Demonstrate the ability to work in different software environments for data analysis and to explain the choice of software.
Understand the basic theory of financial data analysis, and devise and write code for a particular financial data analysis task.
Master the ability to make decisions based on data analysis and to justify them.
Make financial decisions based on data analysis and justify them.

Course Contents

Data wrangling with R
1. Introduction to R: Data Structures; Subsetting; Functions; Vectorization.
2. Data Wrangling: Tidy Data; Reshape; Summarize.
3. Data Visualization: Base Graphics; Grammar of Graphics; Interactive Graphics.
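To make "tidy data", "reshape" and "summarize" concrete, here is a minimal sketch of the two steps. The course works in R; this version uses Python's standard library only, and the tickers, dates and prices are hypothetical toy values, as are the helper names `wide_to_long` and `summarize_by`.

```python
from statistics import mean

# Hypothetical toy prices in "wide" form: one row per date, one column per ticker.
wide = [
    {"date": "2020-01-02", "AAA": 10.0, "BBB": 20.0},
    {"date": "2020-01-03", "AAA": 11.0, "BBB": 19.0},
]


def wide_to_long(rows, id_col, var_name="ticker", value_name="price"):
    """Reshape to tidy (long) form: one row per (id, variable, value) observation."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key != id_col:
                long_rows.append({id_col: row[id_col], var_name: key, value_name: value})
    return long_rows


def summarize_by(rows, group_col, value_col):
    """Group rows by one column and return the mean of another column per group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_col], []).append(row[value_col])
    return {group: mean(values) for group, values in groups.items()}


long_rows = wide_to_long(wide, "date")
print(summarize_by(long_rows, "ticker", "price"))  # {'AAA': 10.5, 'BBB': 19.5}
```

In R the same two steps are typically done with tidyr's `pivot_longer()` followed by dplyr's `group_by()` and `summarise()`.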
Optimization problems on financial data
4. Principal component analysis and clustering. Main objectives of principal component analysis (PCA). Mathematical model for extracting components. Algorithms for implementing PCA. Latent variables; criteria for choosing the number of components. Rotation and interpretation of the results. Main objectives of clustering and its geometric interpretation. Measures of distance between objects and between clusters. k-means and k-medians clustering: objective, algorithm and interpretation of results. Criteria for choosing the number of clusters and assessing clustering quality. Implementation of the methods in the case study “Customer analytics in banks”.
5. Curve fitting. Main objective of curve fitting and the financial problems it can help solve. Interpolation and extrapolation. Types of curve fitting: polynomial and spline interpolation (local polynomial fitting). Procedure for estimating a fitted curve. Implementation of the method in the case study “Fitting yield curve”.
6. Portfolio optimization on data. Optimal portfolio of two risky assets: theoretical model. Solving the model as a quadratic programming problem. Sensitivity to model inputs. The optimal portfolio problem in p dimensions and the LASSO technique for handling it. Implementation of the method in the case study “Construction a portfolio on trading data of a stock”.
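For two risky assets, one standard special case of the portfolio problem in topic 6 has a closed-form solution: the fully invested minimum-variance portfolio, with w1 = (σ2² − ρσ1σ2) / (σ1² + σ2² − 2ρσ1σ2) and w2 = 1 − w1. The sketch below assumes short sales are allowed and uses hypothetical volatilities and correlation; the course's own "optimal portfolio" may instead target a mean-variance trade-off, so treat this only as an illustration of the quadratic objective. It is written in Python rather than the course's R, and the function names are hypothetical.

```python
import math


def min_variance_two_assets(sigma1, sigma2, rho):
    """Weights (w1, w2) of the fully invested two-asset minimum-variance portfolio.

    Minimizes w1^2*s1^2 + w2^2*s2^2 + 2*w1*w2*rho*s1*s2 subject to w1 + w2 = 1.
    Short sales are allowed, so a weight can come out negative.
    """
    cov = rho * sigma1 * sigma2
    denom = sigma1 ** 2 + sigma2 ** 2 - 2 * cov
    if denom == 0:
        # Only happens when rho = 1 and sigma1 = sigma2: every mix has the same risk.
        raise ValueError("minimum-variance weights are not unique")
    w1 = (sigma2 ** 2 - cov) / denom
    return w1, 1.0 - w1


def portfolio_volatility(w1, w2, sigma1, sigma2, rho):
    """Standard deviation of the two-asset portfolio return."""
    variance = (w1 * sigma1) ** 2 + (w2 * sigma2) ** 2 + 2 * w1 * w2 * rho * sigma1 * sigma2
    return math.sqrt(variance)


# Hypothetical inputs: 20% and 30% volatility, uncorrelated returns.
w1, w2 = min_variance_two_assets(0.20, 0.30, 0.0)
print(round(w1, 4), round(w2, 4))  # 0.6923 0.3077
print(round(portfolio_volatility(w1, w2, 0.20, 0.30, 0.0), 4))
```

With p assets and extra constraints (for example, no short sales) there is generally no closed form, which is where the quadratic programming formulation and, for large p, regularization such as LASSO come in.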
Fraud detection using machine learning
7. Introduction to fraud detection and data preprocessing. Importance of fraud detection. Definition and types of fraud. Types of variables. Data exploration and visualization. Dealing with missing values. Standardizing and transforming data.
8. Featurization, social network analysis and dealing with imbalanced datasets. Traditional features for fraud detection. Social network analysis. Random oversampling (ROS) and random undersampling (RUS). Synthetic Minority Over-sampling Technique (SMOTE).
9. Supervised and unsupervised techniques for fraud detection. Linear and logistic regression. Decision trees and ensemble methods. Evaluating fraud detection models. Digit analysis using Benford’s Law. Multivariate outlier detection using robust statistics.
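Digit analysis with Benford's Law compares the observed distribution of first significant digits with the expected probabilities P(d) = log10(1 + 1/d). A minimal sketch of the usual first-digit test, using the ordinary Pearson chi-square goodness-of-fit statistic; it is written in Python with the standard library only (the course itself uses R), and the helper names and sample amounts are hypothetical.

```python
import math
from collections import Counter


def benford_expected():
    """Benford's Law: P(first significant digit = d) = log10(1 + 1/d), d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """First significant (nonzero) digit of a nonzero number."""
    # repr gives e.g. '0.00345', '1234.5' or '5e+20'; strip leading zeros and the dot.
    return int(repr(abs(x)).lstrip("0.")[0])


def benford_chi_square(amounts):
    """Pearson chi-square statistic: observed first digits vs. Benford's Law."""
    digits = [first_digit(a) for a in amounts if a != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one nonzero amount")
    observed = Counter(digits)  # missing digits count as 0
    return sum(
        (observed[d] - n * p) ** 2 / (n * p) for d, p in benford_expected().items()
    )


# Hypothetical transaction amounts, used only to show the call.
amounts = [1200.50, 187.00, 23.10, 310.00, 1450.75, 96.40, 112.00, 5400.00]
print(round(benford_chi_square(amounts), 2))
```

With nine digit categories the statistic is usually compared with a chi-square distribution with 8 degrees of freedom (5% critical value about 15.5). A large value flags the data for closer review rather than proving fraud, and Benford's Law is only expected to hold for data spanning several orders of magnitude.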

Assessment Elements

Test 1
Test 2
Seminar activities
Self-study students’ work
Exam

Interim Assessment

Interim assessment (module 4)
0.4 * Exam + 0.15 * Self-study students’ work + 0.15 * Seminar activities + 0.15 * Test 1 + 0.15 * Test 2
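The interim grade is a weighted sum whose weights add up to 1 (0.4 + 4 × 0.15). A minimal Python sketch of the arithmetic, assuming all five elements are scored on the same scale; the dictionary keys are hypothetical labels, and any rounding rule is left out because the page does not state one.

```python
# Weights from the interim assessment formula; the keys are hypothetical labels.
WEIGHTS = {
    "exam": 0.40,
    "self_study": 0.15,
    "seminar": 0.15,
    "test1": 0.15,
    "test2": 0.15,
}


def interim_grade(scores):
    """Weighted sum of the five element scores (all assumed to use the same scale)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing scores: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


# Hypothetical scores, only to show the arithmetic.
example = {"exam": 8, "self_study": 7, "seminar": 9, "test1": 6, "test2": 8}
print(round(interim_grade(example), 2))  # 7.7
```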

Bibliography

Recommended Core Bibliography

Provost, F., & Fawcett, T. (2013). Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking (1st ed.). Beijing: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=619895

Recommended Additional Bibliography

Tsay, R. S. (2013). An Introduction to Analysis of Financial Data with R. Wiley.
