Introduction to collection and analysis of 'Big data'
- Know basic methods of collecting nonreactive data in social sciences
- Know different types of big data in social sciences
- Use skills to collect online data (Wikipedia, YouTube, etc.)
- Use skills to analyze textual data
- Know basic concepts of the Python programming language
- Have skills to write Python code for basic data analysis tasks
- Have skills to analyze textual data
- Have skills to scrape online data through various APIs, browser automation, etc.
- Know basic concepts of Big data, its opportunities, limitations, and relevance to social sciences
- Introduction to Python
  Anaconda. Virtual environments. Jupyter notebook. Basic data types and structures. Basic functions and operators, methods and packages.
- Basic data manipulation in Python
  Basic dataframe manipulations in Python: filtering rows, selecting columns, slicing rows, creating new variables, arranging columns, joins, aggregation and grouping. Exploratory Data Analysis: descriptive statistics and visualization. Competitive data science: Kaggle competitions.
- Basic Text Processing
  Text preprocessing procedures: cleaning raw data (lowercasing, removal of special characters and stopwords, etc.); tokenization and segmentation; word normalization (stemming, lemmatization). Text processing: N-grams, TF-IDF. Frequency-based keyword extraction.
- Web scraping
  The json module in Python. HTML structure of a web page. The requests package. Blocked requests and workarounds using the fake_useragent package. Working with dynamic pages (imitating user behaviour) using the Selenium package. Extracting information from tags with the BeautifulSoup package.
- Client-server architecture and the request-response cycle: working with APIs
  Public and private APIs. The YouTube API. Quotas.
- Distributional semantics and topic modeling
  GloVe, Word2vec (CBOW and Skip-gram model architectures) and other word embedding methods. Topic Mining and Analysis: Motivation and Task Definition. Latent Dirichlet Allocation (LDA).
- Distributional semantics and topic modeling
  Flexible and interpretable NLP models (using LDA2vec as an example). Evaluation metrics in NLP.
- Introduction to Deep Learning in Python
  Introduction to neural networks: artificial neural networks (ANNs), model weights, loss function, activation functions, hyperparameters, forward and back propagation, gradient descent, stochastic gradient descent, model performance (overfitting and underfitting, main performance metrics). Coding neural networks in Keras: stacking layers, using momentum and Adam optimization, applying early stopping and a learning rate scheduler.
- Sequence modeling
  Recurrent neural networks. The exploding/vanishing gradient problem. Recurrent architectures designed to mitigate it: LSTM.
- Introduction to Transformers
  Attention model intuition. Transformer network architecture: self-attention, multi-head attention.
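
As an illustration of the Basic Text Processing topic above, the preprocessing, N-gram, and TF-IDF steps could be sketched in plain Python roughly as follows. This is a minimal sketch, not course material: the toy corpus `DOCS`, the stopword list `STOPWORDS`, and the function names are hypothetical, and the weighting shown (relative term frequency times `log(N / df)`) is only one of several common TF-IDF variants. Stemming and lemmatization are omitted, so "dog" and "dogs" are treated as different terms.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus and stopword list, for illustration only.
DOCS = [
    "The cat sat on the mat.",
    "The dog chased the cat!",
    "Dogs and cats are popular pets.",
]
STOPWORDS = {"the", "on", "and", "are"}


def preprocess(text):
    """Lowercase, replace non-letter characters with spaces, split, drop stopwords."""
    cleaned = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in cleaned.split() if tok not in STOPWORDS]


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(tokenized_docs):
    """Return one {term: weight} dict per document.

    tf = count / document length, idf = log(N / document frequency).
    """
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    weights = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


tokens = [preprocess(d) for d in DOCS]
bigrams = ngrams(tokens[0], 2)
weights = tf_idf(tokens)

# Frequency-based keyword extraction: rank terms of the first document by weight.
ranked = sorted(weights[0].items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

In this toy corpus "cat" occurs in two of the three documents, so it receives a lower weight in the first document than "sat" or "mat", which occur only there.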
- Interim assessment (module 2)
  0.15 * Homework 1 + 0.3 * Homework 2 + 0.15 * Homework 3 + 0.15 * Homework 4 + 0.15 * Homework 5 + 0.1 * Quizzes
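
The interim assessment formula is a weighted sum whose weights add up to 1. A small sketch of the arithmetic is given below; the weights are copied from the formula, while the function name `interim_grade`, the example scores, and the 10-point scale they assume are hypothetical.

```python
# Weights copied from the interim assessment formula.
WEIGHTS = {
    "Homework 1": 0.15,
    "Homework 2": 0.30,
    "Homework 3": 0.15,
    "Homework 4": 0.15,
    "Homework 5": 0.15,
    "Quizzes": 0.10,
}


def interim_grade(scores):
    """Weighted sum of component scores; every component must be present."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, assuming each component is graded on a 10-point scale.
example_scores = {
    "Homework 1": 8,
    "Homework 2": 10,
    "Homework 3": 6,
    "Homework 4": 8,
    "Homework 5": 8,
    "Quizzes": 10,
}
print(round(interim_grade(example_scores), 2))  # approximately 8.5
```

Because Homework 2 carries twice the weight of any other homework, a one-point change in it moves the final grade by 0.3, against 0.15 for the other homeworks and 0.1 for the quizzes.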
- Bengfort, B., Bilbro, R., & Ojeda, T. (2018). Applied Text Analysis with Python : Enabling Language-Aware Data Products with Machine Learning. Beijing: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1827695
- Beysolow, T. (2018). Applied Natural Language Processing with Python : Implementing Machine Learning and Deep Learning Algorithms for Natural Language Processing. [Berkeley, CA]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1892182
- Hajba, G. L. (2018). Website Scraping with Python : Using BeautifulSoup and Scrapy. Berkeley, CA: Apress.
- Jeremy Howard, & Sylvain Gugger. (2020). Deep Learning for Coders with Fastai and PyTorch. O’Reilly Media.
- Siddhartha Bhattacharyya, Vaclav Snasel, Aboul Ella Hassanien, Satadal Saha, & B. K. Tripathy. (2020). Deep Learning : Research and Applications. De Gruyter.
- Vanderplas, J. T. (2016). Python Data Science Handbook : Essential Tools for Working with Data (First edition). Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1425081
- Eric Matthes. (2019). Python Crash Course, 2nd Edition : A Hands-On, Project-Based Introduction to Programming. No Starch Press.