Neurobayesian Models

Master 2019/2020

Category 'Best Course for Career Development'

Category 'Best Course for Broadening Horizons and Diversity of Knowledge and Skills'

Category 'Best Course for New Knowledge and Skills'

Type: Elective course (Statistical Learning Theory)

Area of studies: Applied Mathematics and Informatics

Delivered by: Big Data and Information Retrieval School

Where: Faculty of Computer Science

When: year 2, module 3

Mode of studies: offline

Instructors: Ekaterina Lobacheva, Dmitry Vetrov

Master’s programme: Statistical Learning Theory

Language: English

ECTS credits: 6

Contact hours: 34

Full Syllabus

Abstract

This course is devoted to Bayesian reasoning applied to deep learning models. Attendees will learn how to use probabilistic modeling to construct neural generative and discriminative models, how to use the paradigm of generative adversarial networks to perform approximate Bayesian inference, and how to model uncertainty about the weights of neural networks. Selected open problems in deep learning will also be discussed. The practical assignments cover the implementation of several modern Bayesian deep learning models.

Learning Objectives

The learning objective of the course is to give students basic and advanced tools for inference and learning in complex probabilistic models involving deep neural networks, such as probabilistic deep generative models and Bayesian neural networks.

Expected Learning Outcomes

Knowledge about different approximate inference and learning techniques for probabilistic models
Hands-on experience with modern probabilistic modifications of deep learning models
Knowledge about the building blocks needed to construct new probabilistic models suited to the problem at hand

Course Contents

Stochastic Variational Inference (SVI) and Doubly SVI (DSVI)
SVI as a scalable alternative to variational inference for tasks with large data. Application of SVI to the latent Dirichlet allocation model.
Bayesian neural networks and Bayesian compression of neural networks
Variational inference of the posterior distribution over the weights of discriminative neural networks. Local reparameterization trick for gradient variance reduction. Variational Dropout sparsifies deep neural networks: a different parametrization yields a totally different model. Soft Weight Sharing: how to save memory by quantizing the weights of a neural network.
Variational autoencoders (VAE) and normalizing flows (NF)
Probabilistic PCA, VAE as a non-linear generalization of probabilistic PCA. Reparametrization trick for doubly-stochastic variational inference. Extending variational approximations with normalizing flows. Examples of normalizing flows
Discrete Latent Variables and Variance Reduction
The idea of Stochastic Computation Graphs, discrete and continuous stochastic nodes, and gradient estimation: Gumbel-Softmax and REINFORCE with control variates.
Implicit Variational Inference using Adversarial Training
Adversarial Variational Bayes for training VAE with implicit inference distribution. f-GANs as a generalization of vanilla GANs for optimizing arbitrary f-divergence.
Inference in implicit probabilistic models
Implicit and semi-implicit distributions are flexible parametric families that can be constructed with neural networks in a general way. Such distributions can be used as building blocks for probabilistic models. How to construct such distributions and how to perform inference with such models.
Deep MCMC
How neural networks help MCMC methods sample from an analytically specified distribution, and how MCMC methods help neural networks sample from an empirical distribution.
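
Two of the gradient estimation ideas named in the course contents above, the reparameterization trick and Gumbel-Softmax, can be illustrated with a minimal NumPy sketch. It is not taken from the course materials: the function names `reparameterized_normal` and `gumbel_softmax`, the temperature value, and the example probabilities are all assumptions chosen for illustration. The sketch only shows how the samples are constructed as deterministic functions of parameters and independent noise; computing gradients through them would require an automatic differentiation framework.

```python
import numpy as np


def reparameterized_normal(mu, log_sigma, rng):
    """Sample z ~ N(mu, sigma^2) as z = mu + sigma * eps, with eps ~ N(0, 1).

    The noise eps does not depend on (mu, log_sigma), so z is a
    differentiable function of the parameters.
    """
    mu = np.asarray(mu, dtype=float)
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_sigma) * eps


def gumbel_softmax(logits, temperature, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    logits = np.asarray(logits, dtype=float)
    # Gumbel(0, 1) noise via the inverse CDF; the lower bound avoids log(0).
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    y = (logits + gumbel) / temperature
    y = y - y.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)

# Continuous latent variable: 20000 draws with mean 2.0 and std 0.5.
z = reparameterized_normal(np.full(20000, 2.0), np.full(20000, np.log(0.5)), rng)

# Discrete latent variable with (hypothetical) class probabilities 0.2, 0.3, 0.5.
y = gumbel_softmax(np.log([0.2, 0.3, 0.5]), temperature=0.5, rng=rng)
```

Lower temperatures push `y` closer to a one-hot vector at the cost of higher gradient variance; the value 0.5 here is arbitrary.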

Assessment Elements

Practical assignments
Practical assignments consist of programming some models/methods from the course in Python and analysing their behavior: Sparse Variational Dropout (SVDO), NF, VAE, Discrete Latent Variables (DLV).
Exam
Year 2. The exam was held in module 3.

Interim Assessment

Interim assessment (module 3)
0.3 * Exam + 0.7 * Practical assignments
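
As a worked example of the weighting above, the following sketch computes the interim grade for one student. The function name `interim_grade` and the marks used (8 for the exam, 9 for the practical assignments) are hypothetical; the grading scale itself is not stated here.

```python
def interim_grade(exam, practical, w_exam=0.3, w_practical=0.7):
    """Weighted interim grade: 0.3 * Exam + 0.7 * Practical assignments."""
    return w_exam * exam + w_practical * practical


# Hypothetical marks: the practical assignments dominate the result.
grade = interim_grade(exam=8, practical=9)
print(round(grade, 2))
```

With these marks the result is approximately 8.7, closer to the practical mark because of its larger weight.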

Bibliography

Recommended Core Bibliography

Bishop, C. M. (n.d.). Pattern Recognition and Machine Learning. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.EBA0C705
Murphy, K. P. (2012). Machine Learning : A Probabilistic Perspective. Cambridge, Mass: The MIT Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=480968
Goodfellow, I., Bengio, Y., & Courville, A. (2018). Deep Learning (Russian edition). DMK Press. 652 pp. ISBN: 978-5-97060-618-6. Electronic text, Lan Electronic Library System. Retrieved from https://e.lanbook.com/book/107901

Recommended Additional Bibliography

Blundell, C., Cornebise, J., Kavukcuoglu, K., & Wierstra, D. (2015). Weight Uncertainty in Neural Networks. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsarx&AN=edsarx.1505.05424
Grathwohl, W., Choi, D., Wu, Y., Roeder, G., & Duvenaud, D. (2017). Backpropagation through the Void: Optimizing control variates for black-box gradient estimation. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsarx&AN=edsarx.1711.00123
Jang, E., Gu, S., & Poole, B. (2016). Categorical Reparameterization with Gumbel-Softmax. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsarx&AN=edsarx.1611.01144
Kingma, D. P., & Welling, M. (2013). Auto-Encoding Variational Bayes. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsarx&AN=edsarx.1312.6114
Kingma, D. P., Salimans, T., & Welling, M. (2015). Variational Dropout and the Local Reparameterization Trick. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsarx&AN=edsarx.1506.02557
Levy, D., Hoffman, M. D., & Sohl-Dickstein, J. (2017). Generalizing Hamiltonian Monte Carlo with Neural Networks. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsarx&AN=edsarx.1711.09268
Louizos, C., & Welling, M. (2017). Multiplicative Normalizing Flows for Variational Bayesian Neural Networks. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsarx&AN=edsarx.1703.01961
Maddison, C. J., Mnih, A., & Teh, Y. W. (2016). The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsarx&AN=edsarx.1611.00712
Hoffman, M., Blei, D. M., Wang, C., & Paisley, J. (2013). Stochastic Variational Inference. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.C4CCD6D4
Mescheder, L., Nowozin, S., & Geiger, A. (2017). Adversarial Variational Bayes: Unifying Variational Autoencoders and Generative Adversarial Networks. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsarx&AN=edsarx.1701.04722
Molchanov, D., Ashukha, A., & Vetrov, D. (2017). Variational Dropout Sparsifies Deep Neural Networks. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsarx&AN=edsarx.1701.05369
Nowozin, S., Cseke, B., & Tomioka, R. (2016). f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsarx&AN=edsarx.1606.00709
Rezende, D. J., & Mohamed, S. (2015). Variational Inference with Normalizing Flows. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsarx&AN=edsarx.1505.05770
Wang, S. I., & Manning, C. D. (2013). Fast dropout training. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.C2036E9B
Song, J., Zhao, S., & Ermon, S. (2017). A-NICE-MC: Adversarial Training for MCMC. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsarx&AN=edsarx.1706.07561
Tucker, G., Mnih, A., Maddison, C. J., Lawson, D., & Sohl-Dickstein, J. (2017). REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsarx&AN=edsarx.1703.07370
Ullrich, K., Meeds, E., & Welling, M. (2017). Soft Weight-Sharing for Neural Network Compression. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsarx&AN=edsarx.1702.04008
