SML 515 - Statistical Data Analysis
New course in Spring 2020: Wed/Fri 11-12:20
The course provides an introduction to modern data analysis and data science. It addresses the central question “What should I do if these are my data and this is what I want to know?” The course covers basic and advanced statistical descriptions of data. It also introduces the computational methods and software packages used to explore data and infer underlying structural parameters from them. The topics are illustrated with real-world applications. Prerequisites are linear algebra, multivariate analysis, and familiarity with basic statistics and programming (ideally in Python).
The course adopts a model-based, largely Bayesian, approach and leverages recent developments in the physical sciences. Applications for problem sets and the final project are drawn from across the sciences. To support the interdisciplinary character of the course, practitioners from several departments present applications from their work.
Weekly Syllabus
- Principled Data Analysis: signal model and error model, likelihood and priors
- Probability Distributions
- Generative Clustering and Classification
- Gaussian Processes
- Fitting your own model: gradient-based optimization
- Automatic differentiation
- Error Estimation
- Sampling Methods: MCMC and variants
- Advanced Sampling: Hamiltonian MC, ensemble and nested methods
- Hierarchical Models
- Likelihood-free Methods
- Hypothesis testing
Reading
Recommended Reading:
- Machine Learning: A Probabilistic Perspective, by Kevin Murphy (2012)
free to read online through the PU library
Additional Reading (book excerpts, papers):
- Bayesian Data Analysis, 3rd edition, by Andrew Gelman et al. (2013)
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edition, by Trevor Hastie, Robert Tibshirani, Jerome Friedman (2009)
- Information Theory, Inference and Learning Algorithms, by David MacKay (2005)
- Data analysis recipes: Fitting a model to data, by David Hogg (2010)
- Gaussian Processes for Machine Learning, by Carl Edward Rasmussen and Chris Williams (2006)
- Automatic Differentiation, by Roger Grosse (2019)
- Adam: A Method for Stochastic Optimization, by Kingma & Ba (2015)
- Proximal Algorithms, by Parikh & Boyd (2013)
- Error estimation in astronomy: A guide, by Rene Andrae (2010)
- Data analysis recipes: Using Markov Chain Monte Carlo, by David Hogg and Daniel Foreman-Mackey (2017)
- MULTINEST: an efficient and robust Bayesian inference tool for cosmology and particle physics, by Feroz, Hobson, Bridges (2009)
- Massive optimal data compression and density estimation for scalable, likelihood-free inference in cosmology, by Justin Alsing, Ben Wandelt, Stephen Feeney (2019)