SML 515 - Statistical Data Analysis

New course in Spring 2020: Wed/Fri 11:00-12:20

The course provides an introduction to modern data analysis and data science. It addresses the central question: “What should I do if these are my data and this is what I want to know?” The course covers basic and advanced statistical descriptions of data. It also introduces the computational methods and software packages used to explore data and infer underlying structural parameters from them. The topics are illustrated with real-world applications. Prerequisites are linear algebra, multivariate analysis, and familiarity with basic statistics and programming (ideally in Python).

The course adopts a model-based, largely Bayesian, approach and leverages recent developments in the physical sciences. Applications for problem sets and the final project are drawn from across the sciences. To support the interdisciplinary character of the course, practitioners from several departments present applications from their work.

Weekly Syllabus

  1. Principled Data Analysis: signal model and error model, likelihood and priors
  2. Probability Distributions
  3. Generative Clustering and Classification
  4. Gaussian Processes
  5. Fitting Your Own Model: gradient-based optimization
  6. Automatic Differentiation
  7. Error Estimation
  8. Sampling Methods: MCMC and variants
  9. Advanced Sampling: Hamiltonian MC, ensemble and nested methods
  10. Hierarchical Models
  11. Likelihood-free Methods
  12. Hypothesis Testing

Reading

Recommended Reading:

Additional Reading (book excerpts, papers):