Statistical Learning is a field that ties together statistical theory and practice with the methods of machine learning that have emerged in the last several decades.

This course is geared towards students with the following experience:

  • Exposure to and understanding of the fundamental concepts of linear regression: What is it used for? How does one interpret the parameter estimates? What does inference mean in this context?
  • Experience with a programming language. We will use R in this course, but it will be easy enough to learn if you’ve worked before with Python, Java, Matlab, etc.

If you’ve taken Math 141, you have these bases covered. In terms of the mathematical notation, we will be using the vector/matrix formulation in places, though knowledge of linear algebra is not necessary.


Andrew Bray
Office: Library 304
Office hours: Monday and Wednesday, 11 am - noon; Thursday, 3:30 - 4:30 pm


An Introduction to Statistical Learning (2013), by James, Witten, Hastie, and Tibshirani. The PDF is available for free, and the printed book is available from the bookstore for around $60. The textbook is a key component of the course, and I recommend having a hard copy on hand if possible.

Class components

This course has three components: problem sets, labs, and exams/quizzes/project. For details on the first two, see the tabs at the top of the page.


We’ll have several examinations and quizzes throughout the semester to challenge your understanding and give us a sense of where you’re at. Some will be traditional pen-and-paper exams; others will be done on the computer using R.

Midterm I

Midterm II

Take-home during finals week

General Timeline

Week Topic
1 Foundations of Stat Learning
2 Simple Linear Regression
3 Multiple Linear Regression
4 Classification
5 Classification
6 Resampling Methods
7 Nonparametrics
9 Tree-Based Methods
10 Tree-Based Methods
11 Unsupervised Learning
12 Unsupervised Learning
13 Unsupervised Learning