Statistical Learning is a field that ties together statistical theory and practice with the methods of machine learning that have emerged in the last several decades.

This course is geared towards students with the following experience.

  • Exposure to and understanding of the fundamental concepts of linear regression: What is it used for? How does one interpret the parameter estimates? What does inference mean in this context?
  • Experience with a programming language. We will use R in this course, but it will be easy enough to learn if you’ve worked before with Python, Java, Matlab, etc.

If you’ve taken Math 141, you have these bases covered. In terms of the mathematical notation, we will be using the vector/matrix formulation in places, though knowledge of linear algebra is not necessary.


Andrew Bray
Office: Library 304
Office hours: Tuesday 4-5, Thursday 3-4


An Introduction to Statistical Learning (2013), by James, Witten, Hastie, and Tibshirani. The pdf is available for free and the printed book is available from the bookstore for around $60. The textbook is a key component of the course and I recommend having a hard copy on hand if possible.

Class components

This course has three components: problem sets, labs, and exams/quizzes/project. For details on the first two, see the tabs at the top of the page.


We’ll have several examinations and quizzes throughout the semester in order to challenge your understanding and provide us with a sense of where you’re at. Some will be traditional pen-and-paper exams; others will be done on the computer using R.

Midterm I
Friday, February 26, 2016

Midterm II
Friday, April 8th (takehome)

Takehome during finals week (link)


Your goal is to find a data set of interest to you and develop insight into it by applying the principles and techniques of statistical learning. This is a group project (3 students in a group) that has two deliverables: a single Rmd research report that is submitted via GitHub and a 10-15 minute presentation.

Research Report
April 29th 1 pm: Template (invite)

April 27th and 29th in class

General Timeline

Week Topic
1 Foundations of Stat Learning
2 Simple Linear Regression
3 Multiple Linear Regression
4 Classification
5 Classification
6 Resampling Methods
7 Nonparametrics
9 Tree-based methods
10 Tree-based methods
11 Unsupervised Learning
12 Unsupervised Learning
13 Unsupervised Learning