Statistical Learning is a field that ties together statistical theory and practice with the methods of machine learning that have emerged in the last several decades.

This course is geared towards students with the following experience:

  • Exposure to and understanding of the fundamental concepts of linear regression: What is it used for? How does one interpret the parameter estimates? What does inference mean in this context?
  • Experience with a programming language. We will use R in this course, but it will be easy enough to learn if you’ve worked before with Python, Java, Matlab, etc.

If you’ve taken Math 141, you have these bases covered. In terms of the mathematical notation, we will be using the vector/matrix formulation in places, though knowledge of linear algebra is not necessary.

Contact

Andrew Bray
Office: Library 304
Office hours: Monday and Wednesday, 11 am - noon; Thursday 3:30 - 4:30 pm

Textbook

An Introduction to Statistical Learning (2013), by James, Witten, Hastie, and Tibshirani. The PDF is available for free and the printed book is available from the bookstore for around $60. The textbook is a key component of the course and I recommend having a hard copy on hand if possible.

Class components

This course has three components: problem sets, labs, and exams/quizzes/project. For details on the first two, see the tabs at the top of the page.

Exams

We’ll have several examinations and quizzes throughout the semester in order to challenge your understanding and provide us with a sense of where you’re at. Some will be traditional pen-and-paper exams and others will be done on a computer using R.

Midterm I
TBA

Midterm II
TBA

Final
Take-home during finals week

Project

Your goal is to find a data set of interest to you and develop insight into it by applying the principles and techniques of statistical learning. This is a group project (3 students in a group) that has two deliverables: a single Rmd research report that is submitted via GitHub and a 10-15 minute presentation.

Research Report
Last week of class (Template)

Presentations
Last week of class

General Timeline

Week Topic
1 Foundations of Stat Learning
2 Simple Linear Regression
3 Multiple Linear Regression
4 Classification
5 Classification
6 Resampling Methods
7 Nonparametrics
8 SVM
9 Tree-based methods
10 Tree-based methods
11 Unsupervised Learning
12 Unsupervised Learning
13 Unsupervised Learning