36-707: Regression Analysis

Fall 2022 (last updated August 29, 2022)

This is a course in data analysis. Topics covered include simple and multiple linear regression, causation, diagnostics, logistic regression, and generalized linear models; model selection (prediction risk, bias-variance tradeoff, risk estimation, model search, ridge regression and lasso, stepwise regression); and smoothing and nonparametric regression (linear smoothers, kernels, local regression, penalized regression, splines, variance estimation, confidence bands, local likelihood, additive models). Students will practice real-world data analysis through several course projects.

This course is primarily for first-year PhD students in Statistics & Data Science. Students in other programs should check the syllabus for full prerequisite and waitlist information.

For course policies, consult the syllabus.

Vital information

Lectures: MW 1:25-2:45pm, Fall 2022, in Baker 232M (subject to change)
Instructor: Alex Reinhart
Teaching assistant: James Carzon
Office hours: Monday 3-4pm in Baker 229A (James); Tuesday 3-4pm in Baker 232K (Alex)
Email: areinhar@stat.cmu.edu (but please save complicated questions for office hours)
Announcements and homework submission
Lecture notes: Online here


This schedule is approximate and subject to change. Topics are listed by week:

  1. Causality
  2. Multiple regression
  3. Regression and regression assumptions
  4. Regression diagnostics and model selection
  5. Model search; first project assigned
  6. Nonparametric regression and smoothing; first project due
  7. Bootstrapping; logistic regression
  8. Fall break; no class
  9. Generalized linear models
  10. GLMs
  11. Additive models
  12. Missing data and hierarchical models
  13. Hierarchical models; Thanksgiving
  14. Hierarchical models
  15. Assorted topics such as survival analysis and experimental design