1  Introduction

This is a course in regression analysis. That term is broad: regression is about finding relationships between variables, and there are many ways to do this. This course, then, is a transition between the traditional and modern ways to do regression.

In the traditional mode, we use

In the modern mode, we

We will hit the highlights of the traditional methods (i.e., we will cover them quickly), since I will assume you have seen the basics of regression before. We will cover, at a somewhat higher mathematical level, topics like multiple regression, variable selection, penalized regression, and generalized linear models, focusing on the computation, testing, diagnostic, and model selection tools necessary to put these methods to good use.

I will also assume you have a solid linear algebra background and can program in R. If you’re a little rusty, the syllabus refers to good books on both topics that can be used as references.

After covering regression, we’ll move on to some more advanced topics, including nonparametric and additive models, missing data, and hierarchical models. If there is time, we might also discuss the very basics of survival analysis and experimental design.

Overall, though, our focus will not be on the derivation of theoretical results about regression estimators. Our focus will instead be on the application of regression to answer substantive questions with real data. Often the most challenging part of any data analysis is figuring out what question is being asked, how that question can be translated into a statistical question, and whether that statistical question is even answerable from the data, so we will spend plenty of time practicing these skills. If there is one thing you should learn from this course, it is that a careful, thorough, and useful data analysis is a rare thing indeed, and any statistician handling real data well will find no end of interesting substantive, statistical, and even theoretical problems to work on.