# Generalized Linear Models

So far, we’ve considered cases where we have some predictors $$X\in \R^p$$ and a response $$Y \in \R$$, and we want to learn something about the relationship between them, or at least predict $$Y$$ from $$X$$. We started with linear regression as a simple tool, then moved to more flexible models.

But what if $$Y \in \{0, 1\}$$ or some other restricted set, such as nonnegative integers?

A generalized linear model extends the linear model to data where $$Y$$ comes from some distribution parametrized by a function of $$\beta\T X$$. We can think of generalized linear models as having two parts:

1. The systematic part of the model relates the mean of $$Y$$ to some function of $$\beta\T X$$.
2. The random part specifies the distribution of $$Y$$ around that mean.

For example, in an ordinary linear model, the systematic part says that the mean of $$Y$$ is simply $$\beta\T X$$, and the random part specifies that $$Y$$ has a normal distribution with variance $$\sigma^2$$ around that mean.
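To make the two parts concrete, here is a minimal simulation sketch in Python with NumPy. The sample size, the coefficient values, and $$\sigma$$ are made up for illustration. For the binary case, the choice of the logistic function to map $$\beta\T X$$ into $$[0, 1]$$ is an assumption made here; it is one common choice, not the only possible one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n observations of p predictors, made-up coefficients.
n, p = 1000, 3
X = rng.normal(size=(n, p))
beta = np.array([0.5, -1.0, 2.0])

# The linear predictor beta^T X, computed for every observation.
eta = X @ beta

# Ordinary linear model.
# Systematic part: the mean of Y is eta itself.
# Random part: Y is normal around that mean with variance sigma^2.
sigma = 1.5
y_normal = rng.normal(loc=eta, scale=sigma)

# Binary response.
# Systematic part: the mean of Y must lie in [0, 1], so we pass eta
# through a function; the logistic function is assumed here.
mean_binary = 1.0 / (1.0 + np.exp(-eta))
# Random part: Y is Bernoulli (0 or 1) with that mean.
y_binary = rng.binomial(n=1, p=mean_binary)
```

In both cases the predictors enter only through `eta`; what changes is the function applied to it and the distribution drawn around the resulting mean.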

We will begin with logistic regression, which is a (deceptively) simple method for modeling binary data.