23  Multilevel and Hierarchical Models

In the linear and generalized linear models we have seen so far, the regression parameters (\(\beta\)) are fixed properties of the population, and our task is to estimate them as precisely as possible. We then interpret the estimates to make statements about the population. Because the parameters are fixed, their effects are known as fixed effects.

But it is possible for the relationship between predictor and response to also depend on random quantities—on properties of the particular random sample we obtain. When this happens, some of the model parameters are random, and so their effects are known as random effects.

We can construct linear and generalized linear models with random effects, known as random effects models or mixed effects models (“mixed” because they contain both fixed and random effects). As we will see, these are useful for modeling many kinds of data with grouped or repeated structure. They can also be thought of as a way of applying penalization, as discussed in Chapter 19, to regression coefficients in a more structured way, and so can give useful estimates even when the parameters are not truly random.

23.1 Motivating examples

Example 23.1 (Teaching strategies) An educational researcher recruits 30 teachers to try new teaching strategies in their sixth-grade classrooms. Half are assigned strategy A, half are assigned strategy B, and at the end of the semester the students take an exam to measure their mastery of the subject. We would like to estimate the difference in scores between the two strategies and determine which produces higher test scores.

Each classroom has many students, so each row of the data will be one student, with covariates for their assigned teaching strategy and their teacher. We may expect the teacher to have an effect: perhaps some teachers are more experienced than others, so their entire class scores higher on average than other classes using the same teaching strategy.

If our model is simply \[ \text{score} = \beta_0 + \beta_1 (\text{strategy B}) + e, \] we would expect to see structure in the residuals \(\hat e\): students in an experienced teacher’s classroom would have positive residuals, students in an inexperienced teacher’s classroom would have negative residuals, and hence students in the same classroom would have correlated errors. This violates the assumption of independent errors.
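To make this concrete, here is a small simulation in Python. The class sizes, effect sizes, and variances are all hypothetical, chosen only for illustration: each classroom shares its teacher’s shift, so after fitting the naive model by least squares, the classroom means of the residuals spread far more widely than independent errors would allow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 30 teachers, 20 students per classroom
n_teachers, n_students = 30, 20
teacher = np.repeat(np.arange(n_teachers), n_students)
strategy_b = (teacher >= n_teachers // 2).astype(float)  # half get strategy B

# True data-generating process: common intercept, strategy effect,
# plus a per-teacher shift (assumed sigma_b = 5, error sd = 4)
teacher_effect = rng.normal(0, 5, size=n_teachers)
score = (70 + 3 * strategy_b + teacher_effect[teacher]
         + rng.normal(0, 4, size=teacher.size))

# Fit the naive model score = beta_0 + beta_1 * (strategy B) + e
X = np.column_stack([np.ones_like(strategy_b), strategy_b])
beta_hat, *_ = np.linalg.lstsq(X, score, rcond=None)
resid = score - X @ beta_hat

# Residuals within a classroom all share that teacher's shift, so the
# classroom means of the residuals vary with sigma_b, not just the noise
class_means = np.array([resid[teacher == t].mean() for t in range(n_teachers)])
print(class_means.round(1))
```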

We could add teacher as a factor predictor, but these 30 teachers are just a sample from the population of all teachers. We are interested in how the average teacher would perform with each strategy, not in estimating specifically how Mrs. Humperdinck performs with strategy B.

23.2 Mixed models

Definition 23.1 (Mixed model) A mixed model is a linear model written as \[ Y = \beta\T X + b\T Z + e, \] where \(\beta \in \R^p\) is a fixed population parameter, \(X \in \R^p\) is a vector of known covariates, \(b \in \R^q\) is a vector of random effects, and \(Z \in \R^q\) is another vector of known covariates.

Generally it is assumed that \(\E[e] = 0\), that \(\cov(b, e) = 0\), and that the distribution of \(b\) belongs to a parametric family with mean 0 and known or estimable covariance.
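The random effects induce correlation between observations that share components of \(b\). Stacking \(n\) observations into a vector, with the rows of the matrices \(X\) and \(Z\) holding each observation’s covariates, gives \(\var(Y \mid X, Z) = Z \Sigma_b Z\T + \sigma_e^2 I\) when \(b\) has covariance \(\Sigma_b\) and the errors are independent with variance \(\sigma_e^2\). A minimal numeric sketch, with made-up variances and \(\Sigma_b = \sigma_b^2 I\):

```python
import numpy as np

# Tiny assumed layout: 4 students in 2 classrooms,
# Z has one dummy column per classroom
Z = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1]], dtype=float)
sigma_b2, sigma_e2 = 25.0, 16.0  # hypothetical variances

# Marginal covariance of Y: Var(Zb + e) = sigma_b^2 Z Z' + sigma_e^2 I
V = sigma_b2 * Z @ Z.T + sigma_e2 * np.eye(4)
print(V)
# Block structure: students in the same classroom have covariance
# sigma_b^2; students in different classrooms are uncorrelated.
```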

Example 23.2 (Random intercepts model) In the teaching strategies example (Example 23.1), let \(X\) contain the intercept and an indicator of teaching strategy B, so \(X \in \R^2\). Let \(Z\) contain the 30 dummy variables encoding which classroom the student is in (Section 7.3), so \(Z \in \R^{30}\), and let \[ b \sim \normal(\zerovec, \sigma_b^2 I). \] We can use all 30 dummy variables, rather than absorbing one into the intercept as in ordinary treatment contrasts (Definition 7.1), because \(b\) is defined to have mean 0.
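As a sketch of fitting this model in Python, we can use statsmodels’ MixedLM on simulated data; the sample sizes, effect sizes, and variances below are hypothetical. The `groups` argument gives each teacher their own random intercept, matching the dummy-variable \(Z\) above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Simulate data in the layout of Example 23.1 (class sizes, effect
# sizes, and variances are made up for illustration)
n_teachers, n_students = 30, 20
teacher = np.repeat(np.arange(n_teachers), n_students)
strategy = np.where(teacher < n_teachers // 2, "A", "B")
b = rng.normal(0, 5, size=n_teachers)  # random intercepts, sigma_b = 5
score = (70 + 3 * (strategy == "B")
         + b[teacher] + rng.normal(0, 4, size=teacher.size))

df = pd.DataFrame({"score": score, "strategy": strategy, "teacher": teacher})

# Fixed effects: intercept and strategy B indicator (the X above);
# groups= gives each teacher a random intercept (the Z above)
fit = smf.mixedlm("score ~ strategy", df, groups=df["teacher"]).fit()
print(fit.summary())  # the "Group Var" row estimates sigma_b^2
```

Note that we never estimate a coefficient for each individual teacher; the model instead estimates \(\sigma_b^2\), the variance of classroom effects across the population of teachers, alongside the fixed strategy effect.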