A lineup hides diagnostics among "null" diagnostics, i.e. the same
diagnostics calculated using models fit to data where all model assumptions
are correct. For each null diagnostic, `model_lineup()` simulates new
responses from the model using the fitted covariate values and the model's
error distribution, link function, and so on. Hence the new response values
are generated under ideal conditions: the fitted model is true and all
assumptions hold. `decrypt()` reveals which diagnostics are the true
diagnostics.

## Arguments

- fit
  The fitted model whose diagnostics are hidden in the lineup, such as one fit by `lm()` or `glm()`.
- fn
  A diagnostic function. The function's first argument should be the fitted model, and it must return a data frame. Defaults to `broom::augment()`, which produces a data frame containing the original data and additional columns `.fitted`, `.resid`, and so on. To see a list of model types supported by `broom::augment()`, and to find documentation on the columns reported for each type of model, load the `broom` package and use `methods(augment)`.
- nsim
  Total number of diagnostics. For example, if `nsim = 20`, the diagnostics for `fit` are hidden among 19 null diagnostics.
- ...
  Additional arguments passed to `fn` each time it is called.

## Value

A data frame (tibble) with columns corresponding to the columns
returned by `fn`. The additional column `.sample` indicates which set of
diagnostics each row is from. For instance, if the true data is in position
5, selecting rows with `.sample == 5` will retrieve the diagnostics from
the original model fit.

## Details

To generate different kinds of diagnostics, the user can provide a custom
`fn`. The `fn` should take a model fit as its argument and return a data
frame. For instance, the data frame might contain one row per observation and
include the residuals and fitted values for each observation; or it might be
a single row containing a summary statistic or test statistic.
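
As an illustration of the single-row case, the following minimal sketch defines a diagnostic function that returns one summary statistic. The name `r_squared_stat` is hypothetical and chosen only for this example; it relies on base R alone.

```r
# Hypothetical diagnostic function (not part of the package): it takes a
# model fit as its first argument and returns a one-row data frame.
r_squared_stat <- function(f) {
  data.frame(r_squared = summary(f)$r.squared)
}

fit <- lm(dist ~ speed, data = cars)
stat <- r_squared_stat(fit)
stat
```

Passed as `model_lineup(fit, fn = r_squared_stat, nsim = 20)`, a function like this would be expected to yield one row per fit, each labeled by `.sample`.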

`fn` will be called on the original `fit` provided. Then
`parametric_boot_distribution()` will be used to simulate data from the model
fit `nsim - 1` times, refit the model to each simulated dataset, and run `fn`
on each refit model. The null distribution is conditional on X, i.e. the
covariates used will be identical, and only the response values will be
simulated. The data frames are concatenated with an additional `.sample`
column identifying which fit each row came from.
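
The steps above can be sketched in base R. This is only an approximation of the described procedure, not the package's implementation: `resid_fn`, `n_total`, and the random placement of the true diagnostics are assumptions made for the illustration.

```r
# A base-R sketch of the procedure described above; NOT the package's code.
fit <- lm(dist ~ speed, data = cars)

# Hypothetical diagnostic function, used only for illustration.
resid_fn <- function(f) {
  data.frame(resid = residuals(f), speed = model.frame(f)$speed)
}

n_total <- 5  # plays the role of nsim
set.seed(42)

# Assumption: the true diagnostics are placed at a random position.
true_pos <- sample(n_total, 1)
null_pos <- setdiff(seq_len(n_total), true_pos)

# Diagnostics from the original fit.
true_diag <- resid_fn(fit)
true_diag$.sample <- true_pos

# Null diagnostics: simulate a new response from the fitted model, keep the
# covariates fixed, refit the model, and rerun the diagnostic function.
null_diags <- lapply(null_pos, function(pos) {
  sim_data <- cars
  sim_data$dist <- simulate(fit, nsim = 1)[[1]]
  refit <- update(fit, data = sim_data)
  out <- resid_fn(refit)
  out$.sample <- pos
  out
})

# Concatenate everything, ordered by position in the lineup.
lineup <- do.call(rbind, c(list(true_diag), null_diags))
lineup <- lineup[order(lineup$.sample), ]
```

Because only `dist` is replaced in each simulated dataset, every refit sees the same `speed` values, which is what "conditional on X" means here.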

When called, this function will print a message such as
`decrypt("sD0f gCdC En JP2EdEPn ZY")`. This is how to get the location of the
true diagnostics among the null diagnostics: evaluating this in the R console
will produce a string such as `"True data in position 5"`.

## Model limitations

Because this function uses S3 generic methods such as `model.frame()`,
`simulate()`, and `update()`, it can be used with any model fit for which
methods are provided. In base R, this includes `lm()` and `glm()`.

The model provided as `fit` must be fit using the `data` argument to provide
a data frame. For example:
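
A fit of the following form meets this requirement. The formula and the `cars` dataset are borrowed from the Examples section purely for illustration:

```r
# The variables in the formula are looked up in the data frame passed as `data`.
fit <- lm(dist ~ speed, data = cars)
```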

When simulating new data, this function provides the simulated data as the
`data` argument and re-fits the model. If you instead refer directly to local
variables in the model formula, this will not work. For example, if you fit a
model this way:
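
A fit of the following form illustrates the problem. It is a sketch using the `cars` dataset from the Examples section; `shifted_cars` is a hypothetical stand-in for a simulated dataset, and the `update()` call only approximates the refitting step described above.

```r
# The formula refers to the columns of `cars` directly; no `data` argument.
fit_direct <- lm(cars$dist ~ cars$speed)

# Hypothetical stand-in for a simulated dataset: same covariates, new response.
shifted_cars <- cars
shifted_cars$dist <- shifted_cars$dist + 100

# Supplying new data has no effect, because the formula still reads `cars`.
refit <- update(fit_direct, data = shifted_cars)
all.equal(coef(refit), coef(fit_direct))
```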

It will not be possible to refit the model using simulated datasets, as that
would require modifying your environment to edit `cars`.

## References

Buja et al. (2009). Statistical inference for exploratory data
analysis and model diagnostics. *Philosophical Transactions of the Royal
Society A*, 367 (1906), pp. 4361-4383. doi:10.1098/rsta.2009.0120

Wickham et al. (2010). Graphical inference for infovis. *IEEE Transactions on
Visualization and Computer Graphics*, 16 (6), pp. 973-979.
doi:10.1109/TVCG.2010.161

## See also

`parametric_boot_distribution()` to simulate draws by using the
fitted model to draw new response values; `sampling_distribution()` to
simulate draws from the population distribution, rather than from the model.

## Examples

```
fit <- lm(dist ~ speed, data = cars)
model_lineup(fit, nsim = 5)
#> decrypt("nsW7 Ykjk l3 gCPljlC3 44")
#> # A tibble: 250 × 9
#>     dist speed .fitted .resid   .hat .sigma  .cooksd .std.resid .sample
#>    <dbl> <dbl>   <dbl>  <dbl>  <dbl>  <dbl>    <dbl>      <dbl>   <dbl>
#>  1     2     4   -1.85   3.85  0.115   15.5  0.00459      0.266       1
#>  2    10     4   -1.85   11.8  0.115   15.4   0.0435      0.819       1
#>  3     4     7    9.95  -5.95 0.0715   15.5  0.00620     -0.401       1
#>  4    22     7    9.95   12.1 0.0715   15.4   0.0255      0.813       1
#>  5    16     8    13.9   2.12 0.0600   15.5 0.000645      0.142       1
#>  6    10     9    17.8  -7.81 0.0499   15.5  0.00713     -0.521       1
#>  7    18    10    21.7  -3.74 0.0413   15.5  0.00133     -0.249       1
#>  8    26    10    21.7   4.26 0.0413   15.5  0.00172      0.283       1
#>  9    34    10    21.7   12.3 0.0413   15.4   0.0143      0.814       1
#> 10    17    11    25.7  -8.68 0.0341   15.5  0.00582     -0.574       1
#> # ℹ 240 more rows

# Custom diagnostic function: residuals paired with the speed covariate
resids_vs_speed <- function(f) {
  data.frame(resid = residuals(f),
             speed = model.frame(f)$speed)
}
model_lineup(fit, fn = resids_vs_speed, nsim = 5)
#> decrypt("nsW7 Ykjk l3 gCPljlC3 44")
#> # A tibble: 250 × 3
#>    resid speed .sample
#>    <dbl> <dbl>   <dbl>
#>  1  3.85     4       1
#>  2  11.8     4       1
#>  3 -5.95     7       1
#>  4  12.1     7       1
#>  5  2.12     8       1
#>  6 -7.81     9       1
#>  7 -3.74    10       1
#>  8  4.26    10       1
#>  9  12.3    10       1
#> 10 -8.68    11       1
#> # ℹ 240 more rows
```