# Simulate the distribution of estimates by parametric bootstrap

Source:`R/diagnose.R`

`parametric_boot_distribution.Rd`

Repeatedly simulates new response values by using the fitted model, holding the covariates fixed. By default, refits the same model to each simulated dataset, but an alternative model can be provided. Estimates, confidence intervals, or other quantities are extracted from each fitted model and returned as a tidy data frame.

## Usage

```
parametric_boot_distribution(
fit,
alternative_fit = fit,
data = model.frame(fit),
fn = tidy,
nsim = 100,
...
)
```

## Arguments

- fit
A model fit to data, such as by

`lm()`

or`glm()`

, to simulate new response values from.- alternative_fit
A model fit to data, to refit to the data sampled from

`fit`

. Defaults to`fit`

, but an alternative model can be provided to examine its behavior when`fit`

is the true model.- data
Data frame to be used in the simulation. Must contain the predictors needed for both

`fit`

and`alternative_fit`

. Defaults to the predictors used in`fit`

.- fn
Function to call on each new model fit to produce a data frame of estimates. Defaults to

`broom::tidy()`

, which produces a tidy data frame of coefficients, estimates, standard errors, and hypothesis tests.- nsim
Number of total simulations to run.

- ...
Additional arguments passed to

`fn`

each time it is called.

## Value

A data frame (tibble) with columns corresponding to the columns
returned by `fn`

. The additional column `.sample`

indicates which fit each
row is from.

## Details

The default behavior samples from a model and refits the same model to the
sampled data; this is useful when, for example, exploring how model
diagnostics look when the model is well-specified. Another common use of the
parametric bootstrap is hypothesis testing, where we might simulate from a
null model and fit an alternative model to the data, to obtain the null
distribution of a particular estimate or statistic. Provide `alternative_fit`

to have a specific model fit to each simulated dataset, rather than the model
they are simulated from.

Only the response variable from the `fit`

(or `alternative_fit`

, if given) is
redrawn; other response variables in the population are left unchanged from
their values in `data`

.

## Model limitations

Because this function uses S3 generic methods such as `model.frame()`

,
`simulate()`

, and `update()`

, it can be used with any model fit for which
methods are provided. In base R, this includes `lm()`

and `glm()`

.

The model provided as `fit`

must be fit using the `data`

argument to provide
a data frame. For example:

When simulating new data, this function provides the simulated data as the
`data`

argument and re-fits the model. If you instead refer directly to local
variables in the model formula, this will not work. For example, if you fit a
model this way:

It will not be possible to refit the model using simulated datasets, as that
would require modifying your environment to edit `cars`

.

## See also

`model_lineup()`

to use resampling to aid in regression diagnostics;
`sampling_distribution()`

to simulate draws from the population
distribution, rather than the null

## Examples

```
# Bootstrap distribution of estimates:
fit <- lm(mpg ~ hp, data = mtcars)
parametric_boot_distribution(fit, nsim = 5)
#> # A tibble: 10 × 6
#> term estimate std.error statistic p.value .n
#> <chr> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 (Intercept) 28.7 1.81 15.9 3.77e-16 1
#> 2 hp -0.0627 0.0112 -5.60 4.23e- 6 1
#> 3 (Intercept) 28.5 1.72 16.6 1.09e-16 2
#> 4 hp -0.0639 0.0106 -6.02 1.32e- 6 2
#> 5 (Intercept) 31.6 1.06 29.7 8.31e-24 3
#> 6 hp -0.0764 0.00660 -11.6 1.34e-12 3
#> 7 (Intercept) 30.7 1.49 20.6 2.77e-19 4
#> 8 hp -0.0751 0.00921 -8.16 4.19e- 9 4
#> 9 (Intercept) 31.5 1.86 16.9 6.76e-17 5
#> 10 hp -0.0778 0.0115 -6.76 1.71e- 7 5
# Bootstrap distribution of estimates for a quadratic model, when true
# relationship is linear:
quad_fit <- lm(mpg ~ poly(hp, 2), data = mtcars)
parametric_boot_distribution(fit, quad_fit, nsim = 5)
#> # A tibble: 15 × 6
#> term estimate std.error statistic p.value .n
#> <chr> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 (Intercept) 18.7 0.517 36.2 1.06e-25 1
#> 2 poly(hp, 2)1 -25.4 2.92 -8.71 1.39e- 9 1
#> 3 poly(hp, 2)2 -4.60 2.92 -1.57 1.27e- 1 1
#> 4 (Intercept) 19.1 0.651 29.3 4.14e-23 2
#> 5 poly(hp, 2)1 -26.1 3.68 -7.09 8.50e- 8 2
#> 6 poly(hp, 2)2 -0.151 3.68 -0.0410 9.68e- 1 2
#> 7 (Intercept) 20.3 0.574 35.5 1.97e-25 3
#> 8 poly(hp, 2)1 -27.0 3.24 -8.33 3.48e- 9 3
#> 9 poly(hp, 2)2 -2.91 3.24 -0.896 3.78e- 1 3
#> 10 (Intercept) 20.7 0.785 26.3 8.53e-22 4
#> 11 poly(hp, 2)1 -29.6 4.44 -6.66 2.68e- 7 4
#> 12 poly(hp, 2)2 -1.97 4.44 -0.445 6.60e- 1 4
#> 13 (Intercept) 20.2 0.662 30.5 1.33e-23 5
#> 14 poly(hp, 2)1 -30.5 3.74 -8.15 5.54e- 9 5
#> 15 poly(hp, 2)2 -2.72 3.74 -0.726 4.74e- 1 5
# Bootstrap distribution of estimates for a model with an additional
# predictor, when it's truly zero. data argument must be provided so
# alternative fit has all predictors available, not just hp:
alt_fit <- lm(mpg ~ hp + wt, data = mtcars)
parametric_boot_distribution(fit, alt_fit, data = mtcars, nsim = 5)
#> # A tibble: 15 × 6
#> term estimate std.error statistic p.value .n
#> <chr> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 (Intercept) 31.9 2.53 12.6 2.85e-13 1
#> 2 hp -0.0686 0.0143 -4.80 4.48e- 5 1
#> 3 wt -0.571 1.00 -0.569 5.73e- 1 1
#> 4 (Intercept) 29.7 2.22 13.4 6.22e-14 2
#> 5 hp -0.0671 0.0126 -5.34 9.74e- 6 2
#> 6 wt 0.0746 0.880 0.0848 9.33e- 1 2
#> 7 (Intercept) 27.8 2.69 10.3 3.03e-11 3
#> 8 hp -0.0733 0.0152 -4.82 4.15e- 5 3
#> 9 wt 0.503 1.06 0.472 6.40e- 1 3
#> 10 (Intercept) 28.3 1.81 15.6 1.15e-15 4
#> 11 hp -0.100 0.0102 -9.81 1.02e-10 4
#> 12 wt 1.79 0.716 2.50 1.83e- 2 4
#> 13 (Intercept) 29.7 2.43 12.2 5.87e-13 5
#> 14 hp -0.0579 0.0137 -4.22 2.19e- 4 5
#> 15 wt -0.193 0.961 -0.201 8.42e- 1 5
```