Skip to contents

Repeatedly refits the model to new samples from the population, calculates estimates for each fit, and compiles a data frame of the results.

Usage

sampling_distribution(fit, data, fn = tidy, nsim = 100, fixed_x = TRUE, ...)

Arguments

fit

A model fit to data, such as by lm() or glm(), to refit to each sample from the population.

data

Data drawn from a population(), using sample_x() and possibly sample_y(). The population() specification is used to draw the samples.

fn

Function to call on each new model fit to produce a data frame of estimates. Defaults to broom::tidy(), which produces a tidy data frame of coefficients, estimates, standard errors, and hypothesis tests.

nsim

Number of simulations to run.

fixed_x

If TRUE, the default, the predictor variables are held fixed and only the response variables are redrawn from the population. If FALSE, the predictor and response variables are drawn jointly.

...

Additional arguments passed to fn each time it is called.

Value

Data frame (tibble) of nsim + 1 simulation results, formed by concatenating together the data frames returned by fn. The .sample

column identifies which simulated sample each row came from. Rows with .sample == 0 come from the original fit.

Details

To generate sampling distributions of different quantities, the user can provide a custom fn. The fn should take a model fit as its argument and return a data frame. For instance, the data frame might contain one row per estimated coefficient and include the coefficient and its standard error; or it might contain only one row of model summary statistics.

Refitting is done using the S3 generic update(), so this function can be used with any model fit that supports update(). In base R, this includes lm() and glm(), and many other model fits.

See also

parametric_boot_distribution() to simulate draws from a fitted model, rather than from the population

Examples

pop <- population(
  x1 = predictor("rnorm", mean = 4, sd = 10),
  x2 = predictor("runif", min = 0, max = 10),
  y = response(0.7 + 2.2 * x1 - 0.2 * x2, error_scale = 4.0)
)

d <- sample_x(pop, n = 20) |>
  sample_y()

fit <- lm(y ~ x1 + x2, data = d)
# using the default fn = broom::tidy(). conf.int argument is passed to
# broom::tidy()
samples <- sampling_distribution(fit, d, conf.int = TRUE)
samples
#> # A tibble: 303 × 8
#>    term        estimate std.error statistic  p.value conf.low conf.high .sample
#>    <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>   <dbl>
#>  1 (Intercept)   1.07      1.52       0.707 4.89e- 1  -2.13       4.27        0
#>  2 x1            2.12      0.0698    30.3   3.02e-16   1.97       2.27        0
#>  3 x2           -0.211     0.231     -0.917 3.72e- 1  -0.698      0.275       0
#>  4 (Intercept)  -1.41      1.82      -0.773 4.50e- 1  -5.25       2.44        1
#>  5 x1            2.23      0.0838    26.6   2.66e-15   2.06       2.41        1
#>  6 x2           -0.166     0.277     -0.599 5.57e- 1  -0.750      0.418       1
#>  7 (Intercept)  -1.52      1.82      -0.836 4.15e- 1  -5.36       2.32        2
#>  8 x1            2.20      0.0837    26.3   3.32e-15   2.02       2.38        2
#>  9 x2            0.0909    0.276      0.329 7.46e- 1  -0.492      0.674       2
#> 10 (Intercept)   4.73      2.22       2.14  4.75e- 2   0.0589     9.41        3
#> # ℹ 293 more rows

suppressMessages(library(dplyr))
# the model is correctly specified, so the estimates are unbiased:
samples |>
  group_by(term) |>
  summarize(mean = mean(estimate),
            sd = sd(estimate))
#> # A tibble: 3 × 3
#>   term          mean     sd
#>   <chr>        <dbl>  <dbl>
#> 1 (Intercept)  0.479 2.03  
#> 2 x1           2.19  0.0995
#> 3 x2          -0.168 0.304 

# instead of coefficients, get the sampling distribution of R^2
rsquared <- function(fit) {
  data.frame(r2 = summary(fit)$r.squared)
}
sampling_distribution(fit, d, rsquared, nsim = 10)
#> # A tibble: 11 × 2
#>       r2 .sample
#>    <dbl>   <dbl>
#>  1 0.982       0
#>  2 0.975       1
#>  3 0.981       2
#>  4 0.978       3
#>  5 0.961       4
#>  6 0.950       5
#>  7 0.973       6
#>  8 0.979       7
#>  9 0.964       8
#> 10 0.970       9
#> 11 0.958      10