Sampling is split into two steps, for predictors and for response variables,
to allow users to choose which to simulate. sample_x() will only sample
predictor variables, and sample_y() will augment a data frame of predictors
with columns for response variables, overwriting any already present. Hence
one can use sample_y() as part of a simulation with fixed predictors, for
instance.
Arguments
- population
Population, as defined by population().
- n
Number of observations to draw from the population.
- xs
Data frame of predictor values drawn from the population, as obtained from
sample_x().
Value
Data frame (tibble) of n rows, with columns matching the variables
specified in the population.
Examples
# A population with a simple linear relationship
pop <- population(
  x1 = predictor(rnorm, mean = 4, sd = 10),
  x2 = predictor(runif, min = 0, max = 10),
  y = response(0.7 + 2.2 * x1 - 0.2 * x2, error_scale = 1.0)
)
xs <- pop |>
  sample_x(5)
xs
#> Sample of 5 observations from
#> Population with variables:
#> x1: rnorm(list(mean = 4, sd = 10))
#> x2: runif(list(min = 0, max = 10))
#> y: gaussian(0.7 + 2.2 * x1 - 0.2 * x2, error_scale = 1)
#>
#> # A tibble: 5 × 2
#> x1 x2
#> * <dbl> <dbl>
#> 1 -5.62 2.78
#> 2 4.52 3.64
#> 3 13.0 0.699
#> 4 0.0510 5.89
#> 5 13.5 0.841
xs |>
  sample_y()
#> Sample of 5 observations from
#> Population with variables:
#> x1: rnorm(list(mean = 4, sd = 10))
#> x2: runif(list(min = 0, max = 10))
#> y: gaussian(0.7 + 2.2 * x1 - 0.2 * x2, error_scale = 1)
#>
#> # A tibble: 5 × 3
#> x1 x2 y
#> <dbl> <dbl> <dbl>
#> 1 -5.62 2.78 -12.8
#> 2 4.52 3.64 11.3
#> 3 13.0 0.699 30.9
#> 4 0.0510 5.89 -0.540
#> 5 13.5 0.841 31.3
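
The fixed-predictor use mentioned in the description can be sketched as follows. This is a minimal illustration, not part of the package's own examples: it assumes the regressinator package is attached and that predictor() accepts the distribution function as in the example above. The sample size (20), the number of replications (100), and the name slopes are arbitrary choices made here. The predictors are drawn once with sample_x(), then only the response is redrawn with sample_y() before each model fit.

```r
library(regressinator)

# Same population as in the example above
pop <- population(
  x1 = predictor(rnorm, mean = 4, sd = 10),
  x2 = predictor(runif, min = 0, max = 10),
  y = response(0.7 + 2.2 * x1 - 0.2 * x2, error_scale = 1.0)
)

# Draw the predictors once; they stay fixed across all replications
xs <- sample_x(pop, 20)

# Redraw only the response 100 times, refitting the model each time
# and keeping the estimated coefficient of x1
slopes <- replicate(100, {
  fit <- lm(y ~ x1 + x2, data = sample_y(xs))
  coef(fit)[["x1"]]
})

# Centre and spread of the estimates, conditional on the fixed predictors
mean(slopes)
sd(slopes)
```

Because xs is reused in every replication, the spread of slopes reflects only the randomness in the response, which is the quantity the usual conditional-on-X standard errors describe.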