
Sampling is split into two steps, one for predictors and one for response variables, so that users can choose which to simulate. sample_x() samples only the predictor variables, while sample_y() augments a data frame of predictors with columns for the response variables, overwriting any response columns already present. Hence one can, for instance, draw the predictors once and call sample_y() repeatedly in a simulation with fixed predictors.
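To make the division of labor concrete, here is a hypothetical base-R imitation of the two steps. It is a sketch, not regressinator's implementation. `draw_x()` and `draw_y()` are illustrative names, and the distributions and coefficients are hard-coded to mirror the example population further down this page. The first function draws only predictor columns. The second takes those predictors, adds a response, and replaces any existing `y` column instead of appending a duplicate.

```r
# Hypothetical base-R imitation of the two-step design; draw_x() and
# draw_y() are illustrative names, not regressinator functions. The
# distributions and coefficients mirror the example population below.

# Step 1: draw predictor columns only
draw_x <- function(n) {
  data.frame(
    x1 = rnorm(n, mean = 4, sd = 10),
    x2 = runif(n, min = 0, max = 10)
  )
}

# Step 2: compute the mean from the given predictors, add Gaussian
# error, and store it as y (replacing any existing y column)
draw_y <- function(xs) {
  mu <- 0.7 + 2.2 * xs$x1 - 0.2 * xs$x2
  xs$y <- mu + rnorm(nrow(xs), mean = 0, sd = 1)
  xs
}

set.seed(1)
xs <- draw_x(5)
d1 <- draw_y(xs)   # predictors plus a fresh y
d2 <- draw_y(d1)   # same predictors, y overwritten rather than duplicated
```

Both `d1` and `d2` have three columns. The predictor values are identical in the two results, and only `y` differs.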

Usage

sample_x(population, n)

sample_y(xs)

Arguments

population

Population, as defined by population().

n

Number of observations to draw from the population.

xs

Data frame of predictor values drawn from the population, as obtained from sample_x().

Value

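Data frame (tibble) with one row per observation: n rows for sample_x(), and as many rows as xs for sample_y(). sample_x() returns a column for each predictor variable in the population. sample_y() returns those predictor columns plus a column for each response variable.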

Examples

# A population with a simple linear relationship
pop <- population(
  x1 = predictor("rnorm", mean = 4, sd = 10),
  x2 = predictor("runif", min = 0, max = 10),
  y = response(0.7 + 2.2 * x1 - 0.2 * x2, error_scale = 1.0)
)

xs <- pop |>
  sample_x(5)

xs
#> Sample of 5 observations from
#> Population with variables:
#> x1: rnorm(list(mean = 4, sd = 10))
#> x2: runif(list(min = 0, max = 10))
#> y: gaussian(~0.7 + 2.2 * x1 - 0.2 * x2, error_scale = ~1)
#> 
#> # A tibble: 5 × 2
#>       x1    x2
#> *  <dbl> <dbl>
#> 1   2.67  6.12
#> 2  11.2   8.42
#> 3  11.8   2.56
#> 4  -6.68  7.14
#> 5 -20.2   7.85

xs |>
  sample_y()
#> Sample of 5 observations from
#> Population with variables:
#> x1: rnorm(list(mean = 4, sd = 10))
#> x2: runif(list(min = 0, max = 10))
#> y: gaussian(~0.7 + 2.2 * x1 - 0.2 * x2, error_scale = ~1)
#> 
#> # A tibble: 5 × 3
#>       x1    x2      y
#>    <dbl> <dbl>  <dbl>
#> 1   2.67  6.12   3.66
#> 2  11.2   8.42  23.1 
#> 3  11.8   2.56  26.2 
#> 4  -6.68  7.14 -17.7 
#> 5 -20.2   7.85 -45.4
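A fixed-predictor simulation can be built from the two functions above. The following is a minimal sketch that assumes the regressinator package is installed. It reuses the example population, draws the predictors once with sample_x(), and then calls sample_y() inside `replicate()` so that only the response changes between replicates. The sample size of 20 and the 200 replicates are arbitrary choices made for illustration.

```r
library(regressinator)

pop <- population(
  x1 = predictor("rnorm", mean = 4, sd = 10),
  x2 = predictor("runif", min = 0, max = 10),
  y = response(0.7 + 2.2 * x1 - 0.2 * x2, error_scale = 1.0)
)

# Draw the predictors once and keep them fixed
xs <- sample_x(pop, 20)

# Each replicate redraws only y, conditional on the same x1 and x2,
# then records the fitted coefficient on x1
slopes <- replicate(200, {
  d <- sample_y(xs)
  unname(coef(lm(y ~ x1 + x2, data = d))["x1"])
})

# Spread of the x1 slope estimate across replicates with X held fixed
sd(slopes)
```

Because `d` is a new object in each replicate, `xs` itself is never modified. The resulting `slopes` describe the sampling distribution of the estimate conditional on this one draw of the predictors.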