Specify the distribution of a predictor variable

Predictor variables can have any marginal distribution as long as a function is provided to sample from the distribution. Multivariate distributions are also supported: if the random generation function returns multiple columns, multiple random variables will be created. If the columns are named, the random variables will be named accordingly; otherwise, they will be successively numbered.

Usage

predictor(dist, ...)

Arguments

dist: Function to generate draws from this predictor's distribution, provided as a function or as a string naming the function.
...: Additional arguments to pass to dist when generating draws.

Value

A predictor_dist object, to be used in population() to specify a population distribution

Details

The random generation function must take an argument named n specifying the number of draws. For univariate distributions, it should return a vector of length n; for multivariate distributions, it should return an array or matrix with n rows and a column per variable.

Multivariate predictors are successively numbered. For instance, if predictor X is specified with

library(mvtnorm)
predictor(dist = rmvnorm, mean = c(0, 1),
          sigma = matrix(c(1, 0.5, 0.5, 1), nrow = 2))

then the population predictors will be named X1 and X2, and will have covariance 0.5.

If the multivariate predictor has named columns, the names will be used instead. For instance, if predictor X generates a matrix with columns A and B, the population predictors will be named XA and XB.

Examples

# Univariate normal distribution
predictor(dist = rnorm, mean = 10, sd = 2.5)
#> rnorm(list(mean = 10, sd = 2.5))

# Multivariate normal distribution
library(mvtnorm)
predictor(dist = rmvnorm, mean = c(0, 1, 7))
#> rmvnorm(list(mean = c(0, 1, 7)))

# Multivariate with named columns
rmulti <- function(n) {
  cbind(treatment = rbinom(n, size = 1, prob = 0.5),
        confounder = rnorm(n)
  )
}
predictor(dist = rmulti)
#> rmulti()