Predictor variables can have any marginal distribution as long as a function is provided to sample from the distribution. Multivariate distributions are also supported: if the random generation function returns multiple columns, multiple random variables will be created. If the columns are named, the random variables will be named accordingly; otherwise, they will be successively numbered.
Value
A predictor_dist object, to be used in population() to specify a
population distribution.
Details
The random generation function must take an argument named n specifying the
number of draws. For univariate distributions, it should return a vector of
length n; for multivariate distributions, it should return an array or
matrix with n rows and a column per variable.
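As a rough illustration of this contract, the sketch below defines two generation functions using base R only. The names rshifted_exp and rcorrelated, and their extra arguments (rate, shift, rho), are hypothetical and are not part of the package; the only things taken from the text above are the argument n and the shape of the return value.

```r
# Hypothetical univariate generator: takes n, returns a vector of length n.
rshifted_exp <- function(n, rate = 1, shift = 0) {
  shift + rexp(n, rate = rate)
}

# Hypothetical multivariate generator: takes n, returns a matrix with
# n rows and one column per variable (here two standard normals with
# correlation rho). The columns are deliberately left unnamed.
rcorrelated <- function(n, rho = 0.5) {
  z1 <- rnorm(n)
  z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)
  matrix(c(z1, z2), ncol = 2)
}

# Check the shapes before handing either function to predictor().
length(rshifted_exp(5))
dim(rcorrelated(5))
```

Either function could presumably then be supplied as the dist argument, with any extra arguments passed alongside it as in the Examples.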
By default, the variables of a multivariate predictor are successively
numbered. For instance, if predictor X is specified with
library(mvtnorm)
predictor(dist = rmvnorm, mean = c(0, 1),
          sigma = matrix(c(1, 0.5, 0.5, 1), nrow = 2))
then the population predictors will be named X1 and X2, and will have
covariance 0.5.
If the multivariate predictor has named columns, the names will be used
instead. For instance, if predictor X generates a matrix with columns A
and B, the population predictors will be named XA and XB.
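The naming rule described above can be sketched in a few lines of base R. The helper population_names is hypothetical and is not the package's implementation; it only mirrors the documented behavior, and it assumes that a univariate predictor simply keeps its own name.

```r
# Hypothetical helper mirroring the documented naming rule.
# prefix: the predictor's name; draws: output of the generation function.
population_names <- function(prefix, draws) {
  # Assumption: a univariate predictor (plain vector) keeps its own name.
  if (is.null(dim(draws))) {
    return(prefix)
  }

  cols <- colnames(draws)
  if (is.null(cols)) {
    # Unnamed columns are numbered successively.
    cols <- seq_len(ncol(draws))
  }
  paste0(prefix, cols)
}

population_names("X", matrix(0, nrow = 3, ncol = 2))  # "X1" "X2"
population_names("X", cbind(A = 1:3, B = 4:6))        # "XA" "XB"
```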
Examples
# Univariate normal distribution
predictor(dist = rnorm, mean = 10, sd = 2.5)
#> rnorm(list(mean = 10, sd = 2.5))
# Multivariate normal distribution
library(mvtnorm)
predictor(dist = rmvnorm, mean = c(0, 1, 7))
#> rmvnorm(list(mean = c(0, 1, 7)))
# Multivariate with named columns
rmulti <- function(n) {
  cbind(treatment = rbinom(n, size = 1, prob = 0.5),
        confounder = rnorm(n)
  )
}
predictor(dist = rmulti)
#> rmulti()