Specify a response variable in terms of predictors

Response variables are related to predictors (and other response variables) through a link function and response distribution. First the expression provided is evaluated using the predictors, to give this response variable's value on the link scale; then the inverse link function and response distribution are used to get the response value. See Details for more information.

Usage

response(expr, family = gaussian(), error_scale = NULL, size = 1L)

Arguments

expr: An expression, in terms of other predictor or response variables, giving this response variable's value on the link scale.
family: The family of this response variable, e.g. gaussian() for an ordinary Gaussian linear relationship.
error_scale: Scale factor for errors. Used only for linear families, such as gaussian() and ols_with_error(). Errors drawn while simulating the response variable will be multiplied by this scale factor. The scale factor can be a scalar value (such as a fixed standard deviation), or an expression in terms of the predictors, which will be evaluated when simulating response data. For generalized linear models, leave as NULL.
size: When the family is binomial(), this is the number of trials for each observation. Defaults to 1, as in logistic regression. May be specified either as a vector of the same length as the number of observations or as a scalar. May be written in terms of other predictor or response variables. For other families, size is ignored.

Value

A response_dist object, to be used in population() to specify a population distribution.

Details

Response variables are drawn based on a typical generalized linear model setup. Let $Y$ represent the response variable and $X$ represent the predictor variables. We specify that

$$Y \mid X \sim \text{SomeDistribution},$$

where

$$\mathbb{E}[Y \mid X = x] = g^{-1}(\mu(x)).$$

Here $\mu(X)$ is the expression expr, and both the distribution and the link function $g$ are specified by the family provided. For instance, if the family is gaussian(), the distribution is Normal and the link is the identity function; if the family is binomial(), the distribution is binomial and the link is (by default) the logit link.
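As a rough illustration of the mean relationship above, the following sketch uses base R family objects (not this package's internals) to apply the inverse link $g^{-1}$ to a link-scale value. The variable names are made up for the example.

```r
# Base R family objects store the inverse link function as `linkinv`.
mu <- 0  # value of the expression on the link scale

mean_gaussian <- gaussian()$linkinv(mu)  # identity link: returns mu unchanged
mean_binomial <- binomial()$linkinv(mu)  # default logit link: probability 0.5
mean_poisson  <- poisson()$linkinv(mu)   # default log link: exp(0) = 1
```

The same link-scale value therefore corresponds to a different conditional mean under each family.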

Response families

The following response families are supported.

gaussian()

The default family is gaussian() with the identity link function, specifying the relationship

$$Y \mid X \sim \text{Normal}(\mu(X), \sigma^2),$$

where the standard deviation $\sigma$ is given by error_scale.
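The effect of a scalar versus an expression-valued error_scale can be sketched in base R. This is only an illustration of "errors multiplied by a scale factor" as described under Arguments, not the package's implementation; the coefficients and the scale expression `0.5 + abs(x)` are arbitrary choices for the example.

```r
set.seed(1)
n <- 1000
x <- rnorm(n)
mu <- 1 + 2 * x        # the expression, on the (identity) link scale
z <- rnorm(n)          # standard Normal errors before scaling

# Scalar error_scale: every observation has standard deviation 3.
y_const <- mu + 3 * z

# Expression-valued error_scale: the standard deviation grows with |x|,
# giving heteroskedastic errors.
y_het <- mu + (0.5 + abs(x)) * z
```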

ols_with_error()

Allows specification of custom non-Normal error distributions, specifying the relationship

$$Y = \mu(X) + e,$$

where $e$ is drawn from an arbitrary distribution, specified by the error argument to ols_with_error().
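A minimal base R sketch of this additive relationship, assuming centered exponential errors purely as an example of a non-Normal distribution. It does not call ols_with_error(); the names and coefficients are hypothetical.

```r
set.seed(2)
n <- 500
x <- rnorm(n)
mu <- 1 + 2 * x      # the expression, evaluated on the predictors

# Skewed, mean-zero errors: an Exponential(1) draw shifted by its mean.
e <- rexp(n, rate = 1) - 1

y <- mu + e          # response is the mean function plus the custom error
```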

binomial()

Binomial responses include binary responses (as in logistic regression) and responses giving a total number of successes out of a number of trials. The response has distribution

$$Y \mid X \sim \text{Binomial}(N, g^{-1}(\mu(X))),$$

where $N$ is set by the size argument and $g$ is the link function. The default link is the logit link, and others can be chosen with the link argument to binomial(). The default $N$ is 1, representing a binary outcome.
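The role of $N$ and of the link can be illustrated with base R alone. The sketch below is not the package's code; the coefficients and the `trials` vector are made up for the example.

```r
set.seed(42)
x <- rnorm(5)
mu <- -1 + 2 * x                 # expression on the link scale
p <- binomial()$linkinv(mu)      # success probabilities via the default link

# N = 1 for every observation: binary outcomes, as in logistic regression.
y_binary <- rbinom(length(p), size = 1, prob = p)

# A vector of trial counts, one per observation: counts of successes.
trials <- c(10, 20, 10, 5, 8)
y_counts <- rbinom(length(p), size = trials, prob = p)

# A non-default link changes only how mu is mapped to a probability.
p_probit <- binomial(link = "probit")$linkinv(mu)
```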

poisson()

Poisson-distributed responses with distribution

$$Y \mid X \sim \text{Poisson}(g^{-1}(\mu(X))),$$

where $g$ is the link function. The default link is the log link, and others can be chosen with the link argument to poisson().
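As a hedged base R sketch of the Poisson case (again not the package's implementation, with arbitrary coefficients), the link-scale value is mapped to a rate by the inverse link before counts are drawn.

```r
set.seed(7)
x <- rnorm(5)
mu <- 0.5 + 0.3 * x                 # expression on the link scale

lambda <- poisson()$linkinv(mu)     # default log link, so lambda = exp(mu)
y <- rpois(length(lambda), lambda)  # one count per observation

# With the square-root link, the inverse link squares the link-scale value.
lambda_sqrt <- poisson(link = "sqrt")$linkinv(2)
```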

custom_family()

Responses drawn from an arbitrary distribution with arbitrary link function, i.e.

$$Y \mid X \sim \text{SomeDistribution}(g^{-1}(\mu(X))),$$

where both $g$ and SomeDistribution are specified by arguments to custom_family().

Evaluation and scoping

The expr, error_scale, and size arguments are evaluated only when simulating data for this response variable. They are evaluated in an environment with access to the predictor variables and the preceding response variables, which they can refer to by name. Additionally, these arguments can refer to variables in scope when the enclosing population() was defined. See the Examples below.
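The general idea of delayed evaluation with access to both the data and the defining scope can be sketched in base R with `quote()` and `eval()`. This shows the concept only; it is an assumption-free base R demonstration and not a description of how the package captures or evaluates its arguments.

```r
slope <- 2                                # variable in the defining scope
expr <- quote(1 + slope * x1)             # captured, not yet evaluated

predictors <- data.frame(x1 = c(0, 1, 2)) # stands in for simulated predictors

# Names are looked up in the data first (x1), then in the enclosing
# environment (slope).
value <- eval(expr, envir = predictors, enclos = environment())
value  # 1 3 5
```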

Examples

# Defining a binomial response. The expressions can refer to other predictors
# and to the environment where the `population()` is defined:
slope1 <- 2.5
slope2 <- -3
intercept <- -4.6
size <- 10
population(
  x1 = predictor(rnorm),
  x2 = predictor(rnorm),
  y = response(intercept + slope1 * x1 + slope2 * x2,
               family = binomial(), size = size)
)
#> Population with variables:
#> x1: rnorm()
#> x2: rnorm()
#> y: binomial(intercept + slope1 * x1 + slope2 * x2, size = size)