Probabilistic programming

Alex Reinhart – Updated February 28, 2017

Contrast with Statistical programming languages.

Warning: I do not know what I am talking about.

For the basic concepts, see The Design and Implementation of Probabilistic Programming Languages, by Noah D. Goodman and Andreas Stuhlmüller. However, I haven't found a good high-level overview anywhere (that is, one suited to a statistician who already understands probabilistic models, rather than to a computer scientist).

An almost-but-not-quite-entirely-wrong description would be this: a probabilistic program generates outputs via some code, which may depend on random events (a coin flip, a draw from a distribution, etc.) that themselves depend on some parameters. Naturally we can sample from the output distribution by just repeatedly running the program; however, the number of possible execution paths through the program is exponential in the number of random choices it involves, so exploring the entire distribution quickly becomes intractable. (Continuous random variables make the space of execution paths infinite.) There are a variety of MCMC and other approaches to get an approximate distribution efficiently.
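To make that concrete, here is a minimal Python sketch (a toy of my own, not Venture) of a probabilistic program: ordinary code whose output depends on random choices, sampled by simply running it many times.

import random
from collections import Counter

def flip(p):
    """A random choice: a biased coin flip with success probability p."""
    return random.random() < p

def program():
    """A tiny probabilistic program. Each run follows one of several
    execution paths, depending on the outcomes of its random choices;
    with n binary choices there are up to 2^n paths."""
    if flip(0.5):
        return sum(flip(0.3) for _ in range(3))  # this branch makes 3 more flips
    else:
        return 10 if flip(0.9) else 0  # this branch makes 1 more flip

# Forward sampling: approximate the output distribution by repeated runs.
samples = Counter(program() for _ in range(100_000))
print({value: count / 100_000 for value, count in sorted(samples.items())})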

We can also work in reverse: condition on data. If we’ve naively enumerated all possible execution paths through the program, we can select out those that result in the observed data, and obtain the conditional distributions of all the random values we used in the program. (These could be parameters drawn from a prior distribution, say.) Hence we can perform inference on useful model parameters. Again, enumeration is ridiculously inefficient for most types of models, so research focuses on clever ways to avoid it and obtain approximate solutions.
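Continuing the Python toy (again, my own illustration, not any particular system): brute-force conditioning is rejection sampling. Run the program many times, keep only the executions that reproduce the observed data, and look at the distribution of the random draws among the survivors.

import random
from collections import Counter

def run():
    """One execution: draw a coin's bias from a discrete prior,
    then flip it 5 times. Returns (parameter, simulated data)."""
    bias = random.choice([0.1, 0.5, 0.9])  # prior over the parameter
    heads = sum(random.random() < bias for _ in range(5))
    return bias, heads

# Condition on having observed 4 heads: keep only matching runs. The
# parameter's distribution among the survivors approximates its
# posterior -- but most runs are thrown away, which is why research
# focuses on cleverer inference algorithms.
kept = [bias for bias, heads in (run() for _ in range(200_000)) if heads == 4]
posterior = Counter(kept)
print({bias: n / len(kept) for bias, n in sorted(posterior.items())})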

A simple example from the Venture tutorial:

assume intercept = normal(0, 10);
assume slope = normal(0, 10);

assume line = proc(x) { slope * x + intercept };
assume obs = proc(x) { normal(line(x), 1) };

observe obs(1) = 2;
observe obs(2) = 2;

infer conditional;

sample list(intercept, slope);

So we place priors on the slope and intercept, define how the line and the observations are computed, record the observed data, condition on those observations, and finally sample from the conditional distributions of the slope and intercept. Presumably we could also draw samples from obs directly, by repeatedly sampling new slope and intercept values from their priors.
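Because the observations here are continuous, exact rejection would keep essentially no runs; one standard workaround, sketched below in Python (my illustration, not what Venture's infer conditional actually does), is likelihood weighting: sample the parameters from their priors and weight each run by the density of the observed data under it.

import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

data = [(1, 2), (2, 2)]  # the observations from the Venture example

runs = []
for _ in range(100_000):
    # Draw the parameters from their priors, as in the Venture program.
    intercept = random.gauss(0, 10)
    slope = random.gauss(0, 10)
    # Weight this run by the likelihood of the observed data.
    weight = 1.0
    for x, y in data:
        weight *= normal_pdf(y, slope * x + intercept, 1)
    runs.append((weight, intercept, slope))

# Resample runs in proportion to their weights to approximate the posterior.
posterior = random.choices(runs, weights=[w for w, _, _ in runs], k=1000)
print(sum(i for _, i, _ in posterior) / 1000,   # posterior mean of intercept
      sum(s for _, _, s in posterior) / 1000)   # posterior mean of slope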

Languages

A good overview (from March 2015) is The State of Probabilistic Programming. There's a huge variety of probabilistic programming languages; many seem to explore the tradeoff between expressivity and efficiency, either by (a) focusing on expressing certain types of models and providing very efficient inference algorithms that exploit their structure, or by (b) being more flexible and hence requiring more general (and slower) inference algorithms.

Since there are a million different probabilistic programming languages in various states of maturity, I'll just list the ones that caught my eye here. A bigger list is available here.