Dimensional analysis in statistics

Alex Reinhart – Updated February 14, 2024

When I was a physics major, I was taught dimensional analysis: physical quantities have dimensions (like length or mass), measured in units, and any formula combining physical quantities must have consistent units. You cannot, for instance, add mass to velocity: the units are incompatible.

In statistics, however, we routinely build models relating physical quantities with units, and do not think very carefully about the units at all. Thinking about units would help with interpretation (e.g. regression coefficients have units, and these help us interpret what the coefficient means), and may also suggest which model types are viable: models that do not preserve units cannot be correct, in some fundamental way. But I have seen very little exploration of units in statistics or what they might mean for modeling.
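To make the point about coefficient units concrete, here is a minimal sketch in plain Python. The data and the names `ols_slope`, `height_cm`, and `weight_kg` are invented for illustration. The slope from regressing weight on height has units of kg per unit of height, so re-expressing height in meters instead of centimeters should multiply the slope by 100 while leaving the fitted relationship unchanged.

```python
def ols_slope(x, y):
    """Least-squares slope from a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical data: heights in centimeters, weights in kilograms.
height_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
weight_kg = [52.0, 58.0, 67.0, 75.0, 84.0]

# Slope with height in centimeters: units are kg per cm.
slope_per_cm = ols_slope(height_cm, weight_kg)

# Same heights expressed in meters: units are now kg per m.
height_m = [h / 100.0 for h in height_cm]
slope_per_m = ols_slope(height_m, weight_kg)

print(slope_per_cm, slope_per_m)
```

The two slopes describe the same relationship; they differ only by the conversion factor between the units of the predictor.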

Hogg, D. W. (2012). Data analysis recipes: Probability calculus for inference. https://arxiv.org/abs/1205.4446

The first mention of units in statistics that I found, early in my statistical career. Hogg points out a few things:
- Probability is dimensionless.
- A random variable has the dimensions of whatever quantity it represents (e.g. length).
- For a continuous random variable, probability is the integral of the density with respect to the random variable. This means the probability density has units of 1 over the units of the random variable: if you think of a Riemann integral, the density (1/units) and width (units) are multiplied together, resulting in something with no units.
- This can be useful in remembering things like Bayes’ theorem, since you know the units have to work out.
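
Hogg's points about density units can be checked numerically. The sketch below is my own illustration, not taken from the paper: it assumes a normal distribution for a length with an invented mean and standard deviation, and uses only the standard `math` module. The density evaluated at the same physical point changes by the conversion factor when the unit changes (since it has units of 1/length), while the probability of an interval does not.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density; its units are 1 / (units of x)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    """Normal CDF; a probability, so dimensionless."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Hypothetical length distribution, expressed in meters.
mu_m, sigma_m = 1.7, 0.1

# The same distribution expressed in centimeters.
cm_per_m = 100.0
mu_cm, sigma_cm = mu_m * cm_per_m, sigma_m * cm_per_m

# Density at the same physical point (1.8 m = 180 cm).
density_per_m = normal_pdf(1.8, mu_m, sigma_m)        # units: 1/m
density_per_cm = normal_pdf(180.0, mu_cm, sigma_cm)   # units: 1/cm

# Probability of the same physical interval (1.6 m to 1.8 m).
prob_m = normal_cdf(1.8, mu_m, sigma_m) - normal_cdf(1.6, mu_m, sigma_m)
prob_cm = normal_cdf(180.0, mu_cm, sigma_cm) - normal_cdf(160.0, mu_cm, sigma_cm)

print(density_per_m / density_per_cm)  # ratio of densities: the conversion factor
print(prob_m, prob_cm)                 # probabilities agree
```
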

Shen, W., & Lin, D. K. J. (2019). Statistical theories for dimensional analysis. Statistica Sinica, 29, 527–550. doi:10.5705/ss.202015.0377

A systematic explanation of dimensional analysis in statistics, with more formal definitions of the criteria you might use to define a dimensionally correct model (e.g. invariance to certain transformations). Shows (in section 4) how dimensional analysis might be applied to select variables in a statistical problem.

Lee, Y. Y., Zidek, J. V., & Heckman, N. (2020). Dimensional analysis in statistical modelling. https://arxiv.org/abs/2002.11259

Something of an overview of dimensional analysis in statistics, although the paper is a bit scattered. Provides several examples of where dimension matters in statistics, and gives a formal definition of invariance to dimension, a criterion for physical models that must apply regardless of the unit scale used for measurement. Most pointedly:

Further, we have shown that, surprisingly, not all functions are candidates for use in formulating relationship among attribute variables. Thus functions like g(x) = \ln(x) are transcendental and hence inadmissible for that role. This eliminates from consideration in relationships not only the natural logarithm but also, for example, the hyperbolic trigonometric functions. This knowledge would be useful to statistical scientists in developing statistical models.
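
One way to see why the logarithm of a quantity with units is troublesome is a small numerical check. This is my own informal sketch, not the paper's formal argument, and the lengths and the reference length `reference_m` are invented. Changing the unit of x rescales x multiplicatively, but ln(x) shifts by an additive constant, so its value depends on the arbitrary choice of unit. Taking the logarithm of a dimensionless ratio avoids this.

```python
import math

# Hypothetical lengths, expressed in meters and in centimeters.
lengths_m = [0.5, 2.0, 8.0]
lengths_cm = [100.0 * x for x in lengths_m]

# log(x) is not invariant to the unit: every value shifts by log(100).
shifts = [math.log(c) - math.log(m) for c, m in zip(lengths_cm, lengths_m)]

# Dividing by a reference length (an assumption for illustration) gives a
# dimensionless ratio, whose logarithm does not depend on the unit used.
reference_m = 1.0
reference_cm = 100.0 * reference_m
log_ratio_m = [math.log(x / reference_m) for x in lengths_m]
log_ratio_cm = [math.log(x / reference_cm) for x in lengths_cm]

print(shifts)
print(log_ratio_m, log_ratio_cm)
```

In regression terms, the additive shift is absorbed by the intercept when a log-transformed predictor is used, which may be one reason the issue goes unnoticed in practice.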