# Dimensional analysis in statistics

Alex Reinhart – Updated August 4, 2022

When I was a physics major, I was taught dimensional analysis: physical quantities have units (like length or mass), and any formula combining physical quantities had to have correct units. You cannot, for instance, add mass to velocity: the units are incompatible.

In statistics, however, we routinely build models relating physical quantities with units, and do not think very carefully about the units at all. Thinking about units would help with interpretation (e.g. regression coefficients have units, and these help us interpret what the coefficient means), but may also suggest what model types are viable: models that do not preserve units cannot be correct, in some fundamental way. But I have seen very little exploration of units in statistics or what they might mean for modeling.
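
As a minimal sketch of the claim that regression coefficients carry units, the snippet below fits a least-squares slope twice on the same made-up data, once with the predictor in metres and once in centimetres. The data, the variable names, and the helper `ols_slope` are all hypothetical and exist only for illustration.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x; its units are (units of y) / (units of x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Made-up data: height in metres, mass in kilograms.
height_m = [1.50, 1.60, 1.70, 1.80, 1.90]
mass_kg = [52.0, 58.0, 67.0, 74.0, 83.0]

# Slope with units of kg per metre.
slope_kg_per_m = ols_slope(height_m, mass_kg)

# The same data with height converted to centimetres.
height_cm = [100.0 * h for h in height_m]

# Slope with units of kg per centimetre: 100 times smaller, same relationship.
slope_kg_per_cm = ols_slope(height_cm, mass_kg)

print(slope_kg_per_m, slope_kg_per_cm)
```

The two slopes differ by exactly the conversion factor between metres and centimetres, which is what one would expect of a quantity measured in kg per unit length.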

• Hogg, D. W. (2012). Data analysis recipes: Probability calculus for inference. https://arxiv.org/abs/1205.4446

The first mention of units in statistics that I found, early in my statistical career. Hogg points out a few things:

• Probability is dimensionless.
• A random variable has the dimensions of whatever quantity it represents (e.g. length).
• For a continuous random variable, probability is the integral of the density with respect to the random variable. This means the probability density has units of 1 over the units of the random variable: if you think of a Riemann integral, the density (1/units) and width (units) are multiplied together, resulting in something with no units.
• This can be useful in remembering things like Bayes’ theorem, since you know the units have to work out.
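
A small numerical sketch of these points, assuming a hypothetical normal distribution of heights (the parameters and the helper `riemann_probability` are invented for illustration): changing the unit of measurement rescales the density, but leaves the probability untouched.

```python
from statistics import NormalDist

# Hypothetical heights: mean 1.70 m, standard deviation 0.10 m.
heights_m = NormalDist(mu=1.70, sigma=0.10)
# The same distribution expressed in centimetres.
heights_cm = NormalDist(mu=170.0, sigma=10.0)

# Densities have units of 1/length, so their values depend on the unit chosen.
density_per_m = heights_m.pdf(1.80)     # units: 1/m
density_per_cm = heights_cm.pdf(180.0)  # units: 1/cm

# Probabilities are dimensionless, so the unit chosen does not matter.
prob_m = heights_m.cdf(1.80) - heights_m.cdf(1.60)
prob_cm = heights_cm.cdf(180.0) - heights_cm.cdf(160.0)


def riemann_probability(dist, lower, upper, n=10_000):
    """Midpoint Riemann sum: each term is density (1/units) times width (units)."""
    width = (upper - lower) / n
    return sum(dist.pdf(lower + (i + 0.5) * width) * width for i in range(n))


print(density_per_m / density_per_cm)  # ratio of the two densities
print(prob_m, prob_cm, riemann_probability(heights_m, 1.60, 1.80))
```

The ratio of the two densities is the metres-to-centimetres conversion factor, while the probability of the interval agrees in both unit systems and with the Riemann sum.
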

• Lee, Y. Y., Zidek, J. V., & Heckman, N. (2020). Dimensional analysis in statistical modelling. https://arxiv.org/abs/2002.11259

Somewhat of an overview of dimensional analysis in statistics, although the paper is a bit scattered. Provides several examples of where dimension matters in statistics, and gives a formal definition of invariance to dimension, a criterion that physical models must satisfy regardless of the unit scale used for measurement. Most pointedly:

> Further, we have shown that, surprisingly, not all functions are candidates for use in formulating relationship among attribute variables. Thus functions like $g(x) = \ln(x)$ are transcendental and hence inadmissible for that role. This eliminates from consideration in relationships not only the natural logarithm but also, for example, the hyperbolic trigonometric functions. This knowledge would be useful to statistical scientists in developing statistical models.
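
A rough sketch of why the logarithm sits awkwardly with units follows. It is a common heuristic argument, not necessarily the derivation used in the paper, and the conversion factor $c$ and rescaled quantity $x'$ are notation assumed here for illustration. If $x = c\,x'$ expresses the same quantity in two unit systems, then

$$
\ln(x) = \ln(c\,x') = \ln(c) + \ln(x'),
$$

so a change of units shifts $\ln(x)$ by an additive constant instead of rescaling it, and no single unit can be assigned to the result. Similarly, the series expansion (valid for $|x| < 1$)

$$
\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots
$$

adds terms with the units of $x$, $x^2$, $x^3$, and so on, which only makes sense if $x$ is dimensionless.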