See also Philosophy of statistics.
Peter Godfrey-Smith (2003), Theory and Reality: An Introduction to the Philosophy of Science.
A good historical overview of the philosophy of science.
Platt, J. R. (1964). Strong inference. Science, 146(3642), 347–353. doi:10.1126/science.146.3642.347
Platt’s core point is pretty straightforward:
Strong inference consists of applying the following steps to every problem in science, formally and explicitly and regularly:
- Devising alternative hypotheses;
- Devising a crucial experiment (or several of them), with alternative possible outcomes, each of which will, as nearly as possible, exclude one or more of the hypotheses;
- Carrying out the experiment so as to get a clean result;
- Recycling the procedure, making subhypotheses or sequential hypotheses to refine the possibilities that remain; and so on.
He then claims that “For exploring the unknown, there is no faster method; this is the minimum sequence of steps” and argues that the fastest-moving scientific fields are those closest to this ideal. He also advocates the “method of multiple hypotheses” (from Chamberlin), whereby every scientist comes up with more than one hypothesis at a time and pits them against each other, instead of wallowing around with a single hypothesis.
O’Donohue, W., & Buchanan, J. A. (2001). The weaknesses of strong inference. Behavior and Philosophy, 29, 1–20. https://www.jstor.org/stable/27759412
Makes several objections to Platt’s method, such as:
- Platt omits the problem of formulating the scientific problem; a problem like “What causes depression?” is sufficiently vague and ill-defined that building and testing alternative hypotheses is nearly impossible, and one needs research just to establish a better question to ask.
- There are infinitely many possible alternative hypotheses, and one could never eliminate them all. “Plausible” hypotheses are fewer, but plausibility is hard to define and we may be wrong; I don’t find this concerning, because all we need is a good hypothesis at the end, not the Truth. Perhaps there exists some marginally better hypothesis somewhere, but science is a process of incremental improvement, not absolute truth.
- The Quine-Duhem problem: no hypothesis can ever be logically ruled out because experiments depend on auxiliary hypotheses, which can be rejected first. The same argument is made in Meehl’s “Theoretical risks and tabular asterisks” (see Paul Meehl and psychology). In some fields experimental methods are trusted enough to provide crucial tests; in others, a crucial experiment would just mean endless debate about the inadequacy of the experiment.
- The interpretation of a crucial experiment may depend on other theories which turn out to be wrong. Cites Feyerabend’s example of heliocentrism, which was “refuted” by dropping a stone off of a tower; the stone landed at the base of the tower, proving the Earth could not be moving. The development of Galilean relativity changed the interpretation of this experiment.
- A bunch of nitpicking about whether Platt’s examples of fields using strong inference actually use strong inference. See Davis (2006), below, for a very simple rebuttal to this point.
Davis, R. H. (2006). Strong inference: Rationale or inspiration? Perspectives in Biology and Medicine, 49(2), 238–250. doi:10.1353/pbm.2006.0022
Argues that “both Platt’s critics and those who embraced his views took his article far too seriously” because it was “an inspirational tract, the strengths of which were largely rhetorical”. Discusses Platt’s effect on various scientific fields.
Greenwald, A. G. (2012). There is nothing so theoretical as a good method. Perspectives on Psychological Science, 7(2), 99–108. doi:10.1177/1745691611434210
Not sure what to make of this one. Points out that many theory controversies in psychology have gone decades without being resolved, despite Platt’s notion of strong inference providing direct resolutions. Blames much of this on confirmation bias preventing psychologists from accepting contradictory evidence. But then talks about how method and theory go together, as evidenced by Nobel Prizes suggesting methods are advanced by theory and vice versa. Not sure how this connects or what the overall point is; seems surprisingly sanguine about the idea that psychological controversies are so slow to resolve.
Seems to misconstrue Chamberlin’s “method of multiple hypotheses” as requiring a researcher to adopt competing hypotheses from other scientists, instead of always generating multiple hypotheses in their own work. This allows Greenwald to dismiss the method as impractical, since nobody would “regard a competitor’s theory with something approaching their affection for a beloved adoptee.”
Muthukrishna, M., & Henrich, J. (2019). A problem in theory. Nature Human Behaviour. doi:10.1038/s41562-018-0522-1
An argument for the adoption of strong mathematical theory in psychology, quoting Poincaré’s quip that “Science is built up of facts, as a house is built of stones; but an accumulation of facts is no more science than a heap of stones is a house.” The methodological improvements arising from the reproducibility crisis, such as study preregistration and improved statistical methods, “will only help ensure solid stones; they don’t help us build the house.” The idea is that psychology has been a largely disconnected collection of results about things people do in different situations, but these results tell us very little about what to expect in new situations or with different people – they don’t help us make any testable predictions. In contrast, dual inheritance theory is offered as an example of a mathematically formulated theory for explaining human behavior in evolutionary terms and making specific, mathematically testable predictions about e.g. conformity and learning.
Interpreted from a statistician’s perspective: the whole point of science is to generalize results from one situation or population to another – that’s what makes an explanation an explanation – and an endless parade of hypotheses of the form “Group A has a different mean from Group B” will, even if tested with the best possible procedures, never help us achieve this goal. The model must be richer and permit deduction in new situations.