Why do Frequency Formats Improve Bayesian Reasoning? Cognitive Algorithms Work on Information, Which Needs Representation

Gigerenzer, Gerd


Please note:
This paper is a preprint of an article published in Behavioral and Brain Sciences, 19(1), 23-24 (1996); there may therefore be minor differences between the two versions.
The copyright of this electronic version remains with the author and the Max Planck Institute for Human Development.




Some years ago, I applied for a green card. The U.S. immigration office demanded an HIV test, and I was told that a positive test (actually, a positive test confirmed by a second one) would result in denial of the green card. The morning I drove to the U.S. consulate in Frankfurt to take the test, I asked myself what the probability of having the virus is if the test comes out positive. At that time I had the following information. About .2% of German men have the HIV virus (base rate). If someone has the virus, there is a 99% chance that the test will be positive (hit rate). If someone does not have the virus, there is still a 5% chance that the test will be positive (false alarm rate). Question: What is the probability of having the virus if the test comes out positive?

One way to answer this question is to take pencil and paper and to compute the posterior probability with Bayes' theorem - but I was driving. A less cumbersome method is to change the representation of the information from probabilities (percentages) into absolute frequencies (natural numbers), which can be done while driving. Imagine 1,000 men. Two have the virus (base rate), and these two will most likely test positive (hit rate). Out of the 998 who do not have the virus, some 50 will also test positive (false alarm rate). So we have 52 who test positive. Question: How many of those who test positive actually have the virus?

With this frequency representation, one does not need pencil and paper or a calculator. The answer can immediately be "seen": about 2 out of 52 men who test positive have the virus. This figure corresponds to a posterior probability of about .04. Note that the two representations of information, probability and frequency, are mathematically equivalent: they can be mapped onto each other in a one-to-one fashion. But what is equivalent for mathematics may not be equivalent for the mind. Most people have no idea what to do with information represented as probabilities or percentages; many, however, show insight when the same information comes in natural numbers (for details see Gigerenzer & Hoffrage, 1995).
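The translation from probabilities into natural frequencies described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in Gigerenzer and Hoffrage (1995); the helper name `to_natural_frequencies`, the reference population of 1,000, and the rounding to whole people are assumptions chosen to mirror the mental arithmetic in the example.

```python
def to_natural_frequencies(population, base_rate, hit_rate, false_alarm_rate):
    """Hypothetical helper: translate a probability format into natural frequencies.

    Counts are rounded to whole people, as one would do while driving.
    """
    infected = round(population * base_rate)                              # base rate
    infected_positive = round(infected * hit_rate)                        # hit rate
    healthy_positive = round((population - infected) * false_alarm_rate)  # false alarm rate
    return {
        "infected": infected,
        "infected_positive": infected_positive,
        "healthy_positive": healthy_positive,
        "all_positive": infected_positive + healthy_positive,
    }


counts = to_natural_frequencies(1000, 0.002, 0.99, 0.05)
# The posterior is read off as "infected_positive out of all_positive".
print(f"{counts['infected_positive']} out of {counts['all_positive']} "
      f"= {counts['infected_positive'] / counts['all_positive']:.3f}")  # → 2 out of 52 = 0.038
```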

Why do frequency formats improve Bayesian reasoning without instruction? Bayesian computations are simpler with frequency formats than with probabilities or percentages. Let the symbols H and -H stand for the two hypotheses (virus or not), and D for the potential data (positive HIV test). A Bayesian algorithm for computing the posterior probability p(H|D) with the values given in the probability version amounts to solving the following equation:


$$p(H \mid D) = \frac{p(H)\,p(D \mid H)}{p(H)\,p(D \mid H) + p(-H)\,p(D \mid -H)} \qquad (1)$$
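The probability-format computation can be evaluated directly with the three values from the HIV example. The sketch below is a straightforward transcription of Bayes' theorem, assuming H is "has the virus" and D is "tests positive"; the function name is hypothetical.

```python
def posterior_from_probabilities(p_h, p_d_given_h, p_d_given_not_h):
    """Bayes' theorem in the probability format (hypothetical helper)."""
    p_not_h = 1.0 - p_h                                   # p(-H)
    numerator = p_h * p_d_given_h                          # p(H) p(D|H)
    denominator = numerator + p_not_h * p_d_given_not_h    # + p(-H) p(D|-H)
    return numerator / denominator


# Base rate .002, hit rate .99, false alarm rate .05
p = posterior_from_probabilities(0.002, 0.99, 0.05)
print(f"p(H|D) = {p:.4f}")  # → p(H|D) = 0.0382
```

In this format the base rate and its complement must both be carried through the calculation, alongside the hit rate and the false alarm rate.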


In contrast, a Bayesian algorithm for computing the posterior probability p(H|D) in the frequency version requires solving the following equation:


$$p(H \mid D) = \frac{a}{a + b} \qquad (2)$$


where a is the number of cases with the symptom and the disease (here, men who test positive and have the virus: 2), and b is the number of cases with the symptom but without the disease (men who test positive but do not have the virus: about 50). Thus, when information is presented in natural numbers, the Bayesian computations are much simpler than with probabilities or percentages.
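For comparison, a minimal sketch of the frequency-format computation: one addition and one division. The counts a = 2 and b = 50 are those of the example; the unrounded expected counts per 1,000 men (1.98 and 49.9) are included, as an assumption for illustration, only to show that the frequency format gives the same posterior as the probability format when nothing is rounded. The function name is hypothetical.

```python
def posterior_from_frequencies(a, b):
    """Bayes' theorem in the frequency format (hypothetical helper).

    a: cases with symptom and disease (positive test, virus)
    b: cases with symptom but without disease (positive test, no virus)
    """
    return a / (a + b)


print(f"{posterior_from_frequencies(2, 50):.4f}")       # rounded counts → 0.0385
print(f"{posterior_from_frequencies(1.98, 49.9):.4f}")  # expected counts per 1,000 → 0.0382
```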

The general point is that algorithms work on information, and information needs representation. Thus, cognitive algorithms need to be studied in tandem with the external representations of information on which they operate (Marr, 1982). This psychological point has been overlooked in most of the research on the so-called base rate fallacy, where the external representation of information has been a matter of convention, not theory. The link between algorithm and external representation is just as important for an electronic calculator. My pocket calculator has an algorithm for multiplication, which is designed for Arabic numbers as input. If I enter binary numbers, garbage comes out. But I cannot conclude from the garbage that my calculator has no algorithm for multiplication. Similarly, one cannot conclude that people who come up with wild posterior probabilities do not have a Bayesian algorithm in their mind, as has been done in previous research on base-rate neglect.
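The calculator analogy can be made concrete. In this hypothetical sketch, `multiply` stands in for a calculator's multiplication routine and assumes decimal input; entering the same quantities in binary yields a wrong product even though the routine itself is intact, and translating the input first restores the correct answer.

```python
def multiply(x: str, y: str) -> int:
    """Hypothetical calculator routine: parses its inputs as decimal numerals."""
    return int(x, 10) * int(y, 10)


# 5 x 3 entered in the representation the routine expects
print(multiply("5", "3"))        # → 15
# The same quantities entered in binary ("101" is 5, "11" is 3)
print(multiply("101", "11"))     # → 1111, garbage as an answer to 5 x 3
# Translating the binary input into decimal first restores the correct answer
print(multiply(str(int("101", 2)), str(int("11", 2))))  # → 15
```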

What happens to people's reasoning when researchers use frequency rather than probability representations for problems used in previous research, such as the Cab problem and the Mammography problem? In one of the largest studies ever done on Bayesian inference, we found that with frequency representations, subjects arrived at the numerically exact estimate using a Bayesian algorithm (including pictorial equivalents and shortcuts) in about 50% of the cases (Gigerenzer & Hoffrage, 1995; see also Cosmides & Tooby, in press).

Jonathan Koehler has rightly pointed out that research on statistical thinking needs to move towards what he calls an ecologically valid research program, and the above analysis of the role of external representation in reasoning elaborates this vision. A first step towards ecologically minded research is to think about how information was represented and encountered by humans during most of our history, and to look for cognitive algorithms that are tuned to those representations. From such ecological considerations, we might expect cognitive algorithms to be designed for absolute frequencies, rather than for probabilities and percentages, which depend on the development of literacy and numeracy (Gigerenzer et al., 1989). An important consequence is that base rates need not be attended to in natural sampling of frequencies (Kleiter, 1994). This can be seen from Equation 2, in which the base rates are already embodied in the two absolute frequencies. The only information that needs to be monitored is these two frequencies, for example, the number of cases with symptom and disease and the number of cases with symptom and no disease.
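Kleiter's point about natural sampling can be illustrated with a small simulation. The simulated environment below generates cases using the rates from the HIV example, but the observer only increments two counters, a and b, and never stores a base rate; the posterior is then Equation 2. The simulation itself, its function name, the number of encounters, and the random seed are assumptions made for illustration.

```python
import random


def natural_sampling(n_cases, base_rate, hit_rate, false_alarm_rate, seed=0):
    """Hypothetical sketch: simulate sequential encounters and count a and b.

    The rates are used only by the simulated environment to generate cases;
    the observer records nothing except the two joint frequencies.
    """
    rng = random.Random(seed)
    a = 0  # positive test and virus
    b = 0  # positive test and no virus
    for _ in range(n_cases):
        has_virus = rng.random() < base_rate
        tests_positive = rng.random() < (hit_rate if has_virus else false_alarm_rate)
        if tests_positive:
            if has_virus:
                a += 1
            else:
                b += 1
    return a, b


a, b = natural_sampling(200_000, 0.002, 0.99, 0.05, seed=1)
estimate = a / (a + b)  # Equation 2: no base rate appears
print(a, b, round(estimate, 3))
```

As the number of encounters grows, the estimate approaches the value obtained with Bayes' theorem in the probability format (about .038), because the base rate is already reflected in how often each counter is incremented.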

Representation of information in terms of frequencies improves Bayesian inferences without instruction. More generally, frequency representations make "cognitive illusions" in statistical reasoning largely disappear (for an overview see Gigerenzer, 1991, 1994). These results are of course good news for those who would like to believe in some sort of human rationality, for those biologically-minded people who wonder how a species so bad at judgment under uncertainty could have survived so long, and also for those unfortunate souls charged with teaching undergraduate statistics.

Koehler's article on the base-rate fallacy is timely. Research on the use of base rates has been driven by the formal theory of probability, which is mute about representation and content (but see how Birnbaum (1983) tailored statistical models to the content of the Cab problem). It has also ignored the problems that arise when Bayes' theorem is applied to everyday life (e.g., Daston, 1988; Earman, 1992). For instance, while driving to the US consulate for my HIV test, I wondered from which reference class to take the base rate, since I am a member of several reference classes with different base rates. Selecting a reference class introduces a source of subjectivity that statisticians such as R. A. Fisher (1935) have held against the routine application of Bayes' theorem.
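The reference-class problem can be made concrete by holding the hit rate and false alarm rate fixed and varying only the base rate. Apart from the .2% figure for German men, the base rates below are hypothetical values chosen for illustration, not figures from the text; the computation uses expected natural frequencies per 1,000 people, as in Equation 2, and the function name is likewise hypothetical.

```python
def posterior_per_1000(base_rate, hit_rate=0.99, false_alarm_rate=0.05):
    """Posterior via expected natural frequencies in 1,000 people (Equation 2)."""
    a = 1000 * base_rate * hit_rate                # positive and infected
    b = 1000 * (1 - base_rate) * false_alarm_rate  # positive and not infected
    return a / (a + b)


# Hypothetical reference classes one might belong to (only 0.002 is from the text)
reference_classes = [
    ("hypothetical low-risk class", 0.0002),
    ("German men (from the text)", 0.002),
    ("hypothetical high-risk class", 0.02),
]
for label, base_rate in reference_classes:
    print(f"{label:30s} base rate {base_rate:.4f} -> p(H|D) = {posterior_per_1000(base_rate):.3f}")
```

The same test result supports very different conclusions depending on which reference class supplies the base rate.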

Ironically, most researchers who hold subjects to Bayesian standards do not themselves adhere to those standards in their research. Experimenters habitually use Fisher's significance testing, not Bayes' theorem, to infer whether subjects reason the Bayesian way. Thus they accuse subjects of committing the base-rate fallacy at the same time that they neglect base rates. This is a double standard which should make us suspicious of the mechanical application of norms, Bayesian or otherwise, to evaluating human inference.







References

Birnbaum, M. H. (1983). Base rates in Bayesian inference: Signal detection analysis of the cab problem. American Journal of Psychology, 96, 85-94.

Cosmides, L., & Tooby, J. (in press). Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty. Cognition.

Daston, L. J. (1988). Classical probability in the Enlightenment. Princeton, NJ: Princeton University Press.

Earman, J. (1992). Bayes or bust? A critical examination of Bayesian confirmation theory. Cambridge, MA: MIT Press.

Fisher, R. A. (1935). The design of experiments. Edinburgh: Oliver and Boyd.

Gigerenzer, G. (1991). How to make cognitive illusions disappear: Beyond heuristics and biases. In W. Stroebe & M. Hewstone (Eds.), European Review of Social Psychology (Vol. 2, pp. 83-115).

Gigerenzer, G. (1994). Why the distinction between single-event probabilities and frequencies is relevant for psychology (and vice versa). In G. Wright & P. Ayton (Eds.), Subjective Probability (pp. 129-161). New York: Wiley.

Gigerenzer, G., & Hoffrage, U. (1995). How to improve Bayesian reasoning without instruction: Frequency formats. Psychological Review, 102(4).

Gigerenzer, G., & Murray, D. J. (1987). Cognition as intuitive statistics. Hillsdale, NJ: Erlbaum.

Gigerenzer, G., Swijtink, Z., Porter, T., Daston, L. J., Beatty, J., & Krüger, L. (1989). The empire of chance: How probability changed science and everyday life. Cambridge: Cambridge University Press.

Kleiter, G. D. (1994). Natural sampling: Rationality without base rates. In G. H. Fischer & D. Laming (Eds.), Contributions to mathematical psychology, psychometrics, and methodology (pp. 375-388). New York: Springer.

Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: Freeman.
