The Bounded Rationality of Probabilistic Mental Models

Gigerenzer, Gerd

Author's Note
This chapter is based on a lecture delivered at Harvard University, October 2, 1991. I wrote this chapter under a fellowship at the Center for Interdisciplinary Research, University of Bielefeld, Germany, and with the support of the Fonds zur Förderung der wissenschaftlichen Forschung (P 8842-MED), Austria. I am grateful to Lorraine Daston, Ralph Hertwig and Ulrich Hoffrage for many helpful comments.

 

Please note:
This paper is a preprint of an article published in Manktelow, K. I., & Over, D. E. (Eds.). (1993). Rationality: Psychological and philosophical perspectives (pp. 284-313). London: Routledge, therefore there may be minor differences between the two versions.
The copyright of this electronic version remains with the author and the Max Planck Institute for Human Development.

   

 

Imagine you are a subject in a psychological experiment. In front of you is a text problem, and you begin to read:

Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations. Which of these two alternatives is more probable?

(a) Linda is a bank teller.

(b) Linda is a bank teller and is active in the feminist movement.

Which alternative would you choose? Assume you chose (b), just as most subjects in previous experiments did. The experimenter explains to you that (b) is the conjunction of two events, namely that Linda is a bank teller and is active in the feminist movement, whereas (a) is one of the constituents of the conjunction. Because the probability of a conjunction of two events cannot be greater than that of one of its constituents, the correct answer is (a), not (b), the experimenter says. Therefore, your judgment is recorded as an instance of a reasoning error, known as the conjunction fallacy. You may be inclined to admit that you have committed a reasoning error. The experimenter now explains that these reasoning errors are like visual illusions: once the error is pointed out, people like you show insight, but this knowledge does not necessarily help. People see the same illusion again, or continue to reason in the same way, despite insight. Therefore, in analogy to stable visual illusions, stable reasoning errors such as the conjunction fallacy have been labeled cognitive illusions.
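For reference, the rule the experimenter appeals to can be written as a single inequality (my notation; the text states it only in words): for any two events \(A\) and \(B\),

\[
p(A \wedge B) \leq p(A),
\]

so that, in Linda's case, \(p(\text{teller} \wedge \text{feminist}) \leq p(\text{teller})\).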

Cognitive illusions, and their explanations, cognitive heuristics, are the stock-in-trade of a research program known as the heuristics-and-biases program (e.g. Tversky & Kahneman, 1974, 1983). Cognitive illusions "seem reliable, systematic and difficult to eliminate" (Kahneman & Tversky, 1972, p. 431).

Stable cognitive illusions are not the first assault on human rationality by psychologists. Sigmund Freud's attack on human rationality is probably the best known: the unconscious desires and wishes of the Id are a steady source of intrapsychic conflict that may manifest itself in all kinds of irrational fears, beliefs, and behavior. But the cognitive-illusion assault is stronger than the psychoanalytic one. It does not need to invoke unconscious wishes or desires to overwhelm human rationality. Cognitive illusions are seen as a straightforward consequence of the laws of human reasoning. Humans do not possess the proper mental algorithms.

Paleontologist Stephen J. Gould (1992), referring to the Linda Problem, puts the message clearly:

"Why do we consistently make this simple logical error? Tversky and Kahneman argue, correctly I think, that our minds are not built (for whatever reason) to work by the rules of probability" (p. 469).

The purpose of this chapter is to evaluate this claim and to provide an alternative. In the first part, I will draw the reader's attention to the fact that both proponents and opponents of rationality tend to focus on the same single psychological concept: algorithms in the mind. Second, I will go beyond algorithms by introducing conceptual distinctions drawn from philosophy, statistics, and cognitive science, and argue that these distinctions are not just the province of philosophers and statisticians but have quite tangible implications for understanding the cognitive processes in reasoning and for the rationality debate. Third, I demonstrate that these implications are so powerful that they can make apparently stable cognitive illusions disappear. Finally, I will present a model of bounded rationality, the theory of probabilistic mental models, as an alternative to traditional explanations in terms of the heuristics-and-biases program. Using the overconfidence effect as an illustration, I will show that this theory explains the old data (cognitive illusions), predicts new phenomena, and provides a fresh look at what rational probabilistic reasoning means.

I. Rationality: What Kind of Mental Algorithm?

In his Movements of Animals, Aristotle described the practical syllogism as a guide to practical rationality:

"For example, when you conceive that every man ought to walk and you yourself are a man, you immediately walk; or if you conceive that on a particular occasion no man ought to walk, and you yourself are a man, you immediately remain at rest" (1945, p. 701a).

The foundation of present-day theories of rationality, however, was laid in the mid-seventeenth century with the classical theory of probability (Daston, 1988). In contrast to syllogisms, probability could deal with degrees of belief, weights of evidence, expectations, and other forms of uncertainty that are characteristic of everyday affairs - from weighing the evidence in a law court to calculating insurance premiums. During the Enlightenment, probability theory and rational reasoning came to be seen as two sides of the same coin; probability theory is "nothing more at bottom than good sense reduced to a calculus" (Laplace, 1814/1951, p. 196). For instance, in his famous treatise of 1854, the mathematician George Boole set out to demonstrate that the laws of logic, probability and algebra can in fact be derived from the laws of human reasoning.

"There is not only a close analogy between the operations of the mind in general reasoning and its operations in the particular science of Algebra, but there is to a considerable extent an exact agreement in the laws by which the two classes of operations are conducted" (1854/1958, p.6).

Bärbel Inhelder and Jean Piaget (1958) echoed this belief a century later: "Reasoning is nothing more than the propositional calculus itself" (p. 305).

According to these views, the laws of probability or logic are the algorithms of the mind, and they define rational reasoning as well. Defenders and detractors of human rationality alike have tended to focus on the issue of algorithms. Only their answers differ. Here are some prototypical arguments in the current debate.

 

Statistical Algorithms

For philosophers such as L. Jonathan Cohen (1986), the assumption that human intuition is rational is absolutely indispensable for legitimizing their own profession. If intuition were not rational, this would "seriously discredit the claims of intuition to provide other things being equal dependable foundations for inductive reasoning in analytical philosophy" (p. 150). Cohen (1983) assumes that statistical algorithms (Baconian and Pascalian probability) are in the mind, but distinguishes between not having a statistical rule and not applying such a rule (p. 511), that is, between competence and performance. Cohen's interpretation of cognitive illusions parallels J.J. Gibson's interpretation of visual illusions (Gigerenzer, 1991): illusions are attributed to non-realistic experiments using impoverished information, to experimenters acting as conjurers, and to other factors that mask the subjects' competence:

"... unless their judgment is clouded at the time by wishful thinking, forgetfulness, inattentiveness, low intelligence, immaturity, senility, or some other competence-inhibiting factor, all subjects reason correctly about probability: none are programmed to commit fallacies or indulge in illusions" (Cohen, 1982, p. 251).

Cohen does not claim, I think, that people carry around the collected works of Kolmogoroff, Fisher, and Neyman in their heads, and merely need to have their memories jogged, like the slave in Plato's Meno. But his claim implies that people do have at least those statistical algorithms in their competence that are sufficient to solve all reasoning problems studied in the heuristics-and-biases literature, including the Linda Problem.

The Enlightenment view that human reasoning is in part probability theory does not imply that humans make no mistakes in reasoning. Nobody would deny that, not even Cohen. According to Boole (1854/1958), for instance, errors in reasoning "are due to the interference of other laws with those laws of which right reasoning is the product" (p. 409). The message of the heuristics-and-biases program, however, is stronger than a reminder that emotions, desires, and the like make us err in reasoning.

 

Non-statistical Algorithms: Heuristics

Proponents of the heuristics-and-biases program seem to assume that the mind is not built to work by the rules of probability:

"In making predictions and judgments under uncertainty, people do not appear to follow the calculus of chance or the statistical theory of prediction. Instead, they rely on a limited number of heuristics which sometimes yield reasonable judgments and sometimes lead to severe and systematic errors." (Kahneman & Tversky, 1973, p. 237).

A few more quotations illustrate the claim that the mind lacks statistical algorithms and, therefore, rationality. "It appears that people lack the correct programs for many important judgmental tasks. ... we have not had the opportunity to evolve an intellect capable of dealing conceptually with uncertainty" (Slovic, Fischhoff & Lichtenstein, 1976, p. 174). "The biases of framing and overconfidence just presented suggest that individuals are generally affected by systematic deviations from rationality" (Bazerman & Neale, 1986, p. 317). "We know that our uneducated intuitions concerning even the simplest statistical phenomena are largely defective" (Piattelli-Palmarini, 1989, p. 9). The experimental demonstrations have "bleak implications for human rationality" (Nisbett & Borgida, 1975, p. 935), and "For anyone who would wish to view man as a reasonable intuitive statistician, such results are discouraging" (Kahneman & Tversky, 1972/1982, p. 46).

Cognitive illusions are explained by non-statistical algorithms, known as cognitive heuristics. For instance, the standard explanation for the conjunction fallacy in the Linda Problem is that the mind assesses the probability by calculating the similarity between the description of Linda and each of the alternatives, and chooses the alternative with the highest similarity. Because the description of Linda was constructed to be representative of an active feminist, and the conjunction contains the term "feminist," people judge the conjunction more probable - so the explanation goes. Judging probability by similarity has been termed the representativeness heuristic. This heuristic was proposed, but only vaguely defined, in the early 1970s, and it has not been given a more precise definition since. It has not been linked to any of the many existing theories of similarity, nor has it been spelled out how exactly similarity or representativeness is calculated.

 

Statistical and Non-statistical Heuristics

So far we have two research programs. Cohen assumes that statistical algorithms are in the competence of humans, and one should explain cognitive illusions by identifying performance-inhibiting factors. Tversky and Kahneman assume that mental algorithms are non-statistical heuristics, which cause stable cognitive illusions. Proponents of a third position do not want to be forced to choose between statistical and non-statistical algorithms, but want to have them both. Fong and Nisbett (1991, p. 35) argue that people possess both rudimentary but abstract intuitive versions of statistical principles, such as the law of large numbers, and non-statistical heuristics such as representativeness. These conclusions are based on the results of training studies. For instance, the experimenters first teach the subjects the law of large numbers or some other statistical principle, and subsequently also explain how to apply this principle to a real-world domain such as sports problems. Subjects are then tested on similar problems from the same or other domains. The typical result is that more subjects reason statistically, but transfer to domains not trained in is often low. Evans (1984) has proposed a similar interpretation of deductive reasoning, assuming both a mental logic and non-logical heuristics.

To summarize: I have briefly sketched three positions in the present debate on the rationality of probability judgment. My point is that the discussion between these three positions focuses on the kind of mental algorithm - is it probability, heuristics, or both? I now invite you to look beyond algorithms, to different questions and new kinds of experiments. Let me start with three ideas and distinctions.

 

II. There is More than Mental Algorithms

The Distinction Between Algorithms and Information Representation

Information needs representation. In order to communicate information, it has to be represented in some symbol system (Marr, 1982). Take numerical information. This information can be represented by the Arabic numeral system, by the binary system, by Roman numerals, or by other systems. These different representations can be mapped onto one another in a one-to-one way and are in this sense equivalent. But they are not necessarily equivalent for an algorithm. Pocket calculators, for instance, generally work on the Arabic base-10 system, whereas general-purpose computers work on the base-2 system. The numerals 100000 and 32 are representations of the number thirty-two in the binary and Arabic systems, respectively. The algorithms of my pocket calculator will perform badly on the former representation but work well on the latter.

The human mind finds itself in an analogous situation. The algorithms most Western people have stored in their minds - such as how to add, subtract, or multiply - work well on Arabic numerals. Contemplate division in Roman numerals for a moment.

There is more to the distinction between an algorithm and a representation of information. Not only are algorithms tuned to particular representations, but different representations make explicit different features of the same information. For instance, one can quickly see whether a number is a power of 10 in an Arabic numeral representation, whereas seeing whether that number is a power of 2 is more difficult. The converse holds with binary numbers. Finally, algorithms are tailored to given representations. Some representations allow for simpler and faster algorithms than others. Binary representation, for instance, is better suited to electronic techniques than Arabic representation. Arabic numerals, on the other hand, are better suited to multiplication and elaborate mathematical algorithms than Roman numerals - possibly one of the reasons for the superior development of mathematics in the early Arabic cultures as opposed to Roman culture.
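A minimal sketch of this point in code (my own illustration, not part of the original argument): the same number in two representations, with a check that is trivial in one representation and awkward in the other.

```python
# The same number in two representations; the "power of 2 / power of 10"
# check is easy exactly in the representation that makes that feature explicit.
n = 32

decimal = str(n)       # Arabic base-10 representation: "32"
binary = bin(n)[2:]    # binary representation: "100000"

# Easy to read off in binary: a power of 2 is a single 1 followed by zeros.
is_power_of_two = binary.count("1") == 1

# Easy to read off in decimal: a power of 10 is a single 1 followed by zeros.
is_power_of_ten = decimal[0] == "1" and set(decimal[1:]) <= {"0"}

print(decimal, binary, is_power_of_two, is_power_of_ten)
```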

The distinction between algorithms and information representation is central to David Marr's (1982) analysis of visual information processing systems. From vision to reasoning, I argue, understanding of cognitive processes needs to take account of both algorithms and information representation. I now connect this distinction with another conceptual distinction prominent in philosophy and probability theory.

 

The Distinction Between Subjective Degrees of Belief and Objective Frequencies

The classical probabilists of the Enlightenment slid with breathtaking ease and little justification from one sense of probability to another: from objective frequencies to physical symmetry (today referred to as "propensity") to subjective degrees of belief. Lorraine Daston (1988) has argued that this ease was a consequence of the associationist psychology of the day, of the belief, advanced inter alia by John Locke and David Hartley, that the matching of objective frequencies to subjective belief was rational. Only when associationist psychology shifted its emphasis to illusions and distortions introduced by passion and prejudice did the gap between objective and subjective probabilities become evident. Philosophers and mathematicians now drew a bold line between the first two objective meanings on the one hand and subjective probabilities on the other. The unity of belief and frequency crumbled in the first half of the nineteenth century. After the fall of the classical interpretation of probability, the frequency interpretation emerged as the dominant view in the nineteenth and twentieth centuries (Gigerenzer et al., 1989).

For proponents of the frequency view such as Richard von Mises (1928/1957) and Jerzy Neyman (1977), probability theory is about frequencies, and does not deal with degrees of belief in single events. In the subjective ("Bayesian") interpretation that reemerged in this century, however, degrees of belief are what probability means. Others wanted to have it both ways, or have proposed alternative interpretations of probability. The question, What is probability about? is still with us. [1]

My intention here is not to take sides in this debate, but to link the conceptual distinction between single-event probabilities and frequencies to the concept of information representation. This leads us to distinguish two kinds of representation: frequency information and single-event probabilities. Finer distinctions can be made, but this will suffice for a start.

 

Monitoring of Event Frequencies

The third idea is an evolutionary speculation that links with the above distinctions. Bumblebees, birds, rats, and ants all seem to be good intuitive statisticians, highly sensitive to changes in frequency distributions in their environments, as recent research in foraging behavior indicates (Gallistel, 1990; Real & Caraco, 1986). From sea snails to humans, as John Staddon (1988) argued, the learning mechanisms responsible for habituation, sensitization, and classical and operant conditioning can be described in terms of statistical inference machines. Reading this literature, one wonders why humans seem to do so badly in experiments on statistical reasoning.

Assume that some capacity or algorithm for statistical reasoning has been built up through evolution by natural selection. What information representation would such an algorithm be tuned to? Certainly not percentages and single-event probabilities (as in the typical experiments on human reasoning), since these took millennia of literacy and numeracy to evolve as tools for communication. Rather, in an illiterate world, the input representation would be frequencies of events, sequentially encoded, such as 3 out of 20 (as opposed to 15% or p = .15). Such a representation is couched in terms of discrete cases. Moreover, frequencies such as 3 out of 20 contain more information than percentages such as 15%. These frequencies contain information about the sample size (here: 20), which allows one to compute the ambiguity or precision of the estimate.
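A small worked sketch (my own, using the numbers from the text) of why the frequency format carries more information: the sample size permits a precision estimate that a bare percentage does not.

```python
# "3 out of 20" allows an estimate of precision; "15%" alone does not,
# because the sample size n is missing.
import math

successes, n = 3, 20
p = successes / n                      # point estimate: 0.15
se = math.sqrt(p * (1 - p) / n)        # standard error of the proportion
print(f"estimate = {p:.2f} +/- {1.96 * se:.2f} (approximate 95% interval)")
```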

The notion that the mind infers the structure of the world through monitoring event frequencies is an old one. Locke and Hartley assumed that the mind is a kind of counting machine that automatically registered frequencies of past events, an assumption that is now called automatic frequency processing (Hasher & Zacks, 1979). David Hume (1739/1975) thought the mind was very sensitive to small differences in frequency: "When the chances or experiments on one side amount to ten thousand, and on the other to ten thousand and one, the judgment gives the preference to the latter, upon account of the superiority" (p. 141).

Now we can put these three ideas together. First, to analyze probabilistic reasoning, information representation and algorithms have to be distinguished. Second, there are (at least) two kinds of representations, frequencies and single-event probabilities. Finally, if evolution has selected some kind of algorithm in the mind, then it will be tuned to frequencies as representation.

In the next section I will show that these ideas, still rather general, are powerful enough to make several apparently stable cognitive illusions disappear.

 

III. How to Make Cognitive Illusions Disappear

Cognitive illusions have become a hard currency in many debates. When Stephen Stich (1990) argued against Donald Davidson's philosophy of language and Daniel Dennett's philosophy of mind, he pointed out that these two systems are inconsistent with the psychologists' "evidence for extensive irrationality in human inference" (p. 11). When I discuss with colleagues the actual evidence underlying such claims, the conjunction fallacy is often thrown in as the truly convincing and replicable demonstration of irrational reasoning.

So let us first see what the distinction between algorithm and information representation, and between frequency and single-event format, does to this cognitive illusion.

 

Conjunction Fallacy

Tversky and Kahneman (1983) reported that 85% of 142 undergraduates indicated that the conjunction "Linda is a bank teller and is active in the feminist movement" (T&F) is more probable than "Linda is a bank teller" (T). They and others have shown that this judgment is replicable and stable - not only with statistically naive undergraduates but with "highly sophisticated respondents" such as doctoral students in the decision science program of the Stanford Business School who had taken advanced courses in probability, statistics, and decision theory (Tversky & Kahneman, 1983, p. 298).

The conjunction fallacy, and the conclusion that the mind is not programmed by the laws of probability but by non-statistical heuristics (albeit only very loosely defined ones), have become the accepted wisdom in much of cognitive and social psychology, philosophy of mind, and beyond. The conjunction fallacy has been proposed as the cause of various kinds of human misfortune, including flaws in US security policy, where "the conjunction fallacy... lends ... plausibility to highly detailed nuclear war-fighting scenarios" (Kanwisher, 1989, p. 671).

Stephen J. Gould (1992), explaining the Linda Problem to his audience, writes:

[Tversky and Kahneman's] "studies have provided our finest insight into 'natural reasoning' and its curious departure from logical truth... I am particularly fond of [the Linda] example, because I know that the [conjunction] is least probable, yet a little homunculus in my head continues to jump up and down, shouting at me - 'but she can't just be a bank teller; read the description'" (p. 469).

Gould should have trusted his homunculus. In what follows, I will discuss the claim that the judgment called "conjunction fallacy" is an error in probabilistic reasoning. I will argue that this claim is not tenable, and Gould's homunculus will be vindicated. Thereafter I will show what the distinction between algorithm and information representation can do to the conjunction fallacy.

 

Cognitive Illusion Illusory?

Is the conjunction fallacy a violation of probability theory? Has a person who chooses "bank teller and active in the feminist movement" (T&F) violated probability theory? The answer is no, if the person is a frequentist such as Richard von Mises or Jerzy Neyman; yes, if she is a subjectivist such as Bruno de Finetti, and open otherwise.

The mathematician Richard von Mises, one of the founders of the frequency interpretation, used the following example to make his point:

"We can say nothing about the probability of death of an individual even if we know his condition of life and health in detail. The phrase 'probability of death', when it refers to a single person, has no meaning at all for us. This is one of the most important consequences of our definition of probability" (von Mises, 1928/1957, p. 11).

In this frequentist view, one cannot speak of a probability unless a reference class has been defined. The relative frequency of an event such as death is only defined with respect to a reference class, such as "all female pub-owners fifty years old living in Bavaria." Relative frequencies may vary from reference class (pub owners) to reference class (HIV positives). Since a single person is always a member of many reference classes, no unique relative frequency can be assigned to a single person. As the frequentist statistician G.A. Barnard (1979) put it, if one wants to evaluate subjective probabilities of single events, one "should concentrate on the works of Freud and perhaps Jung rather than Fisher and Neyman" (p. 171). Thus, for a strict frequentist, the laws of probability are about frequencies and not about single events such as whether Linda is a bank teller. Therefore, in this view, no judgment about a single event can violate probability theory.

From the frequency point of view, the laws of probability are mute on the Linda Problem, and what has been called a conjunction fallacy is not an error in probabilistic reasoning - probability theory simply doesn't apply to such cases. Seen from the Bayesian point of view, the conjunction fallacy is an error. Note, however, that the experimental subjects were not told that the Linda problem was meant to be a Bayesian probability textbook problem, nor did the experimenters try to persuade their subjects to commit to the Bayesian view.

How shall we evaluate this situation? The frequency view has been dominant since the nineteenth century, and teaching in statistics departments today as well as in undergraduate psychology courses is still predominantly frequentist in philosophy. Therefore, we cannot expect psychology undergraduates to carry around a Bayesian Superego in their minds. One should be careful not to evaluate reasoning against some norm, unless subjects have been committed to that particular norm. Thus, choosing T&F in the Linda problem is not a reasoning error. What has been labeled the "conjunction fallacy" here does not violate the laws of probability. It only looks so from one interpretation of probability.

 

How to Make the Conjunction Fallacy Disappear

We now apply the distinction between single-event and frequency representations to the Linda Problem. We just change the format from a single-event to a frequency representation (see the passage following the description below), leaving everything else as it was.

Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations.

There are 100 people who fit the description above. How many of them are:

(a) bank tellers

(b) bank tellers and active in the feminist movement

Subjects are now asked for frequency judgments rather than for the probability of a single event. If one focuses on mental algorithms alone, this change appears irrelevant. Moreover, if the mind solves the Linda Problem by using a representativeness heuristic, changes in representation should not matter, because they do not change the degree of similarity. The description of Linda is still more representative of (or similar to) the conjunction "teller and feminist" than of "teller." Subjects therefore should still exhibit the conjunction fallacy. Similarly, if one assumes with Cohen that the laws of probability are in the mind, but that subjects have been misled by the experimenter into bad performance, changes in representation should not matter either. For instance, subjects may have been misled into assuming that the description of Linda is relevant to the solution, whereas it is in fact completely irrelevant. This irrelevancy argument is not altered by the frequency format. [2]

However, if there is some statistical algorithm in the mind that is tuned to frequencies as information representation, then something striking should happen to this stable cognitive illusion. Violations of the conjunction rule should largely disappear.

The experimental evidence available confirms this prediction. Klaus Fiedler (1988) reported that the number of conjunction violations in the Linda problem dropped from 91% in the original, single-event representation to 22% in the frequency representation (n = 44). The same result was found when he replaced "There are 100 people" by an odd number such as "There are 168 people." The drop in the number of conjunction violations here was from 83% to 17% (n = 23). Hertwig and Gigerenzer (1993) used three alternatives: F (Linda is active in the feminist movement), T&F, and T. In the single-event task, subjects rank-ordered F, T&F and T with respect to their probability; in the frequency task, they estimated the frequency of T, T&F and F ("How many out of 200?"). The percentage of conjunction violations dropped from 88% (n = 24) in the single-event task to 18% (n = 49) in the frequency task. Fiedler reported similar results for other standard problems from which the conjunction fallacy has been inferred as a stable cognitive illusion. Tversky and Kahneman (1983, pp. 308-309) reported a similar phenomenon in their original paper.

To summarize: The debate between Cohen and Tversky & Kahneman has centered on the question of algorithm. I have argued that in order to understand probabilistic reasoning, one should distinguish between algorithms and information representation. The philosophical and statistical distinction between single events and frequencies clarifies that judgments hitherto labeled instances of the "conjunction fallacy" cannot be properly called reasoning errors in the sense of violations of the laws of probability. The conceptual distinction between single-event and frequency representations is sufficiently powerful to make this allegedly stable cognitive illusion disappear. The conjunction fallacy is not the only cognitive illusion that is subject to this argument.

 

Base Rate Fallacy

Casscells et al. (1978) presented 60 staff and students at Harvard Medical School with the following problem:

If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5%, what is the chance that a person found to have a positive result actually has the disease, assuming you know nothing about the person's symptoms or signs?

If one inserts these numbers into Bayes' theorem, the posterior probability that the person actually has the disease is .02, or 2% (assuming that the test correctly diagnoses every person who has the disease - a piece of information that is missing).
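For the record, here is the calculation in my rendering, with the missing hit rate assumed to be 1, as the text notes:

\[
p(\text{disease} \mid \text{positive})
= \frac{p(\text{positive} \mid \text{disease})\, p(\text{disease})}
       {p(\text{positive} \mid \text{disease})\, p(\text{disease}) + p(\text{positive} \mid \text{no disease})\, p(\text{no disease})}
= \frac{1 \times .001}{1 \times .001 + .05 \times .999} \approx .02 .
\]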

However, almost half of the 60 staff and students at Harvard Medical School estimated this probability as .95, or 95%, not 2%. Only 11 participants answered 2%. Note the variability in the judgments of physicians about the probability of the disease! The modal answer of .95 was taken as an instance of the base rate fallacy, or base rate neglect (Tversky & Kahneman, 1982). This term signifies that the base rate of the disease (1/1000) is neglected, and the judgment is based only (or mainly) on the characteristics of the test (the false positive rate). This cognitive illusion has been widely discussed and given much prominence.

"The failure to appreciate the relevance of prior probability in the presence of specific evidence is perhaps one of the most significant departures of intuition from the normative theory of prediction" (Kahneman & Tversky, 1973, p. 243).

Little is known about how the participants made these judgments, and why these were so variable. It just seems that students and staff did not get effective training in statistical reasoning at Harvard Medical School.

 

How to Make the Base Rate Fallacy Disappear

I will now apply the same argument to the Harvard Medical School problem as I did to the Linda problem. Assume there is some kind of algorithm for statistical reasoning that works on frequency representations. Therefore, if we change the information representation in the Harvard Medical School Problem from percentages and single-event probabilities to frequencies, then the base rate fallacy should also disappear. As a consequence, the large variability in judgments should disappear as well. This is a testable prediction.

When I made this prediction during luncheon discussions at the Center for Advanced Study in the Behavioral Sciences, two of the other fellows, Leda Cosmides and John Tooby, got up from the table and went down the hill to Stanford University, where they tested the prediction with 425 undergraduate subjects (Cosmides & Tooby, in press). They constructed a dozen or so versions of the Medical Problem as controls; of chief interest here is the frequency version:

One out of 1000 Americans has disease X. A test has been developed to detect when a person has disease X. Every time the test is given to a person who has the disease, the test comes out positive. But sometimes the test also comes out positive when it is given to a person who is completely healthy. Specifically, out of every 1000 people who are perfectly healthy, 50 of them test positive for the disease.

Imagine that we have assembled a random sample of 1000 Americans. They were selected by a lottery. Those who conducted the lottery had no information about the health status of any of these people. How many people who test positive for the disease will actually have the disease?

_____ out of _____.

In this version, the representation of the input information is changed from percentages, such as 5%, to frequencies such as "50 out of 1000." The representation of the output information is changed from a single-event probability ("What is the probability that a person...") to a frequency judgment ("How many people..."). This made the proportion of Bayesian answers skyrocket from 12% (in a replication using the original representation) to 76% (and to 92%, if subjects were instructed to visualize frequencies in a graphical display).
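In the frequency format the Bayesian answer reduces to a simple count (my rendering, assuming, as the problem states, that the one person in 1,000 who has the disease always tests positive): of the 1000 sampled Americans, 1 has the disease and tests positive, and roughly 50 of the healthy ones also test positive, so

\[
\frac{1}{1 + 50} = \frac{1}{51} \approx .02 ,
\]

that is, roughly 1 out of 51 people who test positive actually has the disease.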

If only the representation of the input information was changed into frequencies, but not that of the output information, or vice versa, the effect of the change in information representation was halved. All other changes, such as adding the missing information about the false negative rate and the explicit information about random sampling, had little effect on the judgments, as several control versions showed.

We have the same result as for the Linda problem. Judgments labeled "base rate fallacy" largely disappear in the Harvard Medical School problem when we change the information representation from single events to frequency. The effect is about as strong as in the Linda problem.

Results in the same direction have been obtained on other reasoning problems when information representation was only partially changed into a frequency format, such as using sequential monitoring of frequency information and random sampling from a collective (e.g. Borgida & Brekke, 1981; Christensen-Szalanski & Beach, 1982; Gigerenzer, Hell & Blank, 1988; McCauley & Stitt, 1978). [3]

It is also instructive that some researchers tend to change their own information representation when they turn away from the subject and explain the correct solution to the reader. An early example is Hammerton (1973), who used single-event probabilities to communicate information to his subjects:

"1. A device has been invented for screening a population for a disease known as psylicrapitis. 2. The device is a very good one, but not perfect. 3. If someone is a sufferer, there is a 90% chance that he will be recorded positively. 4. If he is not a sufferer, there is still a 1% chance that he will be recorded positively. 5. Roughly 1% of the population has the disease. 6. Mr. Smith has been tested, and the result is positive. The chance that he is in fact a sufferer is: _____" (p. 252).

When the author explains the correct answer to his readers, he switches, without comment, into a frequency representation:

"Out of every 100 persons tested, we expect 1 to have the disease; and the device is nearly certain to say that he has. Also, out of that 100, we expect the machine to say that 1 healthy person has the disease. Thus, in the long run, out of every 100 persons tested, we expect 2 positive results, one of which will be correct and the other incorrect. Therefore the odds on any positive result being valid are roughly even" (p. 252).

The frequency format is easily digested by Hammerton's readers. Hammerton's subjects, not surprisingly, failed on the single-event representation: their median response corresponded not to one-to-one odds (i.e. 50%) but to 85%.

Thus far, we have seen how to make two cognitive illusions, the conjunction fallacy in the Linda Problem and the base rate fallacy in the Harvard Medical School Problem, largely disappear. I will now turn to a third prominent illusion.

 

Overconfidence Bias

Confidence in one's knowledge is typically studied with questions of the following kind:

Which city has more inhabitants?

(a) Hyderabad

(b) Islamabad

How confident are you that your answer is correct?

50%, 60%, 70%, 80%, 90%, 100%

Imagine you are an experimental subject: Your task is to choose one of the two alternatives. Possibly you choose Islamabad, as most subjects in previous studies did. (If your choice was indeed Islamabad, you agree with the majority of subjects but are, regrettably, wrong.) Then you are asked to rate your confidence that your answer "Islamabad" is correct. 50% confident means guessing, 100% confident means that you are absolutely sure that Islamabad is the larger city. After many subjects answer many questions, the experimenter counts how many answers in each of the confidence categories were actually correct.

The major finding of some two decades of research is the following: In all the cases where subjects said, "I am 100% confident that my answer is correct," the relative frequency of correct answers was only about 80%; in all the cases where subjects said, "I am 90% confident," the relative frequency of correct answers was only about 75%; when subjects said, "I am 80% confident," the relative frequency of correct answers was only about 65%; and so on (Lichtenstein, Fischhoff & Phillips, 1982). Values for confidence were systematically higher than relative frequencies. This systematic discrepancy has been interpreted as an error in reasoning and has been named "overconfidence bias." Quantitatively, overconfidence bias is defined as the difference between mean confidence and mean percentage correct.
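A minimal scoring sketch (my own illustration, with invented data) of how this is computed: overall, mean confidence minus the proportion correct; per confidence category, stated confidence versus the relative frequency of correct answers.

```python
# Score a set of two-alternative answers for "overconfidence bias" and calibration.
from collections import defaultdict

# (stated confidence, answer correct?) -- invented data for illustration only
answers = [(1.0, True), (1.0, False), (0.9, True), (0.9, False),
           (0.8, True), (0.8, False), (0.7, True), (0.6, False)]

mean_confidence = sum(c for c, _ in answers) / len(answers)
proportion_correct = sum(ok for _, ok in answers) / len(answers)
overconfidence = mean_confidence - proportion_correct   # the "bias" score

by_category = defaultdict(list)
for c, ok in answers:
    by_category[c].append(ok)

print(f"overconfidence bias = {overconfidence:.2f}")
for c in sorted(by_category, reverse=True):
    hits = by_category[c]
    print(f"  said {c:.0%} confident -> {sum(hits) / len(hits):.0%} correct")
```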

Consistent with the general research strategy of the heuristics-and-biases program, the explanandum is a discrepancy (overconfidence bias) between a confidence judgment and a norm (frequency), not the confidence judgments themselves (there are some exceptions, e.g. May, 1987). Little, however, has been achieved in explaining this discrepancy. A common proposal is to explain "biases" by other, deeper mental flaws. For instance, Koriat, Lichtenstein and Fischhoff (1980) propose that the overconfidence bias is caused by a "confirmation bias." Here is their explanation. After one alternative is chosen (e.g. Islamabad), the mind searches for further information that confirms the answer given, but not for information that could falsify it. This selective information search artificially increases confidence. The key idea is that the mind is not a Popperian. Other deficiencies in cognition and motivation have been suggested as explanations: Fischhoff, Edwards, and others proposed that subjects are insensitive to item difficulty (von Winterfeldt & Edwards, 1986, p. 128). Dawes (1980) suggested the tendency of humans in the Western world to overestimate their intellectual powers, which "has been reinforced by our realization that we have developed a technology capable of destroying ourselves" (p. 328). Others have proposed motivational reasons such as "fear of invalidity" or "illusion of validity." Note that in all these explanatory attempts the experimental phenomenon is seen as a "cognitive illusion," that is, an error in probabilistic reasoning that has to be explained by some deeper flaw in our mental or motivational programming.

Similar to the conjunction fallacy, overconfidence bias has been suggested as an explanation for human disasters of many kinds, including deadly accidents in industry (Spettell & Liebert, 1986), errors in the legal process (Saks & Kidd, 1980) and systematic deviations from rationality in negotiation and management (Bazerman & Neale, 1986).

 

Checking the Normative Yardstick

Is overconfidence bias really a "bias" in the sense of a violation of probability theory? Let me rephrase the question: has probability theory been violated if one's average degree of belief (confidence) in a single event (i.e., that a particular answer is correct) is different from the relative frequency of correct answers in the long run? From the point of view of the frequency interpretation, the answer is "no," for the reasons already discussed. Probability theory is about frequencies; it does not apply to single-event judgments like confidences. Therefore, no statement about confidences can violate the laws of probability. Even for Bayesians, however, the answer is not "yes," as it was with the conjunction fallacy. The issue here is not internal consistency or coherence, but the relation between subjective probability and external (objective) frequencies, which is a more complicated issue and depends on conditions such as exchangeability (for a discussion related to overconfidence see Kadane & Lichtenstein, 1982).

To summarize: A discrepancy between confidence in single events and relative frequencies in the long run is not an "error" in the sense of a violation of probability theory, contrary to the claims in the heuristics-and-biases literature. It only looks that way from the perspective of a narrow interpretation of probability theory that blurs the fundamental distinction between single events and frequencies.

 

How to Make Overconfidence Bias Disappear

Many experiments have demonstrated the stability of the overconfidence phenomenon despite various "debiasing methods" (Fischhoff, 1982). In our own experiments, we have also confirmed the stability despite those methods (Gigerenzer, Hoffrage & Kleinbölting, 1991). We warned subjects, prior to the experiment, of overconfidence, or gave them monetary incentives - this did not decrease overconfidence. We tried it with a bottle of French champagne as an incentive - to no avail. To quote von Winterfeldt and Edwards (1986, p. 539): "... overconfidence is a reliable, reproducible finding." And they conclude, with a tone of regret: "... can anything be done? Not much." (Edwards & von Winterfeldt, 1986, p. 656). Let's see.

I will now apply to the overconfidence bias the same argument as before to the conjunction fallacy and the base rate fallacy. Assume an experiment in which you present subjects with fifty general-knowledge questions of the Hyderabad-Islamabad type and ask them for confidence judgments, as usual. Here is where this experiment diverges from earlier work. You also ask the same subjects for judgments of the frequency of correct answers: "How many of these 50 questions do you think you have answered correctly?" Assume your subjects' mean confidence judgments are, just as in earlier studies, systematically higher than their relative frequency of correct answers. That is, you replicate the earlier findings and get a typical overconfidence bias - expressed as "(mean confidence minus relative frequency correct) x 100" - of about 15. How do you think your subjects' frequency judgments will compare with the true frequency of correct answers?

If confidence in one's knowledge were truly biased due to confirmation bias, wishful thinking, or other deficits in cognition, motivation, or personality, then the difference between a single-event and a frequency representation should not matter. Overestimation should remain stable, just as it does with warnings and French champagne.

Ulrich Hoffrage, Heinz Kleinbölting and I have performed this and related experiments (for details see Gigerenzer et al., 1991). Table 1 shows the results of two experiments with 80 and 97 subjects, respectively. Only averages are shown here, because individual results conformed well to the averages. In both experiments, the stable discrepancy between mean confidence and the true relative frequency of correct answers was replicated. This is necessary for control, but no surprise. Overconfidence bias (multiplied by 100) was 13.8 and 15.4, respectively. What about the frequency judgments? When we compared subjects' estimated frequencies with their true frequencies, overestimation disappeared. In both experiments subjects showed a tendency towards underestimation. In Table 1, the differences between estimated and true frequencies are expressed as "(estimated relative frequency minus true relative frequency) x 100," for comparison. For instance, in Experiment 1, the average estimated frequency of correct answers (in a series of 50 questions) was 1.2 lower than the true frequency of correct answers, which corresponds to -2.4 in 100 questions. Negative signs denote underestimation, positive signs overestimation.

To summarize: I have argued that the discrepancy between mean confidence and relative frequency of correct answers, known as "overconfidence bias," is not an error in probabilistic reasoning. It only looks that way from a narrow normative perspective, in which the distinction between single-event confidence and frequencies is blurred. If we ask our subjects about frequencies instead of single-event confidences, we can make this stable phenomenon disappear.

It is easy to see how my argument, illustrated here by three prominent examples, can be extended to and tested on other cognitive illusions. The philosophical distinction between single-event probabilities and frequencies teaches us that the irrationality claim, at least as based on these examples, is premature. The normative yardstick does not stand up to closer examination. The distinctions between algorithm and information representation, and between single events and frequencies, combined with the notion of the mind as a frequency-monitoring device, teach us how to make apparently stable cognitive illusions disappear. This is of course good news for those who would like to believe in human rationality, for those biologically minded people who wonder how a species so bad at statistical reasoning could have survived so long, and also for those unfortunate souls charged with teaching undergraduate statistics.

Earlier explanations of reasoning in terms of a general representativeness heuristic or a general confirmation bias cannot account for these striking results. We have to look for a fresh understanding of cognitive processes that explains both the old and the new facts. What follows is a brief introduction to the theory of probabilistic mental models (Gigerenzer et al., 1991). The theory explains the old facts (the robust overconfidence and hard-easy effects of the last two decades) and the new facts (the disappearance of overconfidence), and it makes several other novel predictions. [4]

 

IV. Probabilistic Mental Models

I will illustrate the theory of probabilistic mental models (for short, PMM theory) by the following problem:

Which city has more inhabitants?

(a) Heidelberg

(b) Bonn

How confident are you that your answer is correct?

50%, 60%, 70%, 80%, 90%, 100%

Assume that subjects do not know the answer, but have to make an inference under uncertainty. How is that inference made?

Before I start outlining the theory, a general remark on explanatory strategy is helpful. Our explanandum is confidence and choice, and not overconfidence bias. That is, we attempt to explain judgment, not the deviation of judgment from some controversial norm. As a consequence, we do not need to invoke deeper-level biases (such as confirmation biases) or error-prone heuristics as explanations. This contrasts with the heuristics-and-biases program. Nor do we invoke explanations that assume perfect knowledge and unlimited computational and attentional capacities, as in traditional rational choice theories. Instead, PMM theory postulates cognitive mechanisms that work well given limited knowledge, limited attention and limited computational capacities. In these respects, PMM theory is a model of "bounded rationality" (Simon, 1955).

PMM theory assumes that a frame of inference is constructed to solve a particular problem such as the Heidelberg-Bonn problem. This frame of inference is called a PMM. A PMM generalizes the two alternatives, Heidelberg and Bonn, to a reference class, such as "all cities in Germany." And it generalizes the target variable, number of inhabitants, to a network of probability cues that co-vary with the target. Thus, a PMM consists of a reference class (that contains the two alternatives), a target variable and probability cues.

Table 2 shows examples of probability cues for population size in the reference class "German cities." Take the soccer-team cue. Large cities are likely to have a team playing in the Soccer Bundesliga, in which the 18 best teams compete. The ecological validity of this cue can be determined by checking all pairs in which one city has a team in the Bundesliga but the other does not. For instance, one finds that in 91% of these cases the city with the Bundesliga team has more inhabitants (calculated for 1988/89, for what were then West German cities with more than 100,000 inhabitants). Thus, the ecological validity of the soccer cue is .91 in this reference class. Note that it is defined as a relative frequency, not as a Pearson correlation as in Brunswik's (1955) framework. Ecological validity is defined on the environment, whereas cue validity is the corresponding concept in a subject's PMM. I will call a PMM well-adapted if the cue validities correspond well to the ecological validities.
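A minimal computational sketch (toy data, not the 1988/89 figures) of how the ecological validity of a binary cue is determined: over all pairs in the reference class where the cue discriminates, count the proportion of pairs in which the cue points to the city with more inhabitants.

```python
from itertools import combinations

# (population, has_Bundesliga_team) -- invented illustrative values
cities = [(1_200_000, True), (600_000, True), (550_000, False),
          (250_000, False), (140_000, True)]

discriminating = correct = 0
for (pop1, cue1), (pop2, cue2) in combinations(cities, 2):
    if cue1 == cue2:
        continue                      # cue cannot be activated for this pair
    discriminating += 1
    larger_has_cue = cue1 if pop1 > pop2 else cue2
    correct += larger_has_cue         # cue points to the larger city

print(f"ecological validity = {correct / discriminating:.2f}")
```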

Note, however, that the soccer-team cue cannot be activated for the Heidelberg-Bonn problem: Neither city has a team in the Bundesliga, so the cue does not differentiate. In fact, only the last cue in this list can be activated, and this capital cue does not have a particularly high cue validity - because it is well known that Bonn is not exactly London or Paris. (The low cue validity may change soon, however, because Bonn's days as capital are numbered.)

PMM theory assumes that, when activation rates are low or there is time pressure, as is typical for studies of general knowledge, the first cue that can be activated determines the choice (here: Bonn) and confidence equals the cue validity (Table 3). This algorithm is "satisficing" (Simon, 1982) in the sense that it produces good, but not necessarily optimal, performance. For instance, we will see later that this algorithm can produce surprisingly high levels of correct answers given quite limited knowledge, and it can produce good calibration if PMMs are well adapted. The algorithm is a variant of bounded rationality (Simon, 1955) insofar as it is designed to work on limited knowledge and on the first cue activated. The latter avoids computationally complex integrations of many cues.
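A minimal sketch of this satisficing algorithm (my own simplification; the cues and validities below are invented placeholders, not those of Table 2): take the first cue that discriminates, choose the alternative it favors, and report that cue's validity as the confidence.

```python
# Each cue: (name, {city: cue value}, cue validity). Values are illustrative only.
CUES = [
    ("soccer team",      {"Heidelberg": False, "Bonn": False}, 0.91),
    ("exposition site",  {"Heidelberg": False, "Bonn": False}, 0.80),
    ("national capital", {"Heidelberg": False, "Bonn": True},  0.60),
]

def pmm_choice(a, b, cues=CUES):
    """Choose the larger city and a confidence, using the first activated cue."""
    for name, values, validity in cues:
        if values[a] != values[b]:             # cue discriminates: it is activated
            choice = a if values[a] else b     # pick the city that has the cue
            return choice, validity, name      # confidence = cue validity
    return a, 0.5, None                        # no cue activated: guess

print(pmm_choice("Heidelberg", "Bonn"))        # ('Bonn', 0.6, 'national capital')
```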

Now consider a frequency task. Subjects answer several hundred questions of the Heidelberg-Bonn type. After each group of 50 questions they are asked "How many of the last 50 questions do you think you have answered correctly?" The point is that according to PMM theory, confidence and frequency judgments are based on different cognitive processes, because different PMMs have to be constructed (Table 4).

The target variable in the confidence task is number of inhabitants, whereas in the frequency task it is number of correct answers. As a consequence, reference class and probability cues are different, too. A soccer cue, for example, no longer helps. A task that involves judgments of frequencies of correct answers has a different reference class: sets of general knowledge questions in similar testing situations. And base rates of earlier performance in such testing situations are an example of a probability cue for frequency judgments. Note that both single-event confidence and frequency judgments are explained by reference to experienced frequencies. However, these experienced frequencies relate to different reference classes, which are in turn part of different PMMs.

PMM theory can be quantitatively simulated; for the present purpose, however, qualitative predictions are sufficient. In the following sections, I will derive several novel predictions from PMM theory, some of them counterintuitive and therefore surprising. First, however, we will see how PMM theory explains the stable overconfidence bias.

 

Explaining Old Facts: Overconfidence Bias

PMM theory explains the stable overconfidence effect in the following way. Assume that subjects' PMMs are, on average, well adapted. This means that although subjects' knowledge about some domain (such as German city populations) may be limited, it should not be systematically biased. This implies that cue validities roughly correspond to ecological validities, but it does not imply that subjects know all the relevant cues. If the general knowledge questions were a representative sample from the knowledge domain, zero overconfidence would be expected. For instance, if the soccer cue has an ecological validity of about .9, and the cue validity matches this, it follows from PMM theory that confidence would be around .9 in those cases where the soccer cue can be activated. From the definition of ecological validity it follows that the relative frequency of correct answers would be .9, too. However, general knowledge questions typically are not representative samples from some domain of knowledge, but are selected to be difficult or even misleading. The Hyderabad-Islamabad question is an example of a misleading question. Here, a usually valid cue, the capital cue (Islamabad is a capital, Hyderabad is not), leads to a wrong choice: Hyderabad has a much larger population.

Selecting difficult and misleading questions decreases the number of correct answers, and "overconfidence bias" results as a consequence of selection, not of some deficient mental heuristic. To the best of my knowledge, all previous studies that have demonstrated overconfidence in general knowledge have used selected questions: this explains the stability of the phenomenon against warnings, monetary incentives, and French champagne.
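A minimal simulation sketch (my own, with invented error rates) of this selection argument: a well-adapted cue of validity .90 yields roughly zero overconfidence under representative item sampling, but an "overconfidence bias" of about 15 percentage points when misleading items are over-sampled, with no deficient heuristic anywhere in the model.

```python
import random

random.seed(1)
CUE_VALIDITY = 0.90             # confidence reported whenever the cue is used

def overconfidence(p_misleading, n=10_000):
    """Mean confidence minus proportion correct for n simulated questions."""
    n_correct = sum(random.random() >= p_misleading for _ in range(n))
    return CUE_VALIDITY - n_correct / n

# Representative sampling: the cue errs in 10% of pairs, matching its validity.
print(f"representative items: {100 * overconfidence(0.10):+.1f}")
# Selected items: misleading pairs over-sampled, so the cue errs in 25% of them.
print(f"selected items:       {100 * overconfidence(0.25):+.1f}")
```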

Here are several novel predictions.

 

Novel Predictions

Prediction 1. Confidence and representative sampling. Assume (1) well-adapted PMMs, as above, and (2) a representative sample of questions from some knowledge domain. PMM theory predicts that overconfidence will disappear under these conditions. We have tested this prediction using random samples from the reference class "all cities in Germany with more than 100,000 inhabitants" (Gigerenzer et al., 1991). In Experiment 1, "overconfidence bias" decreased from 13.8 in a set of selected questions to 0.9 in a representative sample; in Experiment 2 the decrease was from 15.4 to 2.8. Juslin (in press, a) has also confirmed this novel prediction using random samples from several other domains of knowledge.

Prediction 2. Frequency judgments and selected sampling. Recall that PMM theory implies that frequency judgments such as "How many of the last 50 questions do you think you got right?" are solved by a PMM with a different reference class (e.g. other general knowledge tests). Assume (1) well-adapted PMMs for the frequency task and (2) a set of questions that is representative of this reference class. Because the typical sets of general-knowledge questions used in earlier research are representative of this reference class, frequency judgments should be accurate. We have tested and confirmed this novel prediction (see Table 1).

The crucial point is that confidence and frequency judgments refer to different kinds of reference classes. The same set of questions can be representative with respect to one reference class and at the same time selected with respect to the other. Thus a set of 50 general-knowledge questions of the city type may be representative of the reference class "general knowledge questions," but not of the reference class "cities in Germany" (because city pairs have been selected for being difficult or misleading). Asking for a confidence judgment summons up a PMM based on the reference class "cities in Germany"; asking for a frequency judgment summons up a PMM based on the reference class "sets of general knowledge questions."

Prediction 3. Underestimation in frequency judgments. Here we use the situation of Prediction 1 to deduce a condition under which frequency judgments underestimate the true frequency of correct answers. If a PMM for frequency judgments is well adapted to its reference class (i.e., sets of selected items), but the actual set of questions is not selected, then frequency judgments should underestimate true frequencies. We have tested and confirmed this novel prediction (Gigerenzer et al., 1991). In Experiment 1, the difference between estimated and true frequency of correct answers dropped from -2.4 with the set of selected items (see Table 1) to -11.8 with the representative sample; in Experiment 2 it dropped from -4.2 (see Table 1) to -9.3.
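The direction of this prediction follows from simple bookkeeping. The toy numbers below are assumptions chosen only to show why a frequency PMM calibrated to selected question sets must undershoot when the actual set is representative; they are not the reported data.

```python
# Illustrative numbers only (assumptions, not the experiments' results).
typical_hit_rate_on_selected_sets = 0.62   # what the frequency PMM is calibrated to
actual_hit_rate_selected_set      = 0.64   # performance on yet another selected set
actual_hit_rate_representative    = 0.74   # performance on a representative sample

frequency_estimate = typical_hit_rate_on_selected_sets  # the judgment tracks the reference class

print(round(frequency_estimate - actual_hit_rate_selected_set, 2))      # -0.02: nearly accurate
print(round(frequency_estimate - actual_hit_rate_representative, 2))    # -0.12: marked underestimation
```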

Further novel predictions can be derived from quantitative simulations of PMM theory. Here is one last example. The prediction is about percentage correct, that is, about correct choice rather than about confidence or frequency judgments, on which we have focused so far.

Prediction 4. When little knowledge is as good as good knowledge. Recall that in the experiments just reported, our subjects were German, and they were answering questions about German cities. Their mean percentage of correct answers varied between 70% and 75% (for representative samples of cities). Assume we take a new sample of German students who are just as good as the earlier ones - they are familiar with German cities and know the relevant probability cues. We do the same kind of experiment; the only difference is that we give them questions about an environment which is highly unfamiliar to them: cities in the US. More precisely, we take the 75 largest cities in the US, draw a random sample of 100 pairs, and give these 100 questions to our German subjects. What would you predict?

All theories of overconfidence I am aware of are mute on the issue of percentage correct. All the people I have asked so far concluded that percentage correct will be much lower when subjects answer these 100 questions about foreign cities. From our simulations with PMM theory, we derived a quite different and surprising prediction: Subjects will do just as well with American as with German cities. That is, their percentage correct will be the same for German and US cities. I will deduce this prediction here by a simplified calculation.

We take the 75 largest cities in the US. Assume that our German subjects have not even heard of half of these, such as Mesa, Mobile, and Shreveport, and that they know nothing about the other half, except that they have heard of these cities. Thus, their PMM is poor; the only probability cue it can generate is the familiarity cue, that is, whether one has heard of a city or not. This familiarity cue is of high cue validity, but it plays almost no role in judgments about German cities, because most of our subjects have heard of all these German cities. Thus, it can rarely be activated. The point is that for judgments about US cities, the familiarity cue has a high activation rate. To be precise, if half of the US cities are familiar, the activation rate is 50.7%. [5] What is the validity of the cue? Pretests have shown that the cue validity is around .90. [6] Thus, we have about 50% of questions where the familiarity cue can be activated, and 50% where it cannot (because either the names of both cities are known or both are unknown). For the first group, we expect 90% accuracy, given a cue validity of .90. That is, in absolute terms, 45% correct answers. In the other group, we expect by mere guessing an additional 25% correct answers, that is, altogether, 70% correct answers. This value is counterintuitively large. Note that this value is in the range of the percentage correct for German cities (70-75%), although it has been calculated on the assumption of no specific knowledge. Any such knowledge (e.g., that New York is larger than Boston) will add on to this estimate.
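The back-of-the-envelope argument can be written out in a few lines of arithmetic, a sketch under the assumptions stated in the text and in footnotes 5 and 6 (half of the cities familiar, cue validity .90, guessing otherwise):

```python
# Worked version of the calculation in the text (see also footnotes 5 and 6).
n_cities = 75
n_familiar = 38                              # "about half" of the cities (footnote 5)
n_unfamiliar = n_cities - n_familiar

all_pairs = n_cities * (n_cities - 1) // 2   # 2,775 possible city pairs
mixed_pairs = n_familiar * n_unfamiliar      # pairs where the familiarity cue can be activated

activation_rate = mixed_pairs / all_pairs    # about 0.507
cue_validity = 0.90                          # pretest estimate quoted in the text
guessing_rate = 0.50                         # expected accuracy when no cue can be activated

expected_correct = activation_rate * cue_validity + (1 - activation_rate) * guessing_rate
print(round(activation_rate, 3))    # 0.507
print(round(expected_correct, 3))   # about 0.70
```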

Thus, PMM theory makes a counterintuitive prediction: In the situation described, German subjects will get about the same percentage correct in judgments about unfamiliar US cities as in judgments about German cities.

Horst Kilcher, Ulrich Hoffrage, and I conducted an experiment. Fifty-six subjects each answered 200 questions of the Heidelberg-Bonn type, 100 being a random sample of city pairs from the 75 largest US cities, the other 100 being a random sample from the 75 largest German cities. Half of the subjects got the questions about the German cities first, the other half those about the US cities. Consistent with our earlier experiments, mean percentage correct was 75.6% for German cities. But what was the percentage correct for judgments about US cities?

Table 5 shows that mean percentage correct for US cities was 76%, that is, about the same as for the German cities about which our subjects had considerably more knowledge. This result follows from PMM theory. Here we have an interesting situation, where quite limited knowledge (but not no knowledge) produces the same good performance (percent correct) as quite good knowledge.

To summarize my second part: I have briefly presented PMM theory, which specifies the cognitive processes underlying choice, confidence and frequency judgments. The theory implies conditions under which overconfidence appears: either a PMM for a task is not properly adapted to a corresponding environment (e.g., cue validities do not correspond to ecological validities), or the set of objects used is not a representative sample from the corresponding environment, but is selected for difficulty. In our experiments, overconfidence disappeared when random samples instead of selected samples were used, which is consistent with the latter explanation. Thus, the source of overconfidence seems to be in the relation between the sample of objects used in the task and the reference class in a corresponding environment. Overconfidence does not seem to be located in the relation between PMM and a corresponding environment (that is, in a low correspondence between cue validities and ecological validities).

PMM theory specifies conditions under which the "robust" overconfidence effect of the last 15 years appears, disappears and even inverts. One can no longer speak of a general overconfidence bias, in the sense that it relates to deficient processes of cognition or motivation. I have not dealt here with how the theory explains the second robust finding in the literature - the hard-easy effect (that is, overconfidence increases with item difficulty). I will simply mention that the theory also provides an explanation for the hard-easy effect on the same principles, and specifies conditions under which it disappears or even inverts. Juslin (in press, b) has tested and confirmed a prediction from PMM theory that specifies conditions that make the hard-easy effect disappear (Gigerenzer et al., 1991, p. 512). Simulations with PMM theory have allowed us to explain several anomalies in the literature, and to integrate earlier explanatory attempts into a comprehensive theoretical framework. For instance, Koriat et al.'s (1980) results testing the confirmation bias explanation can be fully integrated into PMM theory (Gigerenzer et al., 1991). PMM theory seems to be the first theory in this field that offers a coherent explanation not only of the effects previously reported in the literature on judgment under uncertainty, but also for the new results we have obtained in our experiments.

 

V. Conclusions

Since the Enlightenment, probability theory has been seen as the codification of human rationality. Consequently, recent experiments suggesting that human reasoning systematically violates the laws of probability have been widely cited as evidence for human irrationality. Here are the arguments of this chapter.

(1) I have argued that the cognitive illusions I have dealt with are not genuine illusions, contrary to the assertions in the heuristics-and-biases literature. They only look like errors from a narrow normative view about what is right and wrong in reasoning, a view that blurs the philosophical distinction between single-event probabilities and frequencies.

(2) I have linked this philosophical distinction with Marr's (1982) distinction between algorithms and information representation, and with the evolutionary idea that the mind's algorithms are tuned to frequency information. This framework suggests how to make apparently stable cognitive illusions disappear. I have demonstrated this using three cognitive illusions, widely cited as evidence for human irrationality. The new facts cannot be accounted for by the old explanations invoking heuristics such as representativeness.

(3) I introduced the theory of probabilistic mental models (PMM theory) as an alternative explanation of intuitive reasoning, using confidence in one's knowledge as an example. The theory explains both old and new facts. PMM theory postulates a mental algorithm that processes frequency information from the environment. This algorithm works well given only limited knowledge, limited attention, and limited computational capacities, and is a variant of bounded rationality. The theory describes reasoning and performance in terms of relations between a PMM, an environment, and an experimental task. Focusing on mental algorithms alone, whether they seem to be good or bad ones, turns out to be too narrow a basis for understanding the mind, and also for discussing rationality.

Table 1
Overestimation Disappears in Judgments of Frequency

Difference between                                        Experiment 1    Experiment 2
                                                          (n = 80)        (n = 97)

Mean confidence and true relative frequency
of correct answers (overconfidence)                       +13.8           +15.4

Estimated frequency and true frequency
of correct answers                                         -2.4            -4.2

Note: To make values for frequency and confidence judgments comparable, all frequencies were transformed to relative frequencies. Values shown are differences multiplied by a factor of 100.

 

 

Table 2

Probability cues for solving tasks of the Heidelberg-Bonn type. Examples given are for the reference class "cities in Germany"

Probability cues

(1) Soccer-team cue (one city's soccer team plays in the soccer "Bundesliga," the other city's team does not).

(2) State capital cue (one city is a state capital, the other city is not).

(3) Industrial cue (one city is located in the "Ruhrgebiet," the other in rural Bavaria).

(4) License plate cue (the letter code that identifies a city on a license plate is shorter for one city than for the other).

(5) Familiarity cue (one has heard of one city, but not of the other).

(6) Capital cue (one city is a capital, the other city is not).

 

 

Table 3

PMM algorithm for choice and confidence (see Gigerenzer et al., 1991)

Task: Choose the correct alternative, a or b, and give a confidence judgment.

Algorithm:

Step 1: Generalize a and b to a reference class R, where a, b ∈ R.

Step 2: Generate cue Ci highest in cue validity.

Step 3: Generate values of a and b for cue Ci. If one or both values are unknown, go back to Step 2 and generate the cue next highest in cue validity.

Step 4: Test whether values of a and b differ, i.e., whether Ci can be activated. If yes, denote this as aCib. If not, go back to Step 2.

Step 5: Choose a if p(a|aCib;R) > p(b|aCib;R). (For example, let aCib stand for "a has a soccer team in the Bundesliga but b does not." Then p(a|aCib;R) is the probability that a has the larger population given aCib, for all a, b ∈ R. This probability is the cue validity, and R is the reference class.)

Step 6: Confidence = p(a|aCib;R). (The confidence that the choice a is correct is equal to the cue validity of activated cue Ci.)

Note: Knowledge of cues can be limited, i.e., only a subset of all ecologically valid cues may be available from memory (Step 2). Knowledge of values can be limited, too. Cues have binary values (yes/no; see Table 2), but knowledge is three-valued (yes/no/unknown; see Step 3).
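Read as a procedure, Steps 2-6 of Table 3 can be sketched in a few lines of code. The cue names, the toy knowledge base, the particular validities, and the guessing fallback when no cue can be activated are illustrative assumptions, not the paper's materials; the sketch also assumes that each cue's "yes" value is the one pointing to the larger city (i.e., cue validity above .5).

```python
from typing import Dict, Optional, Sequence, Tuple

Knowledge = Dict[str, Dict[str, Optional[bool]]]  # city -> cue -> yes / no / unknown (None)

def pmm_choose(a: str, b: str,
               cues: Sequence[Tuple[str, float]],  # (cue name, cue validity), highest validity first
               knowledge: Knowledge) -> Tuple[str, float]:
    """Choice and confidence following Steps 2-6 of Table 3 (illustrative sketch)."""
    for cue, validity in cues:                       # Step 2: cue with next-highest validity
        value_a = knowledge.get(a, {}).get(cue)      # Step 3: retrieve cue values
        value_b = knowledge.get(b, {}).get(cue)
        if value_a is None or value_b is None:
            continue                                 # a value is unknown: try the next cue
        if value_a == value_b:
            continue                                 # Step 4: cue cannot be activated
        choice = a if value_a else b                 # Step 5: choose the alternative with the positive value
        return choice, validity                      # Step 6: confidence = cue validity
    return a, 0.5                                    # assumption: guess if no cue can be activated

# Hypothetical cue validities and knowledge, for illustration only:
cues = [("soccer team", 0.90), ("state capital", 0.80)]
knowledge = {
    "Stuttgart":  {"soccer team": True,  "state capital": True},
    "Heidelberg": {"soccer team": False, "state capital": False},
    "Bonn":       {"soccer team": False, "state capital": None},
}
print(pmm_choose("Stuttgart", "Heidelberg", cues, knowledge))   # ('Stuttgart', 0.9)
print(pmm_choose("Heidelberg", "Bonn", cues, knowledge))        # ('Heidelberg', 0.5) -- guessing
```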

 

 

Table 4
Probabilistic mental models for single-event (confidence) and frequency tasks

PMM                  Confidence task                             Frequency task
Target variable      Number of inhabitants                       Number of correct answers
Reference class      Cities in Germany                           Similar sets of general knowledge
                                                                 questions in similar testing situations
Probability cues     e.g., soccer team cue, state capital cue    e.g., base rates of previous performance

 

 

Table 5
Mean percentage of correct answers

                              US              German
Mean percentage correct       76.0            75.6
                              (SEm = 0.7)     (SEm = 0.9)
Mean confidence               72.3            79.5
                              (SEm = 1.0)     (SEm = 0.8)

Footnotes

[1] The debate between the frequentists and Bayesians was particularly lively before the 1970s. Today, both sides know each other's arguments well and the vital debate has turned into sterile, well-rehearsed argument. The two sides seem to have quit listening. As Glenn Shafer (1989) complained, statistics departments no longer provide a forum for the debate, and the main divisions over the meaning of probability now follow disciplinary lines: frequentists dominate statistics and experimental social sciences, Bayesians predominate in artificial intelligence and theoretical economics. "Conceptually and institutionally, probability has been balkanized" (p. 15).
[2] The attentive reader will have noticed that the frequency version of the Linda Problem asks for a quantitative judgment, whereas the single-case version asks for a comparative judgment. The latter, however, is an accidental feature of our choice of example. Single-case versions asking for quantitative judgments ("What is the probability that Linda is a bank teller?") are known to give about the same amount of conjunction errors as comparative judgments (Tversky & Kahneman, 1983).
[3] In some studies widely cited as demonstrating base rate neglect, subjects were not informed about random sampling. In the "Tom W." problem (Kahneman & Tversky, 1973), the crucial information about how the personality sketch of Tom W. was selected, whether randomly or not, is missing. The same holds for the Gary W. and Barbara T. problems that Ajzen (1977) used. Several studies have demonstrated that it can make a difference to subjects' reasoning when they learn about random sampling (e.g. Ginossar & Trope, 1987; Grether, 1980; Hansen & Donoghue, 1977; Wells & Harvey, 1977), or, even better, when they can actually watch random sampling. For instance, the neglect of base rates in the Engineer-Lawyer problem (Kahneman & Tversky, 1973) largely disappears when subjects themselves do the random sampling (Gigerenzer, Hell & Blank, 1988). For general critical discussions of the evidence see Berkeley and Humphreys (1982), Gigerenzer and Murray (1987, chapter 5), Lopes (1991), Lopes and Oden (1991), Macdonald (1986), and Scholz (1987).
[4] Other proposals have been made in the literature to explain the old facts, that is, the cognitive illusions. I cannot discuss these here, but only mention a few: the role of conversational principles in the experimenter-subject interaction (Adler, 1991); the evolutionary idea that there are domain-specific reasoning mechanisms (e.g., for cheating detection) that reflect our inherited social intelligence rather than a domain-general logic (e.g., Cosmides, 1989; Gigerenzer & Hug, 1992); and the idea that category judgments, such as those in probability revision problems and in the Linda Problem, can be modeled by connectionist architectures (e.g., Gluck & Bower, 1988).
[5] There are 75 cities, 38 are familiar, 37 not (or 37 familiar, 38 not, which leads to the same result). If two familiar cities are compared, or two unfamiliar ones, the familiarity cue cannot be activated; it can only be activated if one city is familiar but the other is not. The number of such familiar-unfamiliar pairs is 38x37, and the number of all possible pairs is 75x74/2. Thus, the activation rate is 38x37 divided by 75x74/2, which is 38/75 or 50.7%. The activation rate can be determined in this way for each individual separately depending on the number of familiar and unfamiliar cities. For instance, if not 1/2, but only 1/3 of the cities were familiar, the activation rate would change slightly, from 50.7% to 45%.
[6] The cue validity of the familiarity cue can be calculated for each individual from the set of familiar-unfamiliar pairs. The relative frequency with which the familiar city actually has the larger population is the cue validity.

 

References

Adler, J.E. (1991). An optimist's pessimism: Conversation and conjunction. Poznan Studies in the Philosophy of the Sciences and the Humanities, 21, 251-282.

Ajzen, I. (1977). Intuitive theories of events and the effects of base-rate information on prediction. Journal of Personality and Social Psychology, 35, 303-314.

Aristotle (1945). Movements of animals. (Translated by Foster). Cambridge, MA: Harvard University Press.

Barnard, G.A. (1979). Discussion of the paper by Professors Lindley and Tversky and Dr. Brown. Journal of the Royal Statistical Society of London, (A), 142, 171-172.

Bazerman, M.H., & Neale, M.A. (1986). Heuristics in negotiation: Limitations to effective dispute resolution. In H.R. Arkes & R.R. Hammond (Eds.), Judgment and decision making: An interdisciplinary reader (pp. 311-321). Cambridge: Cambridge University Press.

Berkeley, D., & Humphreys, P. (1982). Structuring decision problems and the "bias heuristic." Acta Psychologica, 50, 201-252.

Boole, G. (1958). An investigation of the laws of thought on which are founded the mathematical theories of logic and probabilities. New York: Dover (original work published in 1854).

Borgida, E., & Brekke, N. (1981). The base rate fallacy in attribution and prediction. In J.H. Harvey, W. Ickes & R.F. Kidd (Eds.), New Directions in Attribution Research, Vol. 3, Hillsdale, NJ: Erlbaum, pp. 63-95.

Brunswik, E. (1955). Representative design and probabilistic theory in a functional psychology. Psychological Review, 62, 193-217.

Casscells, W., Schoenberger, A., & Graboys, T. (1978). Interpretation by physicians of clinical laboratory results. New England Journal of Medicine, 299, 999-1000.

Christensen-Szalanski, J.J.J. & Beach, L.R. (1982). Experience and the base-rate fallacy. Organizational Behavior and Human Performance, 29, 270-278.

Cohen, L.J. (1982). Are people programmed to commit fallacies? Further thoughts about the interpretation of experimental data on probability judgment. Journal for the Theory of Social Behavior, 12, 251-274.

Cohen, L.J. (1983). The controversy about irrationality. Behavioral and Brain Sciences, 6, 510-517.

Cohen, L.J. (1986). The dialogue of reason. Oxford: Clarendon Press.

Cosmides, L. (1989). The logic of social exchange: Has natural selection shaped how humans reason? Studies with the Wason selection task. Cognition, 31, 187-276.

Cosmides, L., & Tooby, J. (in press). Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty. Cognition.

Daston, L. (1988). Classical probability in the Enlightenment. Princeton, NJ: Princeton University Press.

Dawes, R.M. (1980). Confidence in intellectual judgments vs. confidence in perceptual judgments. In E.D. Lantermann & H. Feger (Eds.), Similarity and choice: Papers in honor of Clyde Coombs (pp. 327-345). Bern, Switzerland: Huber.

Edwards, W., & von Winterfeldt, D. (1986). On cognitive illusions and their implications. In H.R. Arkes & R.R. Hammond (Eds.), Judgment and decision making: An interdisciplinary reader. Cambridge: Cambridge University Press, pp. 642-679.

Evans, J. St.B. T. (1984). Heuristic and analytic processes in reasoning. British Journal of Psychology, 75, 451-468.

Fiedler, K. (1988). The dependence of the conjunction fallacy on subtle linguistic factors. Psychological Research, 50, 123-129.

Fischhoff, B. (1982). Debiasing. In D. Kahneman, P. Slovic & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases. Cambridge: Cambridge University Press, pp. 422-444.

Fong, G.T., & Nisbett, R.E. (1991). Immediate and delayed transfer of training effects in statistical reasoning. Journal of Experimental Psychology: General, 120, 34-45.

Gallistel, C.R. (1990). The organization of learning. Cambridge, MA: MIT Press.

Gigerenzer, G. (1991). On cognitive illusions and rationality. Poznan Studies in the Philosophy of the Sciences and the Humanities, 21, 225-249.

Gigerenzer, G., Hell, W., & Blank, H. (1988). Presentation and content: The use of base rates as a continuous variable. Journal of Experimental Psychology: Human Perception and Performance, 14, 513-525.

Gigerenzer, G., Hoffrage, U., & Kleinbölting, H. (1991). Probabilistic mental models: A Brunswikian theory of confidence. Psychological Review, 98, 506-528.

Gigerenzer, G., & Hug, K. (1992). Domain-specific reasoning: Social contracts, cheating, and perspective change. Cognition, 43, 127-171.

Gigerenzer, G., & Murray, D.J. (1987). Cognition as intuitive statistics. Hillsdale, NJ: Erlbaum.

Gigerenzer, G., Swijtink, Z., Porter, T., Daston, L., Beatty, J., & Krüger, L. (1989). The empire of chance: How probability changed science and everyday life. Cambridge: Cambridge University Press.

Ginossar, Z., & Trope, Y. (1987). Problem solving in judgment under uncertainty. Journal of Personality and Social Psychology, 52, 464-474.

Gluck, M.A., & Bower, G.H. (1988). Evaluating an adaptive network model of human learning. Journal of Memory and Language, 27, 166-195.

Gould, S.J. (1992). Bully for brontosaurus. Further reflections in natural history. Penguin Books.

Grether, D.M. (1980). Bayes' rule as a descriptive model: The representativeness heuristic. The Quarterly Journal of Economics, 95, 537-557.

Hammerton, M. (1973). A case of radical probability estimation. Journal of Experimental Psychology, 101, 252-254.

Hansen, R.D., & Donoghue, J.M. (1977). The power of consensus: Information derived from one's own and others' behavior. Journal of Personality and Social Psychology, 35, 294-302.

Hasher, L., & Zacks, R.T. (1979). Automatic and effortful processes in memory. Journal of Experimental Psychology: General, 108, 356-388.

Hertwig, R. & Gigerenzer, G. (1993). Frequency and single-event judgments. Unpublished manuscript.

Hume, D. (1975). A treatise of human nature (L.A. Selby-Bigge, Ed.). Oxford: Clarendon Press (original work published 1739).

Inhelder, B., & Piaget, J. (1958). Growth of logical thinking: From childhood to adolescence. New York: Basic Books.

Juslin, P. (in press, a). The overconfidence phenomenon as a consequence of informal experimenter-guided selection of almanac items. Organizational Behavior and Human Decision Processes.

Juslin, P. (in press, b). An explanation of the hard-easy effect in studies of realism of confidence in one's general knowledge. European Journal of Cognitive Psychology.

Kadane, J.B., & Lichtenstein, S. (1982). A subjectivist view of calibration. Report No. 82-86. Eugene, OR: Decision Research.

Kahneman, D., & Tversky, A. (1972). Subjective probability: A judgment of representativeness. Cognitive Psychology, 3, 430-454. Reprinted in D. Kahneman et al. (1982) (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 32-47). Cambridge: Cambridge University Press.

Kahneman, D., & Tversky, A. (1973). On the psychology of prediction. Psychological Review, 80, 237-251.

Kanwisher, N. (1989). Cognitive heuristics and American security policy. Journal of Conflict Resolution, 33, 652-675.

Koriat, A., Lichtenstein, S., & Fischhoff, B. (1980). Reasons for confidence. Journal of Experimental Psychology: Human Learning and Memory, 6, 107-118.

Laplace, P.S. (1951). A philosophical essay on probabilities. New York: Dover (original work published 1814).

Lichtenstein, S., Fischhoff, B., & Phillips, L.D. (1982). Calibration of probabilities: The state of the art to 1980. In D. Kahneman, P. Slovic & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases. Cambridge: Cambridge University Press, pp. 306-334.

Lopes, L. (1991). The rhetoric of irrationality. Theory & Psychology, 1, 65-82.

Lopes, L., & Oden, G.C. (1991). The rationality of intelligence. Poznan Studies in the Philosophy of the Sciences and the Humanities, 21, 199-223.

Macdonald, R.R. (1986). Credible conceptions and implausible probabilities. British Journal of Mathematical and Statistical Psychology, 39, 15-27.

Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: Freeman.

May, R.S. (1987). Realismus von subjektiven Wahrscheinlichkeiten [The realism of subjective probabilities]. Frankfurt/Main: Lang.

McCauley, C., & Stitt, C.L. (1978). An individual and quantitative measure of stereotypes. Journal of Personality and Social Psychology, 36, 929-940.

Mises, R. von (1957). Probability, statistics, and truth. London: Allen and Unwin (original work published in 1928).

Neyman, J. (1977). Frequentist probability and frequentist statistics. Synthese, 36, 97-131.

Nisbett, R.E., & Borgida, E. (1975). Attribution and the psychology of prediction. Journal of Personality and Social Psychology, 32, 932-943.

Piattelli-Palmarini, M. (1989). Evolution, selection and cognition: From "learning" to parameter setting in biology and in the study of language. Cognition, 31, 1-44.

Real, L., & Caraco, T. (1986). Risk and foraging in stochastic environment: Theory and evidence. Annual Review of Ecology and Systematics, 17, 371-390.

Saks, M., & Kidd, R.F. (1980). Human information processing and adjudication: Trial by heuristics. Law and Society Review, 15, 123-160.

Scholz, R.W. (1987). Cognitive strategies in stochastic thinking. Dordrecht, Holland: Reidel.

Shafer, G. (1989). The unity and diversity of probability. Inaugural lecture, November 20, 1989, University of Kansas.

Simon, H.A. (1955). A behavioral model of rational choice. Quarterly Journal of Economics, 69, 99-118.

Simon, H.A. (1982). Models of bounded rationality. 2 vols. Cambridge, MA: MIT Press.

Slovic, P., Fischhoff, B., & Lichtenstein, S. (1976). Cognitive processes and societal risk taking. In J.S. Carroll & J.W. Payne (Eds.), Cognition and social behavior (pp. 165-184). Hillsdale, NJ: Erlbaum.

Spettell, C.M., & Liebert, R.M. (1986). Training for safety in automated person-machine systems. American Psychologist, 41,

Staddon, J.E.R. (1988). Learning as inference. In R.C. Bolles & M.D. Beecher (Eds.), Evolution and learning. Hillsdale, NJ: Erlbaum.

Stich, S.P. (1990). The fragmentation of reason. Cambridge, MA: MIT Press.

Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124-1131.

Tversky, A., & Kahneman, D. (1982). Evidential impact of base rates. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases. Cambridge: Cambridge University Press.

Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90, 293-315.

Wells, G.L., & Harvey, J.H. (1977). Do people use consensus information in making causal attributions? Journal of Personality and Social Psychology, 35, 279-293.

Winterfeldt, D. von, & Edwards, W. (1986). Decision analysis and behavioral research, Cambridge: Cambridge University Press.
