Psychological Challenges For Normative Models

Gigerenzer, Gerd

 

Please note:
This paper is a preprint of an article published in D. Gabbay & P. Smets (Eds.), Handbook of defeasible reasoning and uncertainty management systems. Vol. 1: Quantified representation of uncertainty and imprecision. Dordrecht: Kluwer, 1998. There may therefore be minor differences between the two versions.
The copyright of this electronic version remains with the author and the Max Planck Institute for Human Development.

   

 

Psychological Challenges For Normative Models

Some years ago, at the Center for Advanced Study at Stanford, one of my economist colleagues concluded a discussion on cognitive illusions with the following dictum: "Look, either reasoning is rational or it's psychological." In this chapter, I argue against the widespread view that the rational and the psychological are opposed. According to this view, the rational is defined by the laws of probability and logic -- that is, by content-free axioms or rules, such as consistency, transitivity, Bayes's theorem, dominance, and invariance. The irrational is left to be explained by the laws of psychology. Here I present examples that are intended to illustrate that defining human rationality independent of psychology is myopic. The "challenges" in the title of this chapter are not directed against probability theory and logic, or specific versions thereof, but against using these systems as psychologically uninformed, content-free norms. Before I turn to these challenges, I begin with a historical example that illustrates how norms have been revised and made more realistic by the introduction of psychological concepts.

Back to the Blackboard

In the 17th century, a new conception of rationality emerged. This was a modest kind of reasonableness that could handle everyday dilemmas on the basis of uncertain knowledge, in contrast to the traditional rationality of demonstrative certainty (Daston, 1981, 1988). Those dilemmas were numerous: Believe in God? Invest in an annuity? Accept a gamble? The mathematical theory of probability was to codify this new brand of rationality, and its primitive concept was rational expectation, with expectation defined as expected value. Soon, however, it became apparent that minds do not always follow the dictates of the expected value. The St. Petersburg Paradox, explicated below, marked the celebrated clash between the new theory of expected value and human intuition.

Pierre offers to sell Paul an opportunity to play the following coin-tossing game. If the coin comes up heads on the first toss, Pierre agrees to pay Paul $1; if heads does not turn up until the second toss, Paul receives $2; if not until the third toss, $4; and so on. According to the standard method of calculating expected value, Paul's expectation E -- and therefore the fair price of the game -- is

 

   
   

E = (1/2)($1) + (1/4)($2) + (1/8)($4) + ... = $.50 + $.50 + $.50 + ... = ∞                        (1)

   
   

 

Paul's monetary payoffs increase with decreasing probabilities of occurrence: Each of the terms is equal to 50 cents, and the expected value E is infinite. This calculation is straightforward, and there is nothing in the definition of expectation that excludes an infinite value. Nicholas Bernoulli, who first proposed this game in 1713, however, observed that no reasonable person would pay a large amount of money to play the game.

What should be done when the dictates of a norm (E) deviate from human intuitions about the reasonableness of a behavior? One can either stick to the norm and declare the behavior irrational, or incorporate the psychological into the norm. In 1738, Nicholas's cousin Daniel Bernoulli published a resolution of the paradox in the annals of the Academy of St. Petersburg (hence the name). Daniel Bernoulli psychologized the norm. He proposed that in situations such as the St. Petersburg gamble, the prospect of winning a certain amount of money, say $16, means something different for the rich and the poor man. Therefore a theory of reasonableness needs to incorporate personal characteristics such as a person's current wealth, whereas the concept of expected value was based on the impersonal notion of fairness. Bernoulli proposed replacing expected value, which excluded personal circumstances that might prejudice equal rights in legal contexts, with the "moral" expectation of prudence, defined as the product of the probability of an outcome and what later became known as its utility. The utility of money, Bernoulli argued, decreases the more you have.

In modern terminology, let U be the utility of an outcome, w a person's current wealth, and g the sure gain that would yield the same expectation as the St. Petersburg gamble. Then

 

   
U(w + g) = (1/2)U(w + 1) + (1/4)U(w + 2) + (1/8)U(w + 4) + ...                        (2)
   

 

Suppose that U(x) = ln(x), that is, the utility of money for Paul diminishes logarithmically with the amount of money he has. Then, if Paul's current wealth is $50,000, g is about $9. On this psychological assumption, Paul should be willing to pay no more than $9 for the St. Petersburg gamble.[1]
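The arithmetic can be checked in a few lines of Python. The sketch below solves Bernoulli's equation numerically for g, assuming logarithmic utility and a wealth of $50,000 as above; truncating the infinite series after 40 tosses is a convenience that does not affect the result at this precision.

```python
import math

def certainty_equivalent(wealth, n_terms=40):
    """Sure gain g whose utility equals the expected utility of the
    St. Petersburg gamble: U(w + g) = sum over k of (1/2)^k U(w + 2^(k-1))."""
    expected_utility = sum(
        (0.5 ** k) * math.log(wealth + 2 ** (k - 1))
        for k in range(1, n_terms + 1)
    )
    # Invert U(x) = ln(x) to recover the sure gain g.
    return math.exp(expected_utility) - wealth

print(round(certainty_equivalent(50_000), 2))  # roughly 9 dollars
```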

Daniel Bernoulli's revision of expected value theory into what is today known as expected utility theory exemplifies the Enlightenment attitude toward the relation between the rational and the psychological. By putting psychology into the equations, Bernoulli reunified the rational with the psychological. Expected value theory was a model, not a rigid norm, of rationality. When educated minds reasoned differently from what the theory predicted, this was seen as a problem for the theory, not for the mind, and mathematicians went back to the blackboard to change the equations. Today, as we will see, few researchers respond to such discrepancies by going back to the blackboard and revising their equations. The blame is placed on the mind, not on the model.

Separating the mathematical theory of probability from its applications would have seemed foreign to Bernoulli and the Enlightenment probabilists: Their theory was at once a description and a prescription of reasonableness. Along with hydrodynamics and celestial mechanics, the calculus of probability was part of what was then called "mixed mathematics," a term stemming from Aristotle's explanation of how optics and harmonics mixed the forms of mathematics with the matter of light and sound (Daston, 1992). Classical probability theory had no existence independent of its subject matter -- the beliefs of reasonable men. This is why classical probabilists perceived problems of the St. Petersburg kind as paradoxes -- not because there was a mathematical contradiction, but because the mathematical result contradicted good sense.

I invite you in the following pages to look with a Bernoullian eye at some present-day uses of normative models. I proceed by means of examples, each one chosen to illustrate how psychology can be brought to rationality. Some believe that bringing psychology into rationality makes it impure, but do not be misled.

1. Challenge One: Algorithms Work on Information That Needs Representation

Probability theory is mute about the representation of the information on which its rules should work. But systems that calculate, machines and minds alike, are sensitive to the representation of numerical information (Marr, 1982). Computational algorithms work on information, and information needs representation. For instance, my pocket calculator has an algorithm for multiplication. This algorithm is designed for Arabic numbers as input data and would perform badly if I entered binary numbers. Similarly, mental algorithms are designed for particular representations. Consider, for example, how difficult it would be to perform long division with Roman numerals. Arabic, Roman, and binary representations can be mapped onto each other one-to-one and are in this sense mathematically equivalent, but that does not mean they are psychologically equivalent. Physicist Richard Feynman (1967) made this point more generally, explaining that new discoveries can come from different formulations of the same physical law, even if they are mathematically equivalent: "Psychologically they are different because they are completely unequivalent when you are trying to guess new laws" (p. 53).

Let us consider the issue of information representation in research on Bayesian inference.

1.1 The Norm

The question whether humans reason the Bayesian way has been studied in problems with two hypotheses, H and ¬H (e.g., breast cancer and no breast cancer), and one datum D (e.g., a positive mammography). Here is one example:

The probability of breast cancer is 1% for a woman at age forty who participates in routine screening. If a woman has breast cancer, the probability is 80% that she will have a positive mammography. If a woman does not have breast cancer, the probability is 9.6% that she will also have a positive mammography.

A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer? _____%

If one inserts these numbers into Bayes's theorem, the posterior probability p(H|D) is:

 

   
p(H|D) = p(H)p(D|H) / [p(H)p(D|H) + p(¬H)p(D|¬H)] = (.01)(.80) / [(.01)(.80) + (.99)(.096)]                        (3)
   

 

The result is .078, or 7.8%. In sharp contrast, Eddy (1982) reported that 95 out of 100 physicians estimated the posterior probability p(cancer|positive) to be between 70% and 80%. Psychology undergraduates tend to give the same estimates. Staff at the Harvard Medical School showed not much more insight into a similar problem (Casscells, Schoenberger, & Grayboys, 1978). In short, very few people have an intuitive understanding of what to do with these probabilities.
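Once the three probabilities are in hand, the calculation in Equation 3 is a one-liner; the following minimal sketch simply restates it for checking purposes.

```python
def posterior(base_rate, hit_rate, false_alarm_rate):
    """Bayes's theorem for two hypotheses (H, not-H) and one datum D."""
    h_and_d = base_rate * hit_rate                    # p(H) p(D|H)
    noth_and_d = (1 - base_rate) * false_alarm_rate   # p(not-H) p(D|not-H)
    return h_and_d / (h_and_d + noth_and_d)

# Mammography problem: base rate 1%, hit rate 80%, false alarm rate 9.6%.
print(round(posterior(0.01, 0.80, 0.096), 3))  # 0.078
```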

Because many people estimated the posterior probability as being close to the hit rate (80%), it has been concluded that mental algorithms generally neglect base-rate information (see Gigerenzer & Murray, 1987, chap. 5; Koehler, 1996). Results from these and other studies have been taken as evidence that the human mind does not reason with Bayesian algorithms.

Yet this conclusion is not warranted, as the pocket calculator example illustrates. If I feed my pocket calculator binary numbers, and garbage comes out, it does not follow that the calculator has no algorithm for multiplication. Similarly, it would be impossible to detect a Bayesian algorithm in a system by feeding it information in a representation to which it is not tuned. A normative model must therefore specify both the information representation and the algorithm that works on this representation. What are the external representations of information for which cognitive algorithms are designed?

1.2. Psychologizing the Norm: Ecological Bayesianism

The problem we need to solve has one more unknown than the pocket calculator example. In the latter, we know the input representation and so can make informed guesses about the nature of the algorithm. In the case of human minds, we must also speculate about the external representation of statistical information for which cognitive algorithms are designed. We know some candidate representations, and some facts about them. In the mammography problem, information is represented in single-event probabilities (percentages). We know that probabilities and percentages are very recently invented means of representing information. The notion of "probability" did not gain prominence in probability theory until the 18th century, a century after the calculus of chance was invented (Gigerenzer et al., 1989). Percentages became common ways to represent numerical information during the 19th century (mainly for interest and taxes), after the metric system was introduced during the French Revolution, and became common tools for representing uncertainty only in this century. Therefore, it is unlikely that cognitive algorithms were designed for probabilities and percentages, if we think in evolutionary terms. In what representation have humans (and animals) acquired numerical information during most of their history? I assume here that they acquired it in terms of frequencies as actually experienced in a series of events, rather than probabilities or percentages (Cosmides & Tooby, 1996; Gigerenzer & Hoffrage, 1995). By "frequencies" I mean absolute frequencies, that is, natural numbers, as defined by natural sampling (see the right side of Figure 1; Gigerenzer & Hoffrage, 1995; Kleiter, 1994).

For a simple demonstration of the role of representation in reasoning, let us represent the information about the base rate (1%), hit rate (80%), and false alarm rate (9.6%) of the mammography problem in natural numbers rather than percentages. Imagine 100 women. One has cancer (the base rate), and she will probably test positive (the hit rate). Of the 99 women without cancer, about 10 will also test positive (the false alarm rate). So altogether 11 women will test positive. Question: How many of those who will test positive actually have breast cancer? Now most people easily "see" the answer: one out of 11.

 

   
[Figure 1. The mammography problem in two representations: probabilities and the Bayesian computation they require (left) versus a natural sampling tree of absolute frequencies -- 1,000 women, of whom 10 have breast cancer (8 of these test positive) and 990 do not (95 of these test positive) (right).]
   

 

Why is this? Consider Figure 1. On the left side are probabilities (as in a typical medical text), and the Bayesian algorithm a physician would have to use to compute the posterior probability from a probability representation. On the right side, the same information is represented in terms of absolute frequencies. The interesting difference is that the Bayesian algorithm is computationally simpler when information is expressed in frequencies rather than in probabilities or percentages. Furthermore, only two kinds of frequencies need be attended to -- "symptom & disease" and "symptom & no disease." Base rates (e.g., 10 out of 1,000 in Figure 1) need not be attended to; they are implicit in these two frequencies.

The simple demonstration above used approximate figures; a frequency representation of the mammography problem that is numerically equivalent to the probability representation can be constructed by using a class of 1,000 instead of 100 women, as in the "natural sampling tree" in Figure 1:

Ten out of every 1,000 women at age 40 who participate in routine screening have breast cancer. Eight out of these 10 women with breast cancer will get a positive mammography. Of the 990 women without breast cancer, 95 will also get a positive mammography.

Here is a new representative sample of women at age 40 who got a positive mammography in routine screening. How many of these women do you expect actually to have breast cancer? ___ out of ____
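With the information in this form, the Bayesian computation reduces to two counts, as a minimal sketch of the frequency algorithm on the right side of Figure 1 makes plain.

```python
# Natural-frequency computation (right side of Figure 1): only the two joint
# frequencies are needed; the base rate is implicit in them.
cancer_and_positive = 8      # of the 10 women with breast cancer, 8 test positive
no_cancer_and_positive = 95  # of the 990 women without breast cancer, 95 test positive

total_positive = cancer_and_positive + no_cancer_and_positive
print(cancer_and_positive, "out of", total_positive)   # 8 out of 103
print(round(cancer_and_positive / total_positive, 3))  # 0.078, as in Equation 3
```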

Ulrich Hoffrage and I have given 15 problems of this kind (concerning cancer, HIV, pregnancy, and other everyday matters) to students who had never heard of Bayesian inference, using various frequency and probability representations (but no visual aids such as the tree in Figure 1). When information was represented in terms of frequencies, in 46% of cases students found the exact numerical answer and used a Bayesian algorithm, as revealed by protocols. The corresponding value when information was represented in terms of probabilities was only 16% (for details see Gigerenzer & Hoffrage, 1995).

In a second study, we tested whether frequency formats improve Bayesian reasoning in physicians, using four medical problems, including mammography. Forty-eight physicians (mean professional experience was 14 years) worked an average of 30 minutes on these problems. Despite the fact that these physicians were experts, the results were similar. When information was represented in a probability format, in only 10% of the cases did the physicians reason the Bayesian way, but when the information was in a frequency format, Bayesian responses increased to 46% (for details see Gigerenzer, 1996a; Hoffrage & Gigerenzer, in press). These results are consistent with the claim that cognitive algorithms are tuned to frequency formats (as defined by the tree in Figure 1).[2]

The practical consequences are straightforward: Physicians, patients, and students should be taught to transform probability formats into frequency formats, in which they can "see" the solutions to diagnostic problems. We have designed such a computerized tutorial program that teaches people how to represent probabilities in frequency formats. Students using this tutorial scored about twice as high as those who used a traditional program that taught them how to insert probabilities into Bayes's rule. Five weeks later, students who had learned to construct frequency representations still maintained their high level of accuracy, but the others showed the usual steep forgetting curve (Sedlmeier & Gigerenzer, 1996). It is easier to be a Bayesian when working with frequencies.

To sum up: Bayes's theorem is often used as a norm for rational reasoning, but this rule is mute about the representation of information it is supposed to work on. If evolution has shaped mental algorithms that make inferences about an uncertain world, then it is likely that these algorithms were designed for event frequencies, as encoded by natural sampling, and not for probabilities and percentages. Comparing human judgment to Bayes's theorem without considering the representation of the numerical information is, according to this argument, like comparing the outputs of a pocket calculator to multiplication tables without considering whether the numbers were entered in Arabic numerals, binary numerals, or in another representation. Challenge One is to come up with normative models for human reasoning that deal with algorithms and the input representations on which the algorithms are designed to operate. Rules per se are incomplete normative models for machine and mental computers alike.

2. Challenge Two: Psychological Mechanisms Determine the Relevant Numbers

So far we have linked algorithms to the representation of numerical information but have not thought about the numerical information itself. Now it is time to put some psychology into the numbers. I illustrate this by summarizing Birnbaum's (1983) application of a standard psychological theory, the theory of signal detectability (TSD), to a Bayesian inference problem. TSD is formally equivalent to the Neyman-Pearson theory of hypotheses testing (Gigerenzer & Murray, 1987). The important point is that TSD can direct our attention to psychological mechanisms, which in turn determine the relevant numbers that should be inserted into Bayes's theorem.

2.1 The Norm

The following version of the cab problem is from Tversky and Kahneman (1980, p. 62):

A cab was involved in a hit-and-run accident at night. Two cab companies, the Green and the Blue, operate in the city. You are given the following data:

(i) 85% of the cabs in the city are Green and 15% are Blue.

(ii) A witness identified the cab as a Blue cab. The court tested his ability to identify cabs under the appropriate visibility conditions. When presented with a sample of cabs (half of which were Blue and half of which were Green) the witness made correct identifications in 80% of the cases and erred in 20% of the cases.

Question: What is the probability that the cab involved in the accident was Blue rather than Green?

Tversky and Kahneman assumed that the cab problem has one and only one correct answer, which is obtained by inserting the given numbers into Bayes's theorem in the form of Equation 3. Let G and B stand for the two hypotheses ("Cab was Green" and "Cab was Blue"), and "B" for "Witness testified that cab was Blue." Inserting the numbers into Bayes's theorem results in the following probability p(B|"B") that the cab involved in the accident was Blue:

 

   
p(B|"B") = p(B)p("B"|B) / [p(B)p("B"|B) + p(G)p("B"|G)] = (.15)(.80) / [(.15)(.80) + (.85)(.20)]                        (4)
   

 

The result is .41. Tversky and Kahneman (1980) reported that the modal and median response of several hundred subjects was .80. The median response was identical to the witness's hit rate -- much as in the mammography problem. This result has been interpreted as evidence that subjects neglect base rates.

Tversky and Kahneman's use of Bayes's theorem assumes that the content of the problem is merely decorative -- for instance, what we know about the cognitive mechanisms of eyewitnesses in visual discrimination tasks is assumed to be not relevant for a normative model. In this content-independent application of Bayes's theorem, there is no need to distinguish between a mammography test and an eyewitness report, except for the numbers. Now, let us have a second look at the norm.

2.2 Determining the Relevant Numbers: Psychological Assumptions

Figure 2a illustrates the cab problem from the point of view of the theory of signal detectability (TSD), a theory of sensory discrimination and detection (Birnbaum, 1983). TSD assumes that each color, G and B, produces a normal distribution of sensory values on a sensory continuum (although other distributions are possible). The two distributions overlap, which is why errors in identification can occur. There is a decision criterion that balances the probabilities of the two possible errors a witness can make, the probability p("B"|G) of a false alarm, and the probability p("G"|B) of a miss. (The complement of the miss rate is the probability p("B"|B), called the hit rate.) If on some occasion the value on the sensory continuum is to the right of the criterion, the witness says "Blue"; otherwise, the witness says "Green." If it is important to reduce the probability of false alarms, then the criterion is shifted to the right, causing the probability of misses to increase. The reverse follows if the criterion is shifted to the left. As one shifts the criterion from the very left side of the sensory continuum to the very right side, one gets a series of pairs of hit and false alarm rates, one of which is shown in Figure 2a. The distance between the means of the two distributions is known as the sensitivity d' of the witness.

 

   
[Figure 2. The cab problem from the point of view of the theory of signal detectability: overlapping normal distributions of sensory values for Green and Blue cabs, with (a) the decision criterion c0 used during the court test and (b) the shifted criterion c1 that minimizes incorrect testimony given the 85%/15% base rates.]
   

 

When we look at the cab problem from the point of view of TSD, we notice two key differences between Birnbaum's content-based model and Tversky and Kahneman's content-free model. The first is the decision criterion, which is central to TSD and is absent in the content-free norm. In the content-free approach, the witness is characterized by a single pair of likelihoods (a false alarm and a miss rate), whereas in TSD the witness is characterized by a continuum of such pairs. The second difference is linked to the first: no prior probabilities or base rates are explicit in Neyman-Pearson theory, and consequently, in TSD. However, TSD allows for shifting the decision criterion in response to a shift in base rates, consistent with the empirical finding that the ratio of the hit rate to the false alarm rate varies with the signal probabilities, that is, with the base rates (Birnbaum, 1983; Luce, 1980). Note that this finding is inconsistent with the independence of base rates and likelihoods assumed in the content-free norm. According to TSD, a change in base rates can be manifested as a change in the criterion (the likelihood ratio). According to the content-free norm, in contrast, a change in base rates does not affect the likelihood ratio.

With this background, we can now address the question: What are the relevant numbers to be inserted into Bayes's theorem? TSD suggests that some of the details in the Cab problem are relevant for finding these numbers -- whereas the entire content was irrelevant to the way the normative answer of .41 was calculated. Remember that there were two points in time: the night of the accident and the time when the court tested the witness. If the criterion was set at c0 at the time of the test, where was it set at the critical time of the accident? We are told that the visibility conditions of the test were appropriate; thus we can assume that the sensitivity d' of the witness was similar during the test and on the night of the accident. That is, on the night of the accident, the distance between the means of the two distributions was the same. But where was the criterion? To answer this, we need a psychological theory of criterion shift.

In the absence of further information, we may start with the plausible hypothesis that the witness adjusted his criterion so as to minimize incorrect testimony. During the test, when the base rates of Green and Blue cabs were equal, the criterion (c0) was at the intersection of the two curves in Figure 2a. Now it becomes clear how crucial it is to know whether or not the witness knew the base rates. Assume that the witness knew the base rates of cabs in the city and attempted to minimize incorrect testimony. This implies that on the night of the accident the criterion was to the right of c0 because there were many more Green cabs, and the most likely error was to mistake a Green cab for a Blue one. The criterion that minimizes the overall proportion of errors is called c1 in Figure 2b. It is defined by a false alarm rate of .03 and a hit rate of .43 (Birnbaum, 1983).[3] Under the assumption that on the night of the accident the witness set his criterion at c1 rather than c0, these are the relevant numbers to be inserted into Bayes's theorem:

 

   
p(B|"B") = (.15)(.43) / [(.15)(.43) + (.85)(.03)]                        (5)
   

The result is .72. Note that this result could be mistaken for an instance of base-rate neglect because it is again close to the hit rate (80% in the text of the cab problem). Ironically, the value of .72 was computed based on the assumption that the witness knows and uses the base rates. The TSD analysis of the cab problem illustrates how psychological assumptions (e.g., the criterion setting) and mathematical assumptions (e.g., the identical normal distributions of the perceptual processes) go hand in hand in a content-sensitive normative model. Good statistical reasoning cannot be reduced to the mechanical insertion of numbers into a formula, an insight that the intellectual parents of TSD, Jerzy Neyman and Egon S. Pearson, emphasized repeatedly (Gigerenzer, 1993).
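The TSD calculation can be checked with a short script. The sketch below assumes, as in Footnote 3, unit-variance normal distributions with means 0 (Green) and d' = 1.68 (Blue) and finds the criterion c1 by a simple grid search; because it carries more decimal places than the rounded rates .03 and .43, its posterior comes out near .71 rather than exactly .72.

```python
import math

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def posterior_blue(hit_rate, false_alarm_rate, base_rate_blue=0.15):
    """p(Blue | witness says 'Blue') by Bayes's theorem."""
    numerator = base_rate_blue * hit_rate
    return numerator / (numerator + (1 - base_rate_blue) * false_alarm_rate)

# Content-free norm: insert the court-test rates (hit .80, false alarm .20).
print(round(posterior_blue(0.80, 0.20), 2))        # 0.41

# TSD model: Green mean 0, Blue mean d' = 1.68, unit variance (Footnote 3).
d_prime = 1.68
best_c, best_error = None, float("inf")
for step in range(4001):                           # grid search for criterion c1
    c = -2.0 + step * 0.001
    false_alarm = 1.0 - phi(c)                     # p("B" | Green)
    miss = phi(c - d_prime)                        # p("G" | Blue)
    error = 0.85 * false_alarm + 0.15 * miss       # overall proportion of errors
    if error < best_error:
        best_c, best_error = c, error

hit = 1.0 - phi(best_c - d_prime)
false_alarm = 1.0 - phi(best_c)
print(round(false_alarm, 2), round(hit, 2))        # about .03 and .42
print(round(posterior_blue(hit, false_alarm), 2))  # about .71
```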

If the witness adjusts the criterion in some way other than to minimize incorrect testimony, this will lead to a different posterior probability. Birnbaum (1983) has studied various psychological strategies a witness might use. I should mention that the criterion shift is not limited to situations in which the base rates at the time of the accident and at the test differ. Even if the base rates were identical, the witness who testified "Blue" knows that he can be accused of making one and only one error, that is, of saying "Blue" although the cab was Green (a false alarm). The other possible error, a miss (mistaking a Blue cab for a Green one), is excluded, because he testified that the cab was Blue. If he wants to protect himself from being accused of erroneous testimony, he may shift the criterion far to the right so that the probability of a false alarm is minimized. Shifting the criterion to the right also increases the posterior probability (this can be inferred from Figure 2a).

Challenge Two is to build normative models from psychological assumptions, rather than to insert numbers into a formula, purified of the content of the situation. The fundamental normative role of the assumptions a person makes is not peculiar to the cab problem; for instance, it is crucial for the normative evaluation of the three-doors problem (Falk, 1992), the four-cards problem (Gigerenzer & Hug, 1992; Oaksford & Chater, 1994), gambler's fallacy, and "conservatism" in information processing (Cohen, 1982).

To emphasize psychology is emphatically not to say that "anything goes." The contrast I wish to draw is not between a norm that is created mechanically in a content-free way and no norms at all. My point is that psychological assumptions (the semantics and pragmatics of the situation) are indispensable for constructing a sensible norm. The particular assumptions that are made about a situation determine the choice among possible candidates for a normative model. A consequence is that claims of the kind "this is the only correct answer" need to be based on fleshing out the psychological assumptions (Levi, 1983). The cab problem is of particular interest here because it illustrates how two different statistical approaches, the Neyman-Pearson theory of hypotheses testing (which is formally equivalent to TSD) and (the content-free application of) Bayes's theorem, can highlight different aspects of the problem as important, such as the decision criterion of the witness.

Challenge One adds to Challenge Two. When the information in the cab problem is represented in absolute frequencies as opposed to probabilities or percentages, people can "see" the numerical answer much more easily, whatever numbers they chose as relevant (Gigerenzer & Hoffrage, 1995).

3. Challenge Three: The Indeterminacy of Consistency

The take-home message so far is that modeling rational judgment involves (a) assumptions about the information representation for which cognitive algorithms are designed, and (b) assumptions about psychological mechanisms that determine which numbers (prior probabilities, likelihoods) enter an algorithm. Let us now extend our discussion of the role of content in defining sound reasoning and turn to internal consistency of choice. Internal consistency is often seen as the requirement of rational choice in decision theory, behavioral economics, and game theory. Challenge Three is to define consistency in terms of something external to the choice behavior, such as social objectives and values, rather than in terms of content-independent formulations (axioms). Only then can we decide whether a behavior is actually consistent or not.

3.1 The Norm: Property Alpha

One basic condition of internal consistency of choice is known as "Property Alpha," also called the "Chernoff condition" and "independence of irrelevant alternatives" (Sen, 1993). The symbols S and T denote two nonempty sets of alternatives, and x(S) denotes that alternative x is chosen from the set S.

Property Alpha:

   
[x ∈ T ⊆ S and x(S)]  →  x(T)                        (6)
   

 

Property Alpha demands that if x is chosen from S, and x belongs to a subset T of S, then x must be chosen from T as well. For instance, assume you won a free subscription to any weekly magazine in the world (S). You choose the Economist (x). Now you learn that you can actually only choose a weekly magazine published in English (T). You still choose the Economist. The following two choices would be inconsistent in that they violate Property Alpha:

1. x is chosen given the options {x, y}

2. y is chosen given the options {x, y, z}

Property Alpha is violated because x is chosen when the two alternatives {x, y} are offered, but y is chosen when z is added to the menu. (Choosing x is interpreted here as a rejection of y, not as a choice that results from mere indifference.) It may indeed appear odd and irrational that someone who chooses x and rejects y when offered the choice set {x, y} would choose y and reject x when offered the set {x, y, z}. Such violations are known as preference reversals. For illustration, here is a story told about the Columbia University philosopher Sidney Morgenbesser. Sidney went to the donut store on 116th Street. "Would you like a plain or a glazed donut?" the waitress asked. "I'll have a plain donut," responded Sidney. "Oh, I forgot, we also have a jelly donut," the waitress added. "In this case," Sidney replied, "I'll take the glazed donut." My philosopher friends laugh at this story: Sidney has violated Property Alpha.
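Property Alpha can be checked purely from observed choices, with no reference to anything in the chooser's head; a minimal sketch, with the donut story encoded as hypothetical choice data, shows how little such a check needs to know.

```python
def violates_property_alpha(choices):
    """Check observed choices against Property Alpha.

    `choices` maps each offered menu (a frozenset of options) to the
    option chosen from it. Alpha is violated whenever x is chosen from
    a menu S but something else is chosen from a sub-menu T of S that
    still contains x.
    """
    for menu_s, chosen_s in choices.items():
        for menu_t, chosen_t in choices.items():
            if menu_t < menu_s and chosen_s in menu_t and chosen_t != chosen_s:
                return True
    return False

# Sidney's donut choices, encoded as menu -> chosen option.
donut_choices = {
    frozenset({"plain", "glazed"}): "plain",
    frozenset({"plain", "glazed", "jelly"}): "glazed",
}
print(violates_property_alpha(donut_choices))  # True
```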

3.2 Psychologizing the Norm: Making Consistency Work

Sen (1993) has launched a forceful attack on internal consistency as defined by Property Alpha and similar principles, and what follows is based on his ideas and examples. Property Alpha formulates consistency exclusively in terms of the internal consistency of choice behavior with respect to sets of alternatives. No reference is made to anything external to choice, for instance, to intentional states such as a person's social objectives, values, and motivations. This exclusion of everything psychological beyond behavior is in line with Samuelson's (1938) program of freeing theories of behavior from any traces of utility and from the priority of the notion of "preference."

But consider Property Alpha in the context of social politics at a dinner party. Everyone makes his or her way through the main course and conversation. Finally, a fruit basket is passed around for dessert. When the basket reaches Mr. Polite, there is only one apple left. Mr. Polite has the choice of taking nothing (x) or taking the apple (y). Mr. Polite loves apples, but he decides to behave decently and take nothing (x), because taking the last apple (y) would deprive the next person of a choice. If the basket had contained another piece of fruit (z), he could have chosen y over x without violating standards of good behavior. Choosing x over y from the choice set {x, y} and choosing y over x from the choice set {x, y, z} violates Property Alpha, even though there is nothing irrational about Mr. Polite's behavior given his values regarding social interaction. If he had not held to such values of politeness in company, or had chosen to dine alone, then Property Alpha would not have been violated. It is social values that determine what the perceived alternatives in the choice set are: For the selfish person it is apple versus nothing in both choice sets, but for Mr. Polite it is the last apple versus nothing in the first set. Property Alpha tells us little about consistency unless one looks beyond choice behavior to a person's intentions and values.

Sidney Morgenbesser's reversal of preference looks irrational. However, consider the following. I grew up in Bavaria and I love roasted pork with potato dumplings. In a restaurant in Illinois I once had a choice between roasted pork and steak, and I chose the steak over roasted pork (from bitter experience). But when the waiter added "It's not on the menu, but we also have blood-and-liver sausages with sauerkraut," then I switched and chose roasted pork over steak. The third alternative, although I did not choose it, indicated by its very existence that this restaurant's cook might really know how to make Bavarian roasted pork with potato dumplings. Again, choosing x over y from the choice set {x, y} and choosing y over x from the choice set {x, y, z} violates Property Alpha, even though there is nothing irrational about this behavior. The mere emergence of a new alternative may carry information about the previous alternatives.

To summarize the argument: Consistency, as defined by Property Alpha, deals only with choice behavior. However, observed choice behavior by itself can be a poor indicator of consistency, as the examples illustrate (for more see Gigerenzer, 1996b; Sen, 1993). Only once a person's social values, objectives, and expectations are known can axioms such as Property Alpha capture consistency. Challenge Three is to develop concepts of consistency that are not merely syntactic, leaving out semantics and pragmatics, but instead start from psychological entities such as a person's expectations and social values.

4. Challenge Four: Semantic Inferences

Challenges Two and Three emphasized the role of content in building normative models. So far I have dealt with content that did not look relevant from the point of view of content-free normative models. In this view, the relevant information is assumed to be reducible to those words in a problem description that sound similar to concepts in logic and probability theory, such as "AND," "OR," "probable," and "likely." In this section, I deal with these key terms. Unlike logic and probability theory, natural languages are polysemous, and the meaning of these terms must be inferred from the content in which they occur. Challenge Four is to analyze the semantic inferences people make about the meaning of terms, and to judge the soundness of a person's reasoning on the basis of these inferences, rather than assuming that natural language terms map one-to-one into similar-sounding concepts in probability theory and logic.

4.1 The Norm

Consider the following, known as the Linda problem (Tversky & Kahneman, 1983):

Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations.

Which of these two alternatives is more probable?

Linda is a bank teller. (T)

Linda is a bank teller and active in the feminist movement. (T&F)

In numerous experiments, a majority of subjects (often 80% to 90%) chose T&F as more probable. Tversky and Kahneman (1983) argued that this choice is an error in reasoning: T&F is the conjunction of two propositions, namely that Linda is a bank teller (T) and that she is active in the feminist movement (F), whereas T is one of the conjuncts. The mathematical probability of a conjunction cannot be greater than that of one of its conjuncts -- this rule has been referred to as the conjunction rule:

p(T&F) ≤ p(T)                        (7)

Tversky and Kahneman argued that because of this rule, the correct answer to the problem is T. They therefore concluded that the majority of their subjects, who chose T&F, had committed a reasoning error they called the "conjunction fallacy." The explanation of the phenomenon was that people do not reason by the laws of probability, and instead use similarity to judge probability, a strategy termed the "representativeness heuristic." The Linda problem has been used by other researchers to draw hefty conclusions about human rationality, for instance, that "our minds are not built (for whatever reason) to work by the rules of probability" (Gould, 1992).

4.2 Psychologizing the Norm: Semantic Inferences

This use of the conjunction rule as a normative model for the Linda problem (and other similar problems) assumes that (a) all that counts for rational reasoning are the English terms "and" (in "T and F") and "more probable," and (b) these natural language terms can be mapped in a one-to-one fashion into logic and probability theory: The English "and" is assumed to be immediately translatable onto the logical "AND," and the English "probable" onto mathematical probability. Everything else, including Linda's description and the content of the two propositions, is considered irrelevant for sound reasoning.

The critical point here is that this one-to-one mapping from natural language to logic or probability theory cannot capture sound reasoning. All natural languages embody polysemy. For example, connectives such as "and," "or," and "if" have several meanings, such as the inclusive and exclusive meanings of "or." Consider the proposition "Joan and Jim married and they had a baby" versus "Joan and Jim had a baby and they married." If one mapped the "and" in these two sentences onto the logical "AND," one would miss the difference between the information communicated in the first versus the second sentence. Not mapping the natural language "and" onto the logical operator allows us immediately to understand that the "and" in these sentences indicates temporal order and thus, unlike the logical "AND," is not commutative. The cognitive mechanism that infers the meaning of terms such as "and" from the content of a sentence is a most impressive feature of the human mind; no computer program exists yet that can make these inferences. I refer to this mapping process as "semantic inference."

The polysemy of the English term "and" makes it important to find out which meaning of "and" a person infers when reading the proposition "Linda is a bank teller and active in the feminist movement." There is a good reason to interpret "and" as something other than logical AND: The logical meaning would render the description of Linda and the content of the two alternatives irrelevant, thus violating the conversational maxim of relevance, that is, the assumption that what the experimenter tells you is relevant to the task (Adler, 1991; Grice, 1975). Polysemy also holds for the English term "probable." The Oxford English Dictionary lists a wide range of legitimate meanings for "probable," including "plausible," "having an appearance of truth," "that may in view of present evidence be reasonably expected to happen," and "likely," among others. Most of these meanings cannot be mapped onto the mathematical concept of probability. For instance, if "Which is more probable?" is understood as "Which makes a more plausible story?" or "Which is supported by evidence?" then T&F seems to be the better alternative. There exist a number of studies indicating that people indeed draw semantic inferences that lie outside logic and probability theory, such as the "T -> T & not F" implicature (that is, to infer that "Linda is a bank teller" means "Linda is a bank teller and not active in the feminist movement") and the "T&F -> F given T" implicature (that is, to infer that "Linda is a bank teller and active in the feminist movement" means "Linda is a feminist given she is a bank teller"; see e.g., Dulany & Hilton, 1991; Hertwig & Gigerenzer, 1996; Tversky & Kahneman, 1983). These implicatures render the description of Linda relevant to the problem.

There are experiments that indicate that the reason why many people chose T&F as more probable is their outstanding ability to perform semantic inferences rather than their alleged failure to reason according to the laws of probability. One experiment simply couches the problem in terms of frequency rather than subjective degree of belief, replacing the ambiguous term "probable" by the less ambiguous term "how many" (Hertwig & Gigerenzer, 1996). This version of the problem informs the subject that there are 200 women who fit Linda's description. Subjects are then asked

How many of the 200 women are bank tellers? ___ of 200

How many of the 200 women are active in the feminist movement? ___ of 200

How many of the 200 women are bank tellers and are active in the feminist movement? ___ of 200

In this "frequency version" of the Linda problem, violations of the conjunction rule dropped to 13% (from 87% in the original "probability version"). Substituting "how many" for the ambiguous term "probable" is, to the best of my knowledge, the strongest and most consistent way to reduce the conjunction fallacy (see Fiedler, 1988; Hertwig, 1995; and Tversky & Kahneman, 1983, p. 309, for similar results). Most subjects reason according to the conjunction rule when linguistic ambiguity is resolved.

To summarize: The use of the conjunction rule as a content-free norm for correct thinking overlooks the capacity of the human mind to make semantic inferences. Challenge Four is to model these impressive semantic inferences, rather than to assume as normative a one-to-one mapping of natural language terms into probability theory and logic.

5. Challenge Five: Double Standards

Normative models are sometimes used by convention rather than by reflection. Unreflective use of norms can lead to double standards. For example, the same researcher sometimes uses two mutually inconsistent norms, one prescribing what is rational inference for subjects and another prescribing what is rational inference for the researcher. Each norm is used mechanically, without consideration of content.

5.1 Double Standards: Fisherian Norms for Me, Bayesian Norms for You

R. A. Fisher's The Design of Experiments (1935) is possibly the single most influential book on experimental methodology in the social and biological sciences. Fisher disapproved of the routine application of Bayes's theorem. In the introduction, Fisher congratulates the Reverend Thomas Bayes for being so critical of this theorem as to withhold its publication (Bayes's 1763 treatise was published only after Bayes died). Fisher thought that the preconditions for applying Bayes's theorem, such as an objective prior distribution over the set of possible hypotheses, rarely hold, and that routine applications of the theorem would lead to unacceptable subjectivism wherein the strength of evidence would be just a matter of taste. In his book, Fisher successfully sold researchers his method of null hypothesis testing instead. By the 1950s, null hypothesis testing, also known as significance testing, became institutionalized in many social, biological, and medical fields as the sine qua non of scientific inference.

What was institutionalized was actually a mishmash of Fisher's null hypothesis testing and some concepts of a theory that Fisher deeply disliked, namely Neyman and Pearson's theory of hypotheses testing. Textbooks and curricula are generally silent about the fact that they teach a hybrid creature that would have been rejected by both camps (Gigerenzer, 1987, 1993; Gigerenzer et al., 1989, chaps. 3 and 6).

In the 1960s, Ward Edwards and colleagues proposed that researchers (a) abandon null hypothesis testing and turn Bayesian instead (Edwards, Lindman, & Savage, 1963), and (b) study whether the untutored mind reasons by Bayesian principles (Edwards, 1968). The first proposal fell stillborn from the press while the second became a raging success. Researchers began to test whether the mind draws inferences according to Bayes's theorem (as described in Challenges One and Two) at the same time that they continued to use significance testing. Researchers had been taught to use significance testing (which promised objectivity) mechanically, and Bayes's theorem smelled of subjectivity (Gigerenzer, 1987). Thus those who went only half-way with Ward Edwards unwittingly committed themselves to a double standard. Ordinary people who do not make inferences according to Bayes's theorem are branded irrational, but the researchers who brand them do not apply the same standard to their own inductive inferences. They use significance testing, not Bayes's theorem, to infer whether people are Bayesians or not.

5.2 Beyond Double Standards

Challenge Five is to construct normative models (for the reasoning of researchers and their experimental subjects alike) in a thoughtful rather than a mechanical way. One may end up having to tailor different normative models for different situations, but not mechanically use one norm for experimenters and another for subjects. I and others have traced how the mindless use of null hypothesis testing in psychology (and many social and medical sciences) became institutionalized in textbooks, curricula, and editorial practices (e.g., Gigerenzer, 1993; Gigerenzer & Murray, 1987, chap. 1; Gigerenzer et al., 1989, chaps. 3 and 6).

Remember that Hume's problem of inductive inference has not yet been solved; there is no single method that works in all situations, and we need to teach students and researchers what the methods are and how to choose between them. In my opinion, inferential statistics -- significance testing, Neyman-Pearson testing, Bayesian statistics, and so on -- are rarely needed in research. What is needed is good descriptive statistics, knowledge of the data (e.g., look at the scatter diagram instead of just at the correlation coefficient), adequate representations of the data, and the formulation of precise alternative hypotheses instead of a single null hypothesis. There is hope on the horizon that after four decades, the reign of the null hypothesis testing ritual is at last in decline. For instance, Geoffrey Loftus, editor of Memory & Cognition, seems to be the first editor of a major psychology journal in the United States to speak out and explicitly discourage researchers from mechanically submitting p, F, and t-values for no good reason (Loftus, 1991, 1993). He asked researchers instead to provide good descriptive statistics and to think about the representation of information, for example, to provide figures with error bars instead of p values.

When one thinks of statistical models as models of some situation rather than mechanically applicable tools, then double standards can be avoided. To come up with reasonable normative models of inference, one must try to model the situation instead of imposing content-independent standards onto the reasoning of either subjects or experimenters.

6. Challenge Six: The Power of Simple Psychological Mechanisms

In the first four challenges, I argued that syntactical rules or axioms are not sufficient to define rational behavior, unless they take psychological mechanisms into account. These were standard rules and axioms, which are taken by many to define rationality in a content-free way. In this section, I invite you to look beyond standard rules and axioms to the power of simple, "satisficing" psychological mechanisms that violate classical assumptions of rationality. Challenge Six is to design simple satisficing mechanisms that work well under real-world constraints of limited time and knowledge: mechanisms that are fast and frugal but nevertheless about as accurate as computationally "expensive" statistical models that satisfy classical norms.

6.1 The Norm

Imagine that you have to infer which of two alternatives, a or b, has a higher value on some criterion, and there are 10 predictors of the criterion with different validities. One method that is used to make such an inference is multiple regression, which computes the beta weights for each of the predictors, computes the value of each alternative on the criterion, and chooses the alternative that scores higher. This amounts to formulating the following multiple regression equation:

 

   
ya = β1xa1 + β2xa2 + ... + β10xa10                        (8)
   

 

where ya is the value of a on the criterion, xa1 is the value of alternative a on predictor 1, β1 is the optimal beta weight for predictor 1, and so on.

Note that we are now dealing with a much more complex situation than in Challenges One and Two: There are many pieces of information (the predictors) rather than only one (e.g., positive mammography), and these may be partially redundant. The two norms dealt with here are more general than the multiple regression model. The first norm is that sound inference implies complete search, that is, taking account of all pieces of information available, and the second requires complete integration, that is, integrating all pieces of information in some optimal way (Gigerenzer & Goldstein, in press). These norms hold for multiple regression as well as for Bayesian inference and neural networks, all of which look up and integrate all available information.
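As a point of reference, here is what the complete-search-and-integration norm looks like when written out as a weighted linear rule; the cue values and weights below are made up for illustration.

```python
def linear_choice(cues_a, cues_b, weights):
    """Complete search and integration: look up every cue value for both
    alternatives, weight it, sum, and choose the alternative with the
    larger score."""
    score_a = sum(w * x for w, x in zip(weights, cues_a))
    score_b = sum(w * x for w, x in zip(weights, cues_b))
    return "a" if score_a > score_b else "b"

# Hypothetical binary cue profiles (1 = positive, 0 = negative) and weights
# standing in for the beta weights of Equation 8.
weights = [0.8, 0.6, 0.5, 0.3]
cues_a = [1, 1, 1, 0]
cues_b = [0, 0, 1, 1]
print(linear_choice(cues_a, cues_b, weights))  # a
```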

6.2 Beyond Complete Search and Integration: Take the Best

Humans often need to make inferences about aspects of their environment under constraints of limited time, limited knowledge, and limited computational resources. The linear multiple regression norm is in conflict with all three. When one is driving fast and the road suddenly forks, one does not have the time to think about all the reasons that would favor going right or left, nor the knowledge and computational aids to calculate all the beta weights, multiply them by the predictor values, and calculate sums. Similarly, a doctor in an emergency room who has to decide whether a heart attack patient should be treated as a high-risk or a low-risk case does not know the values of the patient on all relevant predictors, nor can she always take the time to measure these. In these and many other situations, humans have to rely on fast and frugal psychological mechanisms rather than on multiple regression. In Herbert Simon's terms, humans "satisfice" rather than "optimize." Consider the following demographic problem:

Which city has more inhabitants:

(a) Hannover

(b) Bielefeld

Assume you do not know the answer, but have to make an inference. There are many predictors (cues) that signal larger population, such as whether or not a city has a soccer team in the major German league ("Bundesliga"), whether or not it is a state capital, has a university, and so on. Thus, according to the norms of complete search and information integration, one should search in memory for all predictors, estimate the values of the two cities on those predictors, estimate the beta weights for each predictor, multiply these with the estimated values, sum the products up and choose the city with the higher value. Such a model assumes that the mind is a Laplacean Demon with unlimited knowledge and computational resources. What is the alternative?

My students and I have developed a family of satisficing algorithms (Gigerenzer, Hoffrage, & Kleinbölting, 1991; Gigerenzer & Goldstein, in press), one of which I will describe here. It is based on psychological mechanisms that a mind can utilize given limited time and knowledge. One of these simple mechanisms, the "recognition principle," says that if one has heard of city a but not of city b, then the search for further information can be stopped, and the inference that a is the larger can be made. Thus, in the example problem, if you have never heard of Bielefeld, you will infer that Hannover has more inhabitants. The recognition principle can be invoked when there is a correlation between recognition and the criterion. For instance, advertising companies (e.g., Benetton) exploit this principle by making sure that consumers recognize the brand name while providing no information about the product itself (Goldstein & Gigerenzer, 1996).

If recognition cannot be used as a cue, that is, if someone has heard of both cities, a second mechanism is invoked: one-reason decision making, or "one good reasoning," for short. For instance, if the fact is retrieved from memory that Hannover has a soccer team in the major league but Bielefeld does not (or one does not know), then "one good reasoning" makes the inference that Hannover is the larger city. No further information is sought. What we call the Take the Best algorithm (because of its motto "take the best and ignore the rest") is based on just these two psychological principles and the assumption of a subjective ranking of the predictors in terms of their validities. The flow chart of the Take the Best algorithm is shown in Figure 3. For simplicity, only binary predictors are considered, and the values +, -, and ? signify that an object has a positive, negative, or unknown value on a predictor. The key features of Take the Best are (a) limited search, that is, it stops with the first predictor (including recognition) that discriminates between two objects, and (b) no integration, that is, choice is made on the basis of only one cue (but this cue may be different from one pair of cities to the next). These two features violate the norms of exhaustive search and integration.

 

   
[Figure 3. Flow chart of the Take the Best algorithm for inferring which of two objects has the higher criterion value: recognition is checked first, then the cues are looked up in order of subjective validity until one discriminates; +, -, and ? signify positive, negative, and unknown cue values.]
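The flow chart translates into very little code. The sketch below follows the two principles just described, recognition first and then one-reason decision making over cues ordered by subjective validity; the knowledge state for the Hannover-Bielefeld question is hypothetical and mirrors the example in the text.

```python
def take_the_best(a, b, recognized, cues):
    """Which of a, b has the higher criterion value? Satisficing inference
    with limited search and no integration (cf. Figure 3).

    `recognized` is the set of recognized objects; `cues` is a list of
    dictionaries ordered by subjective validity, each mapping an object
    to +1 (positive), -1 (negative), or None (unknown)."""
    # Step 1: recognition principle.
    if a in recognized and b not in recognized:
        return a
    if b in recognized and a not in recognized:
        return b
    if a not in recognized and b not in recognized:
        return None                       # no basis for an inference: guess
    # Step 2: one-reason decision making. Stop at the first cue on which
    # one object is positive and the other is not; ignore all further cues.
    for cue in cues:
        value_a, value_b = cue.get(a), cue.get(b)
        if value_a == +1 and value_b != +1:
            return a
        if value_b == +1 and value_a != +1:
            return b
    return None                           # no cue discriminates: guess

# Hypothetical knowledge state for the Hannover-Bielefeld question.
recognized = {"Hannover", "Bielefeld"}
cues = [
    {"Hannover": +1, "Bielefeld": -1},    # soccer team in the major league
    {"Hannover": +1, "Bielefeld": -1},    # state capital
    {"Hannover": +1, "Bielefeld": +1},    # university (does not discriminate)
]
print(take_the_best("Hannover", "Bielefeld", recognized, cues))  # Hannover
```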
   

 

Well, Mr. Optimal says, people may use Take the Best or similar satisficing algorithms, given the constraints of limited time and knowledge, but this is certainly a non-optimal, quick-and-dirty algorithm. Look, Mr. Satisficing responds, studies on the "flat maximum" have shown that in real-world environments, the "optimal" beta weights (in Equation 8) may not lead to better predictions than unit weights (+1 or -1), suggesting that the world can be predicted as well with simpler algorithms (Dawes, 1979; Lovie & Lovie, 1986). Well, Mr. Optimal replies, unit-weight models use simpler weights, but outside of that they do not violate the classical norms of rationality: They look up all information available and integrate it, whereas Take the Best does neither. Thus, it will make lousy inferences. How do you know? asks Mr. Satisficing; let us empirically determine, in a real-world environment, how much better algorithms that obey the two norms actually perform, compared to the satisficing Take the Best algorithm.

To answer this question, Dan Goldstein and I conducted a competition between five integration algorithms (including multiple regression and unit-weight linear models) and the Take the Best algorithm. The algorithms inferred which of a pair of cities was the larger one, as in the Bielefeld-Hannover example, and the criteria used were the speed and the accuracy of the inferences (Gigerenzer & Goldstein, in press). Inferences were made about all (pairs of) cities in Germany (after reunification) with more than 100,000 inhabitants (there were 83 cities). We simulated subjects with varying degrees of limited knowledge, ranging from those who did not recognize a single German city, and consequently did not have any information (values on predictors for population) about these cities, to subjects who recognized all 83 cities and knew all values of these cities on all predictors (cues). There were 10 predictors, including the aforementioned soccer team, state capital, and university cues. The simulation included 84 (number of cities recognized, from 0 to 83) times 6 (proportion of cue values known for the cities recognized: 0%, 10%, 20%, 50%, 75%, 100%) types of subjects, and within each type we used 500 individual subjects that differed randomly in the particular cities and cue values known. Each of these 84 x 6 x 500 simulated subjects drew inferences about all pairs of cities (as in the Hannover-Bielefeld example) using six algorithms (one at a time). The algorithms were five integration algorithms that looked up all information (including multiple regression) and the Take the Best algorithm.

In the competition we measured the speed (proportion of cue values searched before making an inference) and accuracy (proportion of correct inferences). When simulated subjects made their inferences with the Take the Best algorithm, they searched for only 30% of the information that the integration algorithms used, thus outperforming all other contestants in speed. After all, speed and computational simplicity are what this algorithm is designed for. But how accurate were the inferences that Take the Best drew? The striking result was that Take the Best matched one of the competitors in accuracy and outperformed the remaining four, multiple regression included (for details see Gigerenzer & Goldstein, in press).

This result is an existence proof that cognitive mechanisms capable of successful performance in a real-world environment need not satisfy classical norms of rational inference: Exhaustive information search and integration may be sufficient but not necessary for a mind capable of sound reasoning. There is independent evidence that "one good reasoning," as demonstrated by the Take the Best algorithm, can classify heart attack patients into high and low risk groups as well as or better than standard statistical integration models with many valid predictors (Breiman, Friedman, Olshen, & Stone, 1993). One important question concerning norms turns out to be ecological: What are the structures of real-world environments that can be exploited by simple cognitive mechanisms, and how can we talk about these structures?

The result of this competition defeats the widespread view that only "optimal" algorithms--ones that search and integrate all information--can be accurate. Models of inference do not have to forsake accuracy for simplicity, or rationality for psychological plausibility. Challenge Six is to design psychologically plausible models of sound inference that can operate under constraints of limited time and knowledge. Reasoning can be rational and psychological.

7. A Psychological Approach to Norms

I started out with the opposition between the rational and the psychological: Rational judgment is defined by the laws of probability and logic, and only by these. Psychology does not come in until things go wrong, that is, when people's judgments deviate from the laws of probability and logic. In contrast, I argued that psychological principles are indispensable for defining and evaluating what sound judgment is. Axioms and rules from probability theory and logic are, by themselves, indeterminate. In particular, I discussed the role of the representation of numbers, the role of content for inferring what the relevant numbers are, and the role of a person's social values, motives, and expectations in defining and evaluating norms for sound judgment.

 

Author's Note

I am grateful to Phil Blythe, Valerie Chase, Jean Czerlinsky, Dan Goldstein, Ralph Hertwig, Alejandro Lopes, Geoffrey Miller, Anita Todd, and Peter Todd for their critical comments on earlier versions of this chapter. Special thanks go to Berna Eden, Ulrich Hoffrage, and Laura Martignon. Arnold Davidson told me the donut story.

 

   

 

 

   

Footnotes

[1] Daniel Bernoulli's psychological solution was not the only one, and there exists a large literature on the St. Petersburg Paradox (e.g., Daston, 1988; Jorland, 1987; Lopes, 1981).
[2] Note that this result applies to the simple type of Bayesian inference with binary hypotheses and data and to one piece of information (e.g., one test result). In situations with multiple pieces of information that are not independent but redundant, however, Bayes's theorem quickly becomes mathematically complex and computationally intractable--at least for an ordinary human mind. In these situations, even frequency representations may not be able to reduce the complexity sufficiently to enable minds to "see" the Bayesian way. In Challenge Six, I will deal with such complex situations and present evidence that simple psychological mechanisms can make inferences as accurate as sophisticated statistical models that use large amounts of knowledge.
[3] The false alarm rate p("B"|G) and the hit rate p("B"|B), which minimize the error .85p("B"|G) + .15p("G"|B), are calculated as follows. First, d' is determined from the test situation. Here we know that the base rates of Green and Blue were the same, and we can assume that the two errors p("B"|G) and p("G"|B) were both equal to .20. Assuming that the two distributions are normal distributions with variance 1.0, we can find the difference (c0 - g) (Figure 2a) using the fact that the cumulative distribution function F(x) of the standard normal distribution takes the value .8 at x = (c0 - g). Thus, (c0 - g) = .84. From the symmetry of the test situation, we can conclude that the difference (b - c0) is also equal to .84. Because d' = b - g = (b - c0) + (c0 - g), d' is equal to 1.68.
Now for the situation on the night of the accident, where the base rates of Blue and Green are different, we use this value of d' to find the value of the criterion c1 that minimizes the error .85p("B"|G) + .15p("G"|B). Notice that p("B"|G) = 1 - F(c1 - g) and p("G"|B) = F(c1 - b), where F is again the cumulative distribution function of the standard normal. So the error to be minimized can be recast as .85(1 - F(c1 - g)) + .15F(c1 - b). We use standard techniques of differential calculus to minimize this expression. Its derivative with respect to c1 is -.85f(c1 - g) + .15f(c1 - b), where f is the standard normal density, and this derivative is zero only at c1 = g + 1.87. This point corresponds both to the minimal value of the error and to the intersection of the two curves in Figure 2b. With this value of c1, the false alarm rate is p("B"|G) = 1 - F(c1 - g) = 1 - F(1.87) = .03, and p("G"|B) = F(c1 - b) = F(g + 1.87 - b) = F(1.87 - d') = F(1.87 - 1.68) = F(.19) = .57. Therefore the hit rate p("B"|B) is 1 - .57 = .43.

Note that Birnbaum (1983) has reported a slightly different value, which seems to be based on a calculation error.
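
For readers who wish to check these figures, the following short script reproduces the calculation numerically, up to rounding. It only re-derives the numbers above under the stated assumptions; the symbols g, b, c1, and d' are used exactly as in the footnote.

    # Numerical check of the signal-detection calculation in Footnote 3 (illustrative).
    from scipy.stats import norm
    from scipy.optimize import brentq

    # Test situation: equal base rates, both error rates .20, equal-variance normals.
    z80 = norm.ppf(0.80)        # (c0 - g) = (b - c0), roughly .84
    d_prime = 2 * z80           # roughly 1.68

    # Accident situation: minimize .85*p("B"|G) + .15*p("G"|B) over the criterion c1.
    # The derivative of the error is -.85*f(c1 - g) + .15*f(c1 - b), with f the
    # standard normal density; its zero gives the optimal criterion.
    g, b = 0.0, d_prime

    def derivative(c1):
        return -0.85 * norm.pdf(c1 - g) + 0.15 * norm.pdf(c1 - b)

    c1 = brentq(derivative, 0.0, 5.0)   # roughly g + 1.87

    false_alarm = 1 - norm.cdf(c1 - g)  # p("B"|G), roughly .03
    hit = 1 - norm.cdf(c1 - b)          # p("B"|B), roughly .42-.43 depending on rounding
    print(round(d_prime, 2), round(c1, 2), round(false_alarm, 3), round(hit, 3))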

 

References

Adler, J. (1991). An optimist's pessimism: Conversation and conjunction. In E. Eells & T. Maruszewski (Eds.), Studies on L. Jonathan Cohen's philosophy of science (pp. 251-282). Amsterdam-Atlanta, GA: Rodopi.

Birnbaum, M. H. (1983). Base rates in Bayesian inference: Signal detection analysis of the cab problem. American Journal of Psychology, 96, 85-94.

Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1993). Classification and regression trees. New York: Chapman & Hall.

Casscells, W., Schoenberger, A., & Grayboys, T. (1978). Interpretation by physicians of clinical laboratory results. New England Journal of Medicine, 299, 999-1000.

Chase, V. M. (1995). The role of inferred features and statistical reasoning in similarity and probability judgment. Master's thesis, The University of Chicago.

Cohen, L. J. (1982). Are people programmed to commit fallacies? Further thoughts about the interpretation of experimental data on probability judgment. Journal of the Theory of Social Behavior, 12, 251-274.

Cosmides, L., & Tooby, J. (in press). Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty. Cognition.

Daston, L. J. (1981). Mathematics and the moral sciences: The rise and fall of the probability of judgments, 1785-1840. In H. N. Jahnke & M. Otte (Eds.), Epistemological and social problems of the sciences in the early nineteenth century (pp. 287-309). Dordrecht, Holland: D. Reidel Publishing Company.

Daston, L. J. (1988). Classical probability in the Enlightenment. Princeton, NJ: Princeton University Press.

Daston, L. J. (1992). The doctrine of chances without chance: Determinism, mathematical probability, and quantification in the seventeenth century. In M. J. Nye et al. (Eds.), The invention of physical science (pp. 27-50). Kluwer Academic Publishers.

Dawes, R. M. (1979). The robust beauty of improper linear models. American Psychologist, 34, 571-582.

Dulany, D. E., & Hilton, D. J. (1991). Conversational implicature, conscious representation, and the conjunction fallacy. Social Cognition, 9, 85-110.

Eddy, D. M. (1982). Probabilistic reasoning in clinical medicine: Problems and opportunities. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 249-267). Cambridge: Cambridge University Press.

Edwards, W. (1968). Conservatism in human information processing. In B. Kleinmuntz (Ed.), Formal representation of human judgment (pp. 17-52). New York: Wiley.

Edwards, W., Lindman, H., & Savage, L. J. (1963). Bayesian statistical inference for psychological research. Psychological Review, 70, 193-242.

Falk, R. (1992). A closer look at the probabilities of the notorious three prisoners. Cognition, 43, 197-223.

Fiedler, K. (1988). The dependence of the conjunction fallacy on subtle linguistic factors. Psychological Research, 50, 123-129.

Fisher, R. A. (1935). The design of experiments. Edinburgh: Oliver & Boyd.

Feynman, R. (1967). The character of physical law. Cambridge, MA: MIT Press.

Gigerenzer, G. (1987). Probabilistic thinking and the fight against subjectivity. In L. Krüger, G. Gigerenzer, & M. S. Morgan (Eds.), The probabilistic revolution, Vol. 2. Ideas in the sciences (pp. 11-33). Cambridge, MA: MIT Press.

Gigerenzer, G., & Hug, K. (1992). Domain-specific reasoning: Social contracts, cheating and perspective change. Cognition, 42, 127-171.

Gigerenzer, G. (1993). The Superego, the Ego, and the Id in statistical reasoning. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 313-339). Hillsdale, NJ: Erlbaum.

Gigerenzer, G. (1994). Why the distinction between single-event probabilities and frequencies is relevant for psychology (and vice versa). In G. Wright & P. Ayton (Eds.), Subjective Probability (pp. 129-161). New York: Wiley.

Gigerenzer, G. (1996a). The psychology of good judgment: Frequency formats and simple algorithms. Journal of Medical Decision Making, 16, 000-000.

Gigerenzer, G. (1996b). Rationality: Why social context matters. In P. B. Baltes & U. M. Staudinger (Eds.), Interactive minds: Life-span perspectives on the social foundation of cognition (pp. 319-346). Cambridge: Cambridge University Press.

Gigerenzer, G., & Goldstein, D. G. (in press). Reasoning the fast and frugal way: Models for bounded rationality. Psychological Review.

Gigerenzer, G., & Hoffrage, U. (1995). How to improve Bayesian reasoning without instruction: Frequency formats. Psychological Review, 102, 684-704.

Gigerenzer, G., Hoffrage, U., & Kleinbölting, H. (1991). Probabilistic mental models: A Brunswikian theory of confidence. Psychological Review, 98, 506-528.

Gigerenzer, G., & Murray, D. J. (1987). Cognition as intuitive statistics. Hillsdale, NJ: Erlbaum.

Gigerenzer, G., Swijtink, Z., Porter, T., Daston, L., Beatty, J., & Krüger, L. (1989). The empire of chance: How probability changed science and everyday life. Cambridge: Cambridge University Press.

Goldstein, D. G., & Gigerenzer, G. (1996). Reasoning by recognition: How to exploit a lack of knowledge. Unpublished manuscript.

Gould, S. J. (1992). Bully for brontosaurus: Further reflections in natural history. New York: Penguin Books.

Grice, H. P. (1975). Logic and conversation. In P. Cole & J. L. Morgan (Eds.), Syntax and semantics, III: Speech acts (pp. 41-58). New York: Academic Press.

Hertwig, R. (1995). Why Dr. Gould's homunculus doesn't think like Dr. Gould: The conjunction fallacy reconsidered. Konstanz: Hartung-Gorre Verlag. (Doctoral dissertation, Universität Konstanz, Germany, 1995)

Hertwig, R., & Gigerenzer, G. (1996). The "conjunction fallacy" revisited: Polysemy, conversational maxims, and frequency judgments. Manuscript. Max Planck Institute for Psychological Research, Munich.

Hoffrage, U., & Gigerenzer, G. (in press). The impact of information representation on Bayesian reasoning. Proceedings of the Cognitive Science Society.

Jorland, G. (1987). The Saint Petersburg Paradox 1713-1937. In L. Krüger, G. Gigerenzer, & M. S. Morgan (Eds.), The probabilistic revolution, Vol. 1. Ideas in the sciences (pp. 157-190). Cambridge, MA: MIT Press.

Kleiter, G. D. (1994). Natural sampling: Rationality without base rates. In G. H. Fischer & D. Laming (Eds.), Contributions to mathematical psychology, psychometrics, and methodology (pp. 375-388). New York: Springer.

Koehler, J. J. (1996). The base rate fallacy reconsidered: Descriptive, normative, and methodological challenges. Behavioral and Brain Sciences, 19, 1-54.

Levi, I. (1983). Who commits the base rate fallacy? Behavioral and Brain Sciences, 6, 502-506.

Loftus, G. R. (1991). On the tyranny of hypothesis testing in the social sciences. Contemporary Psychology, 36, 102-104.

Loftus, G. R. (1993). Editorial comment. Memory & Cognition, 21, 1-3.

Lopes, L. L. (1981). Decision making in the short run. Journal of Experimental Psychology: Human Learning and Memory, 7, 377-385.

Lovie, A. D., & Lovie, P. (1986). The flat maximum effect and linear scoring models for prediction. Journal of Forecasting, 5, 159-168.

Luce, R. D. (1980). Comments on the chapters by MacCrimmon, Stanbury and Wehrung, and Schum. In T. S. Wallsten (Ed.), Cognitive processes in choice and decision making. Hillsdale, NJ: Erlbaum.

Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: Freeman.

Oaksford, M., & Chater, N. (1994). A rational analysis of the selection task as optimal data selection. Psychological Review, 101, 608-631.

Samuelson, P. A. (1938). A note on the pure theory of consumers' behavior. Economica, 5, 61-71.

Sedlmeier, P., & Gigerenzer, G. (1996). Teaching Bayesian reasoning in less than two hours. Manuscript submitted for publication.

Sen, A. (1993). Internal consistency of choice. Econometrica, 61 (3), 495-521.

Tversky, A., & Kahneman, D. (1980). Causal schemata in judgments under uncertainty. In M. Fishbein (Ed.), Progress in social psychology. (Vol. 1, pp. 49-72). Hillsdale, NJ: Erlbaum.

Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90, 293-315.

 

 

   
         