Michael R. Waldmann
We have conducted a number of experiments which show that this class of models is inadequate for describing causal learning (see Waldmann, 1996, for an overview). Our experiments demonstrated that human learners are indeed sensitive to the causal status of cues and outcomes. In particular, the experiments showed that a predictive learning task, in which multiple causes are used to predict a common effect, is learned differently from an otherwise identical diagnostic learning task, in which multiple effects are used as cues to a common cause.

The Temporal Order of Events Constraint

The Model

Setting up an Initial Causal Model (Step 1)

The top-down orientation of our model deviates sharply from the majority of Bayesian network models (e.g., Pearl, 1988; Spirtes, Glymour, & Scheines, 1993). Such models are typically developed as normative tools for statistical analysis, and they often aim to bootstrap causal structures from covariation data in a bottom-up fashion. These methods are not intended to model everyday causal reasoning. On the contrary, they are often motivated by the assumption that causal analysis needs to be guided by expert systems that embody Bayesian strategies. In our view, it is unlikely that human learners are good at inducing the causal relations among several interconnected events solely on the basis of covariation information. Causal models have the potential to dramatically reduce the processing effort during learning. Consider, for instance, a domain with three interrelated binary events: it already entails dozens of unconditional and conditional frequencies that a learner might decide to focus on (see Pearl, 1988). Figure 1 illustrates three different causal models that can be generated by three events.
The arrows denote direct causal influences that point from causes to effects. The computational advantage of such models is that they encode not only information about direct dependencies but also additional structural information about further dependencies (see Pearl, 1988; Spirtes et al., 1993). For example, a common cause model with two effects (Fig. 1A) conveys the information that the two effects are marginally correlated but become independent conditional upon their common cause. A common effect model (Fig. 1B), by contrast, implies that the two alternative causes are marginally independent of each other but become dependent conditional upon their common effect. Finally, a causal chain model (Fig. 1C) entails that the initial cause becomes independent of the final effect once the intermediate cause is held fixed. These are just some of the many useful implications of these models. Whenever these models describe the learning domain appropriately, they can greatly reduce the amount of information that has to be learned.

Estimating Causal Power (Step 2)

Following Cheng (1997), the strength of a direct causal relation, the causal power of the cause, can be defined as the probability of the effect in the presence of the cause and in the absence of all alternative influences. Causal power is assessed in the cause-effect direction regardless of the temporal order of the learning events. At present, the model is restricted to situations in which frequency information is available (e.g., trial-by-trial learning). We assume that learners use frequency information, updated after each learning trial, to assess causal power. However, not all unconditional and conditional frequencies have to be encoded, only those that are relevant to the estimation process according to the initial causal model.
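The sketch below shows how much a causal model can prune. It first lists the candidate unconditional and conditional probabilities for three binary events, then checks that the few terms a common cause model needs are among them. The enumeration scheme is our own illustrative assumption and is not taken from Pearl (1988): it counts terms of the form P(target | condition), where the condition fixes any subset of the other two events. Other counting schemes, for instance ones that include joint probabilities, give different totals. The event names `c`, `e1`, `e2` and the list `COMMON_CAUSE_TERMS` are hypothetical.

```python
from itertools import combinations, product

EVENTS = ("c", "e1", "e2")  # three binary events; names are illustrative


def candidate_probabilities(events=EVENTS):
    """List every term P(target | condition), where the condition assigns
    present/absent values to any subset (possibly empty) of the other events."""
    terms = []
    for target in events:
        others = [x for x in events if x != target]
        for size in range(len(others) + 1):
            for subset in combinations(others, size):
                for values in product((True, False), repeat=size):
                    cond = ".".join(n if v else "~" + n for n, v in zip(subset, values))
                    terms.append(f"P({target}|{cond})" if cond else f"P({target})")
    return terms


# Terms that an assumed common cause model (c -> e1, c -> e2) needs for diagnosis
COMMON_CAUSE_TERMS = ["P(c)", "P(e1|c)", "P(e1|~c)", "P(e2|c)", "P(e2|~c)"]

all_terms = candidate_probabilities()
print(len(all_terms))                                   # 27
print(all(t in all_terms for t in COMMON_CAUSE_TERMS))  # True
```

Under this reading, the common cause model reduces 27 candidate terms to five.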
Assuming a situation in which all the causal factors are specified within the causal model ("closed world assumption"), causal power can be measured directly on the basis of observed conditional frequencies. In the simple case of one cause and one effect, the causal power of the cause is represented by the conditional probability of the effect e given the cause c, P(e|c). This estimate is already guided by a prior causal model that specifies which of the two events is the cause and which is the effect.[1]

Figure 1: A common cause (A), common effect (B), and causal chain (C) model.

The role of causal models is even clearer in more complex situations with three events (see Eells, 1991; Waldmann & Hagmayer, submitted). In the common cause situation (Fig. 1A), the causal power of the cause c with respect to each of the effects e1 and e2 can similarly be inferred from the conditional probabilities P(e1|c) and P(e2|c), because the model implies that the two effects are independent conditional upon the cause. The situation is different when the causal arrows are reversed, yielding a common effect model with two alternative causes c1 and c2 (Fig. 1B). Here, the probability of the effect in the presence of either cause is also influenced by the possible presence of the alternative cause. Thus, when the two causes increase the probability of the effect (generative causes), the appropriate way to measure causal power is to focus on situations in which the alternative cause is absent (for a discussion of preventive causes, see Cheng, 1997). For example, the causal power of c1 can be inferred from P(e|c1.~c2). (An isolated period means "and," and "~" denotes the absence of a cause.) Finally, in causal chains the causal power of the initial cause c over its direct effect e1 should be independent of the final effect e2. Therefore, the simple conditional probability P(e1|c) should serve as an indicator of the causal power of c.
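The next example illustrates the common effect case with trial-by-trial counting. It is a minimal sketch that assumes a hypothetical `FrequencyCounter` helper. The helper keeps only the counts for the conditions it is given and updates them after every trial. The trial frequencies are invented for illustration. They were chosen to be consistent with two independent generative causes with powers 0.6 and 0.5, combined by a noisy-or, and no background cause (closed world). Conditioning on the absence of c2 recovers the power of c1. The plain conditional P(e|c1) is inflated because it also includes trials on which c2 is present.

```python
class FrequencyCounter:
    """Running counts for a fixed set of conditions, updated after every trial.
    Each condition maps event names to True (present) or False (absent).
    Hypothetical helper for illustration only."""

    def __init__(self, effect, conditions):
        self.effect = effect
        self.conditions = conditions
        self.n = {name: 0 for name in conditions}         # trials matching the condition
        self.n_effect = {name: 0 for name in conditions}  # ...on which the effect occurred

    def update(self, trial):
        for name, cond in self.conditions.items():
            if all(trial[event] == value for event, value in cond.items()):
                self.n[name] += 1
                if trial[self.effect]:
                    self.n_effect[name] += 1

    def estimate(self, name):
        """Current estimate of P(effect | condition), or None if never observed."""
        return self.n_effect[name] / self.n[name] if self.n[name] else None


def make_trials():
    # (c1, c2, number of trials, trials with the effect): illustrative frequencies
    cells = [(True, False, 10, 6), (True, True, 10, 8),
             (False, True, 10, 5), (False, False, 10, 0)]
    trials = []
    for c1, c2, n, n_effect in cells:
        trials += [{"c1": c1, "c2": c2, "e": i < n_effect} for i in range(n)]
    return trials


counter = FrequencyCounter("e", {
    "c1.~c2": {"c1": True, "c2": False},  # alternative cause held absent
    "c1": {"c1": True},                   # ignores whether c2 is present
})
for trial in make_trials():
    counter.update(trial)

print(counter.estimate("c1.~c2"))  # 0.6
print(counter.estimate("c1"))      # 0.7
```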
The causal power of the intermediate cause e1 depends on the kind of causal chain underlying the domain. In a genuine Markov chain, in which the initial cause is independent of the final effect conditional upon the intermediate cause, the conditional probability P(e2|e1) is an appropriate indicator of causal power. However, more complex chains are possible in which the events c and e1 interact (Eells, 1991). In these situations it would be appropriate to control for the influence of the initial cause c by looking at P(e2|e1.c) and possibly also P(e2|e1.~c) (see also Waldmann & Hagmayer, submitted). With genuine Markov chains both methods should lead to the same results. In summary, the model estimates causal power on the basis of the relevant frequency information, and these estimates are updated after each learning trial.

Integrating Causal Power Estimates (Step 3)

Predictive Learning

Common effect models are typical causal models
underlying predictive learning with multiple cues (see Fig. 1B). An important
assumption implicitly encoded by these models is that the alternative
causes occur independently of each other, and that their individual
causal impacts on the effect are also independent. Thus, a noisy-or
integration schema provides a natural integration strategy for multiple
causes (see also Pearl, 1988, chap. 4.3.2). Assuming two causes, a noisy-or
schema predicts that the effect is caused either by c1
or by c2. Since the contributions of the two causes may overlap, the
intersection must be subtracted. Based on the two power estimates
p1 and p2 for the two causes
c1 and c2, the conditional
probability of the effect given both causes can be computed using the noisy-or schema:
$$P(e \mid c_1.c_2) = p_1 + p_2 - p_1 p_2 \qquad (1)$$
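The noisy-or schema described above can be written as a short function. This is a sketch and the power values are illustrative assumptions. The general form 1 − ∏(1 − p_i) is the standard noisy-or for any number of independent generative causes. For two causes it reduces to p1 + p2 − p1·p2, which subtracts the overlap in which both causes produce the effect.

```python
from math import prod


def noisy_or(powers):
    """P(effect) when every listed generative cause is present and each acts
    independently: 1 - product of (1 - p_i). For two causes this reduces to
    p1 + p2 - p1 * p2."""
    return 1 - prod(1 - p for p in powers)


p1, p2 = 0.6, 0.5  # illustrative power estimates for c1 and c2
print(round(noisy_or([p1, p2]), 3))  # 0.8
print(round(p1 + p2 - p1 * p2, 3))   # 0.8
```

Writing the schema as a product of "failure" probabilities keeps it correct when more than two causes are added. Repeatedly subtracting pairwise intersections would not stay correct in that case.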
Diagnostic Learning

A typical causal model underlying diagnostic learning
with multiple effect cues is the common cause model (see Fig. 1A). This
model assumes that the effects are independent of each other conditional
upon the states of the cause, which simplifies diagnostic judgments.
Instead of having to store the probability of the cause conditional
upon every pattern of effect cues, the model makes it possible to use
the individual power estimates and integrate them by taking their product.
For example, in a common cause situation with two effects e1
and e2, the Bayesian common
cause integration schema can be expressed as
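$$P(c \mid e_1.e_2) = \alpha \, P(e_1 \mid c) \, P(e_2 \mid c) \, P(c) \qquad (2)$$

where P(c) is the base rate of the cause and α is a normalizing constant. The same expression with ~c in place of c gives the probability that the cause is absent.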
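The common cause integration schema described above can be sketched as follows. The function name and all numbers are illustrative assumptions. The sketch multiplies the base rate by the probability of each observed effect state, once for the presence and once for the absence of the cause. Both products share the same normalizing constant, so dividing by their sum yields the posterior.

```python
def common_cause_posterior(prior, p_given_c, p_given_not_c, pattern):
    """P(c | pattern of effects) under a common cause model whose effects are
    conditionally independent given the state of the cause.

    prior:            base rate P(c)
    p_given_c[i]:     P(e_i | c)
    p_given_not_c[i]: P(e_i | ~c)
    pattern[i]:       True if effect i is present on this case
    """
    def score(base, probs):
        s = base
        for p, present in zip(probs, pattern):
            s *= p if present else 1 - p
        return s

    with_cause = score(prior, p_given_c)
    without_cause = score(1 - prior, p_given_not_c)
    # Both scores share the same normalizing constant, so dividing by their sum
    # normalizes them without computing the constant explicitly.
    return with_cause / (with_cause + without_cause)


posterior = common_cause_posterior(0.5, [0.8, 0.7], [0.2, 0.1], (True, True))
print(round(posterior, 3))  # 0.966
```

Only one estimate per effect and cause state has to be stored. A full table of P(c | pattern) would need one entry for every pattern of effect cues.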
Revising the Causal Model (Step 4)
In general, we claim that people attempt to make small modifications to the initial causal model. The initial schema for multiple causes will be the noisy-or schema, followed by the And schema. Other modifications of the causal structure are also possible (e.g., adding causal links). It is important to note that, despite the top-down direction of the model, it is implicitly sensitive to violations of model assumptions. The initial model will generate prediction or diagnosis errors when it is inconsistent with the learning data, which will in turn lead to a (parsimonious) modification of the initial model.

Empirical Evidence

Estimating Causal Power

Asymmetries of Cue Competition

The model anticipates this asymmetry because causal power estimates are computed in the cause-effect direction on the basis of assumptions about the underlying causal model. In the predictive condition, the cues represent multiple causes. The model predicts that for common effect models, causal power estimates for individual causes have to be calculated in the absence of alternative causes. In Phase 2 of the blocking design, the new redundant cause is never presented in the absence of the cause established in Phase 1. Consequently, no causal power estimate can be obtained for this redundant cause. We can therefore expect participants to be uncertain about the causal impact of this cue and to express this uncertainty in lowered ratings. By contrast, in the diagnostic condition a common cause model is assumed, and the causal power of the cause with respect to each effect can be assessed without holding collateral effects constant. Thus, both effect cues should yield similar ratings (i.e., absence of blocking).

Asymmetries of Base Rate Use

Waldmann and Reips (in preparation) have tested this assumption. In a number of experiments, participants learned about identical causal structures with varying base rates of the cause, in either the cause-effect or the effect-cause direction.
Subsequent to the learning phase, all participants had to give diagnostic judgments. In line with the model's predictions, participants used base rate information when prior learning was diagnostic but tended to ignore base rates when it was predictive (see also Waldmann, 1996).

Linearly Separable Versus Nonlinearly Separable Category Structures

Two category structures were compared. In the linearly separable arrangement (LS), high values were more typical for category A and low values for category B: within category A, at least two of the three dimensions had high values. By contrast, in the nonlinearly separable arrangement (NLS), neither high nor low values were typical of either category. This structure can only be categorized on the basis of a configural cue: within category A the first and third dimensions are positively correlated (HH or LL), whereas within category B they are negatively correlated (HL or LH). This structure corresponds to an XOR structure with an additional irrelevant feature (Dimension 2).
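The two category structures can be generated directly from the rules stated above. This is a sketch: it encodes each exemplar as a string of H/L values for Dimensions 1 to 3. The case numbering used in the original figure is not reproduced here.

```python
from itertools import product

CASES = ["".join(v) for v in product("HL", repeat=3)]  # HHH, HHL, ..., LLL

# LS: category A when at least two of the three dimensions are high
ls_A = [c for c in CASES if c.count("H") >= 2]
# NLS: category A when Dimensions 1 and 3 agree (HH or LL); Dimension 2 is ignored
nls_A = [c for c in CASES if c[0] == c[2]]

print("LS  A:", ls_A, "B:", [c for c in CASES if c not in ls_A])
print("NLS A:", nls_A, "B:", [c for c in CASES if c not in nls_A])
```

In the NLS structure, category B contains exactly the cases in which Dimensions 1 and 3 differ, which is the XOR of those two dimensions.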
In the experiments, the factor category structure (LS vs. NLS) was crossed with a second factor in which the causal interpretation of the cues was manipulated by means of initial instructions. In the predictive learning condition, participants were told that some of the three compounds were potential causes: they emit either high (H) or low (L) intensity magnetic waves, which may cause some stones to become magnetic. The task was to decide whether the stones in the pictures were magnets (category A) or not (category B). Thus, in this condition a common effect model was instructed. In the diagnostic learning condition, a common cause structure was suggested to the participants: they were told that some of the stones potentially emit magnetic waves, which may affect the orientation of the compounds. The orientation may indicate either a strong (H) or a weak (L) effect. In both conditions the orientation of the compounds served as cues; the only difference was whether the cues were interpreted as causes or as effects. The experiments yielded a number of findings that can be explained by our model (see Waldmann et al., 1995). One general finding was that in the predictive learning condition the LS category structure was easier to learn than the NLS structure. This finding is in line with the assumption, inherent in the model, that learners sequentially activate integration schemas that are ordered by complexity. Because the model starts with a noisy-or schema, it initially fails with both category structures. However, the next schema (the And schema) includes additional terms for paired cues. This schema picks up the two-out-of-three rule embodied in the LS structure but is unable to capture the more complex XOR interaction in the NLS structure. The findings in the diagnostic learning conditions are more complex.
In Experiment 5 (Waldmann et al., 1995), participants learned that the exemplars in category A were caused by the presence of a magnet, whereas the stones in category B were not magnetic. This instruction yielded a clear learning advantage for the LS structure: far fewer errors were committed when participants learned the LS structure than when they learned the NLS structure. However, when the instructions were slightly modified, the opposite effect was observed. In Experiment 4, participants were told that there are two types of magnets, strong and weak. As in Experiment 5, participants only had to decide whether there was a magnet (category A) or not (category B); no feedback was given about the strength of the magnet. Thus, apart from the instructional difference, the procedure was identical in the two experiments. Nevertheless, the NLS structure proved easier to learn than the LS structure in Experiment 4, in which the variability of the strength of the magnets was pointed out. How can the model explain this reversal? For Experiment 5, the model first sets up a common cause model, which is based on the initial instructions (Step 1) and specifies how causal power is assessed (Step 2). On the basis of frequency input, updated after each learning trial, the model estimates the causal power of the cause (present vs. absent, i.e., category A vs. B) for each of the three effects. It does so by calculating the conditional probability of the states of the effects (H vs. L) given the two categories. For example, P(e1=H|c) expresses the probability of the first dimension having a high intensity value in the presence of a magnet. In the LS condition, the model will eventually learn that the probability of each effect having a high value is 0.75 within category A and 0.25 within category B; the probabilities of a low value are the complements. By contrast, in the NLS condition these estimates will be 0.5 for both categories.
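The Step 2 estimates for Experiment 5 can be checked by enumerating the exemplars of each structure. The sketch also applies the common cause schema with a base rate of 0.5, as described in the following paragraph. The function names are hypothetical. The sketch reports "tie" when the schema cannot separate the two categories, and it models neither learning dynamics nor Step 4 revisions.

```python
from itertools import product

CASES = ["".join(v) for v in product("HL", repeat=3)]
STRUCTURES = {  # category A exemplars; all other cases belong to category B
    "LS": [c for c in CASES if c.count("H") >= 2],
    "NLS": [c for c in CASES if c[0] == c[2]],
}


def power_estimates(members):
    """P(dimension i = H | category) for each of the three dimensions."""
    return [sum(c[i] == "H" for c in members) / len(members) for i in range(3)]


def classify(case, est_A, est_B, base_A=0.5):
    """Common cause schema: base rate times the product of effect probabilities.
    The normalizing constant is shared by both categories, so it cancels."""
    def score(base, est):
        s = base
        for value, p_high in zip(case, est):
            s *= p_high if value == "H" else 1 - p_high
        return s

    a, b = score(base_A, est_A), score(1 - base_A, est_B)
    return "A" if a > b else "B" if b > a else "tie"


for name, cat_A in STRUCTURES.items():
    cat_B = [c for c in CASES if c not in cat_A]
    est_A, est_B = power_estimates(cat_A), power_estimates(cat_B)
    correct = sum(classify(c, est_A, est_B) == ("A" if c in cat_A else "B") for c in CASES)
    print(name, est_A, est_B, f"{correct}/8 correct")
# LS [0.75, 0.75, 0.75] [0.25, 0.25, 0.25] 8/8 correct
# NLS [0.5, 0.5, 0.5] [0.5, 0.5, 0.5] 0/8 correct
```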
To obtain categorization judgments, the power estimates are plugged into a common cause schema (see Equation 2) for three effects. Using this schema, the model compares the probability that the cause is present (category A) with the probability that it is absent (category B). For example, given an HHH pattern (case 1), the probability of category A is the product of the power estimates of the three effects (0.75^3) multiplied by the base rate (0.5) and the normalizing constant (identical for both categories). For category B, the corresponding product is 0.25^3 multiplied by the same base rate and constant. Because the probability of category A is higher than that of category B, the model correctly decides that this case belongs to category A. Applying this schema to the other learning exemplars also leads to correct categorizations. By contrast, applying this procedure to the NLS structure will not be successful: since each effect is equally associated with both categories, no reliable categorization can be achieved. The only solution is to modify the initial model (Step 4), which takes time relative to the LS condition. To model the results of Experiment 4, the additional assumption has to be made that participants enter the task with prior knowledge that strong magnets tend to produce high intensity values, whereas weak magnets are more likely to cause low values. The model again approaches the task using a common cause model (Step 1). However, based on the instructions, the model has to express the fact that the cause (category A) can be strong or weak. It is therefore necessary to obtain causal power estimates for three states: the cause being strong, the cause being weak (e.g., P(e1=H|c=weak)), and the absence of the cause (category B). Since no feedback is given about the strength of the cause, the participants have to infer the probable state of the cause by themselves.
This can be achieved on the basis of prior assumptions about a positive correlation between the state of the cause and the state of the effects. These assumptions can be implemented by starting the learning process with a preset database that embodies the correlations. They will, for example, lead to the decision that the HHH case in the NLS structure is probably caused by a strong cause. The feedback confirms that this case indeed belongs to category A, so the causal power estimate for the strong value of the cause is updated. Similarly, an LLL case will lead to an updating of the weak value of the cause. Because of the outlying value of the middle dimension, the other two cases within category A (HLH and LHL) will initially lead to incorrect category B decisions (the model does not yet know that Dimension 2 is irrelevant). However, the learning feedback reassigns these two cases to category A. Now a decision has to be made between a strong and a weak cause. In the HLH case this leads to an update of the power estimate for the strong cause, and in the LHL case to an update for the weak cause. Eventually the model will learn that the probability of high intensity values on the relevant effects (Dimensions 1 and 3) is 1 when the cause is strong and 0 when it is weak; conversely, the probability of low values on these dimensions is 1 when the cause is weak. Dimension 2 will lead to estimates of 0.5 with either state of the cause. Furthermore, the probabilities of the values of all three effects are uniformly 0.5 within category B. Using these power estimates, the model is able to correctly classify the eight cases of the NLS arrangement. The model classifies a case into category A when either a strong or a weak cause is inferred; otherwise the case is assigned to category B. With the power estimates generated in Step 2 (and 0.25 as the base rate estimate for each of the two states of the cause), the probability of category B will always be lower than that of category A for cases 1 to 4.
By contrast, cases 5 to 8 will be correctly assigned to category B.

Using the initial assumptions outlined above, the model will make more
errors with the LS structure than with the NLS one. Again, the model
will initially assign the LLL case to category A (weak state), although
this is the wrong decision in this condition. Except for the correctly
classified HHH case, the other cases within category A will create problems.
They will be wrongly assigned to category B. After feedback they will
be reassigned. However, since these exemplars have more H than L values,
only the power estimate for the strong variant of the cause will be
updated. Eventually this will lead to a fading out of the hypothesis
that the cause might also be weak, because the constant updating of
only one value of the cause will boost the base rate estimate for this
value at the expense of the alternative value. At the asymptote the
model will have learned that there is no weak cause, but this will take
time.
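The NLS classification under the Experiment 4 assumptions can be checked with the asymptotic estimates described above. The cause has three states. For the strong state, P(H) is 1 on Dimensions 1 and 3 and 0.5 on Dimension 2. For the weak state, P(H) is 0 on Dimensions 1 and 3 and 0.5 on Dimension 2. When the cause is absent, P(H) is 0.5 on every dimension. The base rates are 0.25, 0.25 and 0.5. This sketch assumes the model has already reached these values; the preset database and the trial-by-trial updating are not modeled. The probability of category A is taken as the sum of the scores for the strong and weak states.

```python
from itertools import product

CASES = ["".join(v) for v in product("HL", repeat=3)]
NLS_A = [c for c in CASES if c[0] == c[2]]  # category A in the NLS structure

# Asymptotic estimates from the text: P(dimension = H | state of the cause)
POWER = {
    "strong": [1.0, 0.5, 1.0],
    "weak": [0.0, 0.5, 0.0],
    "absent": [0.5, 0.5, 0.5],  # category B
}
BASE_RATE = {"strong": 0.25, "weak": 0.25, "absent": 0.5}


def state_scores(case):
    """Unnormalized P(state | case): base rate times product of effect probabilities."""
    scores = {}
    for state, est in POWER.items():
        s = BASE_RATE[state]
        for value, p_high in zip(case, est):
            s *= p_high if value == "H" else 1 - p_high
        scores[state] = s
    return scores


def classify(case):
    s = state_scores(case)
    # Category A if a strong or a weak cause is more probable than no cause
    return "A" if s["strong"] + s["weak"] > s["absent"] else "B"


for case in CASES:
    print(case, "model:", classify(case), "correct:", "A" if case in NLS_A else "B")
```

For each category A case, one of the two cause states scores 0.125, against 0.0625 for the absence of the cause. For each category B case, both cause states score 0 because Dimensions 1 and 3 disagree.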
References

Cosmides, L., & Tooby, J. (1996). Are humans good statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty. Cognition, 58, 1-73.

Eells, E. (1991). Probabilistic causality. Cambridge: Cambridge University Press.

Koslowski, B. (1996). Theory and evidence. Cambridge, MA: The MIT Press.

Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Mateo, CA: Morgan Kaufmann Publishers.

Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and non-reinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory. New York: Appleton-Century-Crofts.

Shanks, D. R., & Dickinson, A. (1987). Associative accounts of causality judgment. In G. H. Bower (Ed.), The psychology of learning and motivation: Advances in research and theory (Vol. 21). New York: Academic Press.

Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, prediction, and search. New York: Springer-Verlag.

Waldmann, M. R. (1996). Knowledge-based causal induction. In D. R. Shanks, K. J. Holyoak, & D. L. Medin (Eds.), The psychology of learning and motivation: Vol. 34. Causal learning. San Diego: Academic Press.

Waldmann, M. R., & Hagmayer, Y. (submitted). Estimating causal strength: The role of structural knowledge and processing effort.

Waldmann, M. R., & Holyoak, K. J. (1992). Predictive and diagnostic learning within causal models: Asymmetries in cue competition. Journal of Experimental Psychology: General, 121, 222-236.

Waldmann, M. R., Holyoak, K. J., & Fratianne, A. (1995). Causal models and the acquisition of category structure. Journal of Experimental Psychology: General, 124, 181-206.

Waldmann, M. R., & Reips, U.-D. (in preparation). Base rate appreciation after predictive and diagnostic learning.
Contact Author: Laura Martignon
This paper is an electronic archival
version of a published print book chapter.
Please cite according to the published version.
Update 6/2001