Designing Social Inquiry - Part 4

3.2 CLARIFYING ALTERNATIVE DEFINITIONS OF CAUSALITY.

In section 3.1, we defined causality in terms of a causal effect: the mean causal effect is the difference between the systematic component of a dependent variable when the causal variable takes on two different values. In this section, we use our definition of causality to clarify several alternative proposals and apparently complicating ideas. We show that the important points made by other authors about "causal mechanisms" (section 3.2.1), "multiple" causality (section 3.2.2), and "symmetric" versus "asymmetric" causality (section 3.2.3) do not conflict with our more basic definition of causality.
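As a compact restatement of this definition (in our own notation, anticipating the formal model box before section 3.4, where X_i is a binary causal variable and Y_i the dependent variable):

```latex
% Mean causal effect of a binary cause X_i on the dependent variable Y_i:
% the difference between the systematic component of Y_i under each value of X_i.
\beta \;=\; E(Y_i \mid X_i = 1) \;-\; E(Y_i \mid X_i = 0)
```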

3.2.1 "Causal Mechanisms"

Some scholars argue that the central idea of causality is that of a set of "causal mechanisms" posited to exist between cause and effect (see Little 1991:15). This view makes intuitive sense: any coherent account of causality needs to specify how the effects are exerted. For example, suppose a researcher is interested in the effect of a new bilateral tax treaty on reducing the United States's current account deficit with Japan. According to our definition of causality, the causal effect here is the reduction in the expected current account deficit with the tax treaty in effect as compared to the same situation (at the same time and for the same countries) with the exception that the treaty was not in effect. The causal mechanism operating here would include, in turn, the signing and ratification of the tax treaty, newspaper reports of the event, meetings of the relevant actors within major multinational companies, compensatory actions by those companies to reduce their total international tax burden (such as changing their transfer-pricing rules or moving manufacturing plants between countries), further actions by other companies and workers to take advantage of the movements of capital and labor between countries, and so on, until we reach the final effect on the balance of payments between the United States and Japan.

From the standpoint of the processes through which causality operates, identifying causal mechanisms is a popular way of doing empirical analysis. It has been called, in slightly different forms, "process tracing" (which we discuss in section 6.3.3), "historical analysis," and "detailed case studies." Many of the details of well-done case studies involve identifying these causal mechanisms.

However, identifying the causal mechanisms requires causal inference, using the methods discussed below. That is, to demonstrate the causal status of each potential linkage in such a posited mechanism, the investigator would have to define and then estimate the causal effect underlying it. To portray an internally consistent causal mechanism requires using our more fundamental definition of causality offered in section 3.1 for each link in the chain of causal events.

Hence our definition of causality is logically prior to the identification of causal mechanisms. Furthermore, there always exists in the social sciences an infinity of causal steps between any two links in the chain of causal mechanisms. If we posit that an explanatory variable causes a dependent variable, a "causal mechanisms" approach would require us to identify a list of causal links between the two variables, to define causality for each pair of consecutive variables in the sequence, and then to do the same for the linkages between each of those pairs, and so on. This approach quickly leads to infinite regress, and at no point does it alone give a precise definition of causality for any one cause and one effect.

In our example of the effect of a presidential versus parliamentary system on democratic stability (section 3.1.2), the hypothesized causal mechanisms include greater minority disaffection under a presidential regime and lesser governmental decisiveness under a parliamentary regime. These intervening effects-caused by the constitutional system and, in turn, affecting political stability-can be directly observed. We could monitor the attitudes or behaviors of minorities to see how they differ under the two experimental conditions or study the decisiveness of the governments under each system. Yet even if the causal effect of presidential versus parliamentary systems could operate in different ways, our definition of the causal effect would remain valid. We can define a causal effect without understanding all the causal mechanisms involved, but we cannot identify causal mechanisms without defining the concept of causal effect.

In our view, identifying the mechanisms by which a cause has its effect often builds support for a theory and is a very useful operational procedure. Identifying causal mechanisms can sometimes give us more leverage over a theory by making observations at a different level of analysis into implications of the theory. The concept can also create new causal hypotheses to investigate. However, we should not confuse a definition of causality with the nondefinitional, albeit often useful, operational procedure of identifying causal mechanisms.

3.2.2 "Multiple Causality"

Charles Ragin, in a recent work (1987:34-52), argues for a methodology with many explanatory variables and few observations so that one can take into account what he calls "multiple causation." That is, "The phenomenon under investigation has alternative determinants-what Mill (1843) referred to as the problem of 'plurality of causes.' " This is the problem referred to as "equifinality" in general systems theory (George 1982:11). In situations of multiple causation, these authors argue, the same outcome can be caused by combinations of different independent variables.37 Under conditions in which different explanatory variables can account for the same outcome on a dependent variable, according to Ragin, some statistical methods will falsely reject the hypothesis that these variables have causal status. Ragin is correct that some statistical models (or relevant qualitative research designs) could fail to alert an investigator to the existence of "multiple causality," but appropriate statistical models can easily handle situations like these (some of which Ragin discusses).

Moreover, the fundamental features of "multiple causality" are compatible with our definition of causality. They are also no different for quantitative than for qualitative research. The idea contains no new features or theoretical requirements. For example, consider the hypothesis that a person's level of income depends on both high educational attainment and highly educated parents: having one but not both is insufficient. In this case, we need to compare categories of our causal variables: respondents who have high educational attainment and highly educated parents, the two groups who have one but not the other, and the group with neither. Thus, the concept of "multiple causation" puts greater demands on our data, since we now have four categories of our causal variables, but it does not require a modification of our definition of causality. For our definition, we would need to measure the expected income for the same person, at the same time, experiencing each of the four conditions.
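A small simulation may make the four-category comparison concrete. Everything in this sketch is hypothetical and invented for illustration (the income process, the dollar amounts, the variable names); it simply shows that comparing all four categories reveals an interactive cause that a comparison on either variable alone would blur.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical population with two binary explanatory variables.
own_college = rng.integers(0, 2, n)      # 1 = high educational attainment
parents_college = rng.integers(0, 2, n)  # 1 = highly educated parents

# Posited income process: only the combination raises expected income.
income = 30_000 + 20_000 * (own_college & parents_college) \
         + rng.normal(0, 5_000, n)

# Compare expected income across the four categories of the causal variables.
for own in (0, 1):
    for parents in (0, 1):
        group = income[(own_college == own) & (parents_college == parents)]
        print(f"own={own}, parents={parents}: mean income {group.mean():,.0f}")
```

Only the category with both conditions differs from the rest, which is exactly the pattern the four-way comparison is designed to detect.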

But what happens if different causal explanations generate the same values of the dependent variable? For example, suppose we consider whether or not one graduated from college as our (dichotomous) causal variable in a population of factory workers. In this situation, both groups could quite reasonably earn the same income (our dependent variable). One reason might be that this explanatory variable (college attendance) has no causal effect on income among factory workers, perhaps because a college education does not help one perform better. Alternatively, different explanations might lead to the same level of income for those educated and those not educated. College graduates might earn a particular level of income because of their education, whereas those who had no college education might earn the same level of income because of their four years of additional seniority on the job. In this situation wouldn't we be led to conclude that "college education" has no causal effect on income levels for those who will become factory workers?

Fortunately, our definition of causality requires that we more carefully specify the counterfactual condition. In the present example, the values of the key causal variable to be varied are (1) college education, as compared to (2) no college education but four additional years of job seniority. The dependent variable is starting annual income. Our causal effect is then defined as follows: we record the income of a person graduating from college who goes to work in a factory. Then, we go back in time four years, put this same person to work in the same factory instead of in college and, at the end of four years, measure his or her income "again." The expected difference between these two levels of income for this one individual is our definition of the mean causal effect. In the present situation, we have imagined that this causal effect is zero. But this does not mean that "college education has no effect on income," only that the average difference between treatment groups (1) and (2) is zero. In fact, there is no logically unique definition of "the causal effect of college education" since one cannot define a causal effect without at least two conditions. The conditions need not be the two listed here, but they must be very clearly identified.

An alternative pair of causal conditions is to compare a college graduate with someone without a college degree but with the same level of job seniority as the college graduate. In one sense, this is unrealistic, since the non-college graduate would have to do something for the four years while not attending college, but perhaps we would be willing to imagine that this person had a different, irrelevant job for those four years. Put differently, this alternative counterfactual is the effect of a college education compared to that of none, with job seniority held constant. Failure to hold seniority constant in the two causal conditions would cause any research design to yield estimates of our first counterfactual instead of this revised one. If the latter were the goal, but no controls were introduced, our empirical analysis would be flawed due to "omitted variable bias" (which we introduce in section 5.2).
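The difference between the two counterfactuals can be seen in a short simulation. The income equation below is entirely hypothetical: it assumes, for illustration, that four years of college and four years of factory seniority raise starting income by the same amount, so the first counterfactual yields an effect near zero while the second (seniority held constant) does not.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

college = rng.integers(0, 2, n)           # 1 = four years of college
seniority = np.where(college == 1, 0, 4)  # non-graduates worked those years

# Hypothetical income process: schooling and seniority both raise income.
income = 20_000 + 8_000 * college + 2_000 * seniority \
         + rng.normal(0, 3_000, n)

# Counterfactual 1: college vs. four extra years of seniority.
effect_1 = income[college == 1].mean() - income[college == 0].mean()
print(f"college vs. seniority: {effect_1:,.0f}")  # near zero by construction

# Counterfactual 2: college vs. no college, seniority held constant at zero.
income_no_college = 20_000 + rng.normal(0, 3_000, n)
effect_2 = income[college == 1].mean() - income_no_college.mean()
print(f"college vs. none, seniority fixed: {effect_2:,.0f}")  # about +8,000
```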

Thus, the issues addressed under the label "multiple causation" do not confound our definition of causality, although they may make greater demands on our subsequent analyses. The fact that some dependent variables, and perhaps all interesting social science dependent variables, are influenced by many causal factors does not make our definition of causality problematic. The key to understanding these very common situations is to define the counterfactual conditions making up each causal effect very precisely. We demonstrate in chapter 5 that researchers need not identify "all" causal effects on a dependent variable (even if that were possible) to provide estimates of the one causal effect of interest. A researcher can focus on only the one effect of interest, establish firm conclusions, and then move on to others that may be of interest (see sections 5.2 and 5.3).38

3.2.3 "Symmetric" and "Asymmetric" Causality.

Stanley Lieberson (1985:63-64) distinguishes between what he refers to as "symmetrical" and "asymmetrical" forms of causality. He is interested in causal effects which differ when an explanatory variable is increased as compared to when it is decreased. In his words:

In examining the causal influence of X1 [an explanatory variable] on Y [a dependent variable], for example, one has also to consider whether shifts to a given value of X1 from either direction have the same consequences for Y.... If the causal relationship between X1 and Y is symmetrical or truly reversible, then the effect on Y of an increase in X1 will disappear if X1 shifts back to its earlier level (assuming that all other conditions are constant).

As an example of Lieberson's point, imagine that the Fourth Congressional District in New York had no incumbent in 1998 and that the Democratic candidate received 55 percent of the vote. Lieberson would define the causal effect of incumbency as the increase in the vote if the winning Democrat in 1998 runs as an incumbent in the next election in the year 2000. This effect would be "symmetric" if the absence of an incumbent in the subsequent election (in 2002) caused the vote to return to 55 percent. The effect might be "asymmetric" if, for example, the incumbent Democrat raised money and improved the Democratic party's campaign organization; as a result, if no incumbent were running in 2002, the Democratic candidate might receive more than 55 percent of the vote.

Lieberson's argument is clever and very important. However, in our view, his argument does not constitute a definition of causality, but applies only to some causal inferences-the process of learning about a causal effect from existing observations. In section 3.1, we defined causality for a single unit. In the present example, a causal effect can be defined theoretically on the basis of hypothetical events occurring only in the 1998 election in the Fourth District in New York. Our definition is the difference in the systematic component of the vote in this district with an incumbent in this election and without an incumbent in the same election, time, and district.

In contrast, Lieberson's example involves no hypothetical quantities and therefore cannot be a causal definition. This example involves only what would actually occur if the explanatory variable changed in two real elections from nonincumbent to incumbent, versus incumbent to nonincumbent in two other elections. Any empirical analysis of this example would involve numerous problems of inference. We discuss many of these problems of causal inference in chapters 4-6. In the present example, we might ask whether the estimated effect seemed larger only because we failed to account for a large number of recently registered citizens in the Fourth District. Or, did the surge in support for the Democrat in the election in which she or he was an incumbent seem smaller than it should because we necessarily discarded districts where the Democrat lost the first election?

Thus, Lieberson's concepts of "symmetrical" and "asymmetrical" causality are important to consider in the context of causal inference. However, they should not be confused with a theoretical definition of causality, which we give in section 3.1.

3.3 ASSUMPTIONS REQUIRED FOR ESTIMATING CAUSAL EFFECTS.

How do we avoid the Fundamental Problem of Causal Inference and also the problem of separating systematic from nonsystematic components? The full answer to this question will consume chapters 4-6, but we provide an overview here of the two possible assumptions that enable us to get around the fundamental problem. These are unit homogeneity (which we discuss in section 3.3.1) and conditional independence (section 3.3.2). These assumptions, like any other attempt to circumvent the Fundamental Problem of Causal Inference, are ultimately untestable. It is the responsibility of all researchers to make the substantive implications of this weak spot in their research designs extremely clear and visible to readers. Causal inferences should not appear like magic. The assumptions can and should be justified with whatever side information or prior research can be mustered, but they must always be explicitly recognized.

3.3.1 Unit Homogeneity.

If we cannot rerun history at the same time and the same place with different values of our explanatory variable each time-as a true solution to the Fundamental Problem of Causal Inference would require-we can attempt to make a second-best assumption: we can rerun our experiment in two different units that are "homogeneous." Two units are homogeneous when the expected values of the dependent variables from each unit are the same when our explanatory variable takes on a particular value (that is, $E(Y_1 \mid X_1 = x) = E(Y_2 \mid X_2 = x)$ for $x = 0$ and for $x = 1$). For example, if we observe X = 1 (an incumbent) in district 1 and X = 0 (no incumbent) in district 2, an assumption of unit homogeneity means that we can use the observed proportions of the vote in the two separate districts for inference about the causal effect β, which we assume is the same in both districts. For a data set with n observations, unit homogeneity is the assumption that all units with the same value of the explanatory variables have the same expected value of the dependent variable. Of course, this is only an assumption and it can be wrong: the two districts might differ in some unknown way that would bias our causal inference. Indeed, any two real districts will differ in some ways; application of this assumption requires that these districts must be the same on average over many hypothetical replications of the election campaign. For example, patterns of rain (which might inhibit voter turnout in some areas) would not differ across districts on average unless there were systematic climatic differences between the two areas.
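A brief simulation of this assumption, with invented numbers: when the two districts share the same expected nonincumbent vote, the cross-district difference estimates β; when they do not, the same comparison is biased.

```python
import numpy as np

rng = np.random.default_rng(2)
reps = 50_000
beta = 0.10  # true incumbency effect (hypothetical)
mu = 0.45    # expected nonincumbent vote share, assumed equal in both districts

# Homogeneous units: same expected vote at each value of X.
d1 = mu + beta + rng.normal(0, 0.03, reps)  # district 1, X = 1 (incumbent)
d2 = mu + rng.normal(0, 0.03, reps)         # district 2, X = 0 (no incumbent)
print(f"estimate under homogeneity: {np.mean(d1 - d2):.3f}")  # about 0.10

# Violation: district 2 votes differently even without an incumbent.
d2_hetero = (mu + 0.05) + rng.normal(0, 0.03, reps)
print(f"estimate under heterogeneity: {np.mean(d1 - d2_hetero):.3f}")  # about 0.05
```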

In the following quotation, Holland (1986:947) provides a clear example of the unit homogeneity assumption (defined from his perspective of a realized causal effect instead of the mean causal effect). Since very little randomness exists in the experiment in the following example, his definition and ours are close. (Indeed, as we show in section 4.2, with a small number of units, the assumption of unit homogeneity is most useful when the amount of randomness is fairly low.)

If [the unit] is a room in a house, t [for 'treatment'] means that I flick the light switch in that room, c [for 'control'] means that I do not, and [the dependent variable] indicates whether the light is on or not a short time after applying either t or c, then I might be inclined to believe that I can know the values of [the dependent variable for both t and c] by simply flicking the switch. It is clear, however, that it is only because of the plausibility of certain assumptions about the situation that this belief of mine can be shared by anyone else. If, for example, the light has been flicking off and on for no apparent reason while I am contemplating beginning this experiment, I might doubt that I would know the values of [the dependent variable for both t and c] after flicking the switch-at least until I was clever enough to figure out a new experiment!

In this example, the unit homogeneity assumption is that if we had flicked the switch (in Holland's notation, applied t) in both periods, the expected value (of whether the light will be on) would be the same. Unit homogeneity also assumes that if we had not flicked the switch (applied c) in both periods, the expected value would be the same, although not necessarily the same as when t is applied. Note that we would have to reset the switch to the off position after the first experiment to assure this, but we would also have to make the untestable assumption that flipping the switch on in the first period does not affect the two hypothetical expected values in the next period (such as if a fuse were blown after the first flip). In general, the unit homogeneity assumption is untestable for a single unit (although, in this case, we might be able to generate several new hypotheses about the causal mechanism by ripping the wall apart and inspecting the wiring).

A weaker, but also fully acceptable, version of unit homogeneity is the constant effect assumption. Instead of assuming that the expected value of the dependent variable is the same for different units with the same value of the explanatory variable, we need only assume that the causal effect is constant. This is a weaker version of the unit homogeneity assumption, since the causal effect is only the difference between the two expected values. If the two expected values shift across units by the same amount, the unit homogeneity assumption would be violated, but the constant effect assumption would still be valid. For example, two congressional districts could vary in the expected proportion of the vote for Democratic nonincumbents (say 45 percent vs. 65 percent), but incumbency could still add an additional ten percentage points to the vote of a Democratic candidate in either district.
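In symbols, restating the example just given (the percentages are the ones from the text):

```latex
% Constant effect with heterogeneous baselines: unit homogeneity fails,
% since E(Y_1 | X_1 = 0) differs from E(Y_2 | X_2 = 0), yet beta is shared.
E(Y_1 \mid X_1 = 0) = 0.45, \qquad E(Y_2 \mid X_2 = 0) = 0.65,
\qquad E(Y_i \mid X_i = 1) - E(Y_i \mid X_i = 0) = \beta = 0.10
\quad \text{for } i = 1, 2.
```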

The notion of unit homogeneity (or the less demanding assumption of constant causal effects) lies at the base of all scientific research. It is, for instance, the assumption underlying the method of comparative case studies. We compare several units that have varying values on our explanatory variables and observe the values of the dependent variables. We believe that the differences we observe in the values of the dependent variables are the result of the differences in the values of the explanatory variables that apply to the observations. What we have shown here is that our "belief" in this case necessarily relies upon an assumption of unit homogeneity or constant effects.

Note that we may seek homogeneous units across time or across space. We can compare the vote for the Democratic candidate when there is a Democratic incumbent running with the vote when there is no Democratic incumbent in the same district at different times or across different districts at the same time (or some combination of the two). Since a causal effect can only be estimated instead of known, we should not be surprised that the unit homogeneity assumption is generally untestable. But it is important that the nature of the assumption is made explicit. Across what range of units do we expect our assumption of a uniform incumbency effect to hold? All races for Congress? Congressional but not Senate races? Races in the North only? Races in the past two decades only?

Notice how the unit homogeneity assumption relates to our discussion in section 1.1.3 on complexity and "uniqueness." There we argued that social science generalization depends on our ability to simplify reality coherently. At the limit, simplifying reality for the purpose of making causal inferences implies meeting the standards for unit homogeneity: the observations being analyzed become, for the purposes of analysis, identical in relevant respects. Attaining unit homogeneity is often impossible; congressional elections, not to speak of revolutions, are hardly close analogies to light switches. But understanding the degree of heterogeneity in our units of analysis will help us to estimate the degree of uncertainty or likely biases to be attributed to our inferences.

3.3.2 Conditional Independence.

Conditional independence is the assumption that values are assigned to the explanatory variables independently of the values taken by the dependent variables. (The term is sometimes used in statistics, but it does not have the same definition as it commonly does in probability theory.) That is, after taking into account the explanatory variables (or controlling for them), the process of assigning values to the explanatory variable is independent of both (or, in general, all) of the potential values of the dependent variable. We use the term "assigning values" to the explanatory variables to describe the process by which these variables obtain the particular values they have. In experimental work, the researcher actually assigns values to the explanatory variables; some subjects are assigned to the treatment group and others to the control group. In nonexperimental work, the values that explanatory variables take may be "assigned" by nature or the environment. What is crucial in these cases is that the values of the explanatory variables are not caused by the dependent variables. The problem of "endogeneity" that exists when the explanatory variables are caused, at least in part, by the dependent variables is described in section 5.4.

Large-n analyses that involve the procedures of random selection and assignment constitute the most reliable way to assure conditional independence, and they do not require the unit homogeneity assumption. Random selection and assignment help us to make causal inferences because they automatically satisfy three assumptions that underlie the concept of conditional independence: (1) that the process of assigning values to the explanatory variables is independent of the dependent variables (that is, there is no endogeneity problem); (2) that selection bias, which we discuss in section 4.3, is absent; and (3) that omitted variable bias (section 5.2) is also absent. Thus, if we are able to meet these conditions in any way, either through random selection and assignment (as discussed in section 4.2) or through some other procedure, we can avoid the Fundamental Problem of Causal Inference.

Fortunately, random selection and assignment are not required to meet the conditional independence assumption. If the process by which the values of the explanatory variables are "assigned" is not independent of the dependent variables, we can still meet the conditional independence assumption if we learn about this process and include a measure of it among our control variables. For example, suppose we are interested in estimating the effect of the degree of residential segregation on the extent of conflict between Israelis and Palestinians in communities on the Israeli-occupied West Bank. Our conditional independence assumption would be severely violated if we looked only at the association between these two variables to find the causal effect. The reason is that the Israelis and Palestinians who choose to live in segregated neighborhoods may do so out of an ideological belief about who ultimately has rights to the West Bank. Ideological extremism (on both sides) may therefore lead to conflict. A measure that we believe to be residential segregation might really be a surrogate for ideology. The difference between the two explanations may be quite important, since a new housing policy might help remedy the conflict if residential segregation were the real cause, whereas this policy would be ineffective or even counterproductive if ideology were really the driving force. We might correct for the problem here by also measuring the ideology of the residents explicitly and controlling for it. For example, we could learn how popular extremist political parties are among the Israelis and how widespread PLO affiliation is among the Palestinians. We could then control for the possibly confounding effects of ideology by comparing communities with the same level of ideological extremism but differing levels of residential segregation.
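The logic of this correction can be sketched in a simulation. All quantities below are invented for illustration: we posit that extremism raises both the probability of segregation and the level of conflict, while segregation itself has no effect, so the naive comparison is misleading and the within-stratum comparison is not.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000

extremism = rng.integers(0, 2, n)  # 1 = ideologically extreme community
# Extremism makes segregation more likely; segregation itself has no effect.
segregation = (rng.random(n) < 0.2 + 0.6 * extremism).astype(int)
conflict = 10 + 5 * extremism + rng.normal(0, 1, n)

naive = conflict[segregation == 1].mean() - conflict[segregation == 0].mean()
print(f"naive difference: {naive:.2f}")  # well above zero, yet not causal

# Control for ideology: compare segregated and nonsegregated communities
# within each level of extremism.
for e in (0, 1):
    mask = extremism == e
    diff = (conflict[mask & (segregation == 1)].mean()
            - conflict[mask & (segregation == 0)].mean())
    print(f"extremism={e}: difference {diff:.2f}")  # about zero in each stratum
```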

When random selection and assignment are infeasible and we cannot control for the process of assignment and selection, we have to resort to some version of the unit homogeneity assumption in order to make valid causal inferences. Since that assumption will be only imperfectly met in social science research, we will have to be especially careful to specify our degree of uncertainty about causal inferences. This assumption will be particularly apparent when we discuss the procedures used in "matching" observations in section 5.6.

Notation for a Formal Model of a Causal Effect. We now generalize our notation for the convenience of later sections. In general, we will have n realizations of a random variable Y_i. In our running quantitative example, n is the number of congressional districts (435), and the realization y of the random variable Y_i is the observed Democratic proportion of the two-party vote in district i (such as 0.56). The expected nonincumbent Democratic proportion of the two-party vote (the average over all hypothetical replications) in district i is μ_i. We define the explanatory variable as X_i, which is coded in the present example as zero when district i has no Democratic incumbent and as one when district i has a Democratic incumbent. Then, we can denote the mean causal effect in unit i as

$$\beta = E(Y_i \mid X_i = 1) - E(Y_i \mid X_i = 0) \tag{3.4}$$

and incorporate it into the following simple formal model:

$$E(Y_i) = \mu_i + X_i\beta \tag{3.5}$$

Thus, when district i has no incumbent, and X_i = 0, the expected value is determined by substituting zero into equation (3.5) for X_i, and the answer is as before: E(Y_i) = μ_i. Similarly, when a Democratic incumbent is running in district i, the expected value is E(Y_i) = μ_i + β. Thus, equation (3.5) provides a useful model of causal inference, and β (the difference between the two theoretical proportions) is our causal effect. Finally, for future reference, we simplify equation (3.5) one last time. If we assume that Y_i has a zero mean (or is written as a deviation from its mean, which does not limit the applicability of the model in any way), then we can drop the intercept from this equation and write it more simply as

$$E(Y_i) = X_i\beta \tag{3.6}$$

The parameter β is still the theoretical value of the mean causal effect, a systematic feature of the random variables, and one of our goals in causal inference. This model is a special case of "regression analysis," which is common in quantitative research, but regression coefficients are only sometimes coincident with estimates of causal effects.
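To see the model in operation, here is a brief sketch (with invented numbers) that simulates districts from the model and recovers β with the least squares formula given in the box that follows; the demeaning step corresponds to the zero-mean assumption used to drop the intercept in equation (3.6).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 435      # number of congressional districts
beta = 0.10  # hypothetical true incumbency effect

x = rng.integers(0, 2, n).astype(float)       # 1 = Democratic incumbent runs
y = 0.45 + x * beta + rng.normal(0, 0.05, n)  # vote shares around the model

# Write both variables as deviations from their means (the zero-mean
# assumption behind equation (3.6)), then apply b = sum(XY) / sum(X^2).
xd, yd = x - x.mean(), y - y.mean()
b = (xd * yd).sum() / (xd * xd).sum()
print(f"estimated causal effect: {b:.3f}")  # close to 0.10
```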

3.4 CRITERIA FOR JUDGING CAUSAL INFERENCES.

Recall that by defining causality in terms of random variables, we were able to draw a strict analogy between it and other systematic features of phenomena, such as a mean or a variance, on which we focus in making descriptive inferences. This analogy enables us to use precisely the same criteria to judge causal inferences as we used to judge descriptive inferences in section 2.7: unbiasedness and efficiency. Hence, most of what we said on this subject in chapter 2 applies equally well to the causal inference problems we deal with here. In this section, we briefly formalize the relatively few differences between these two situations.

In section 2.7 the object of our inference was a mean (the expected value of a random variable), which we designate as μ. We conceptualize μ as a fixed, but unknown, number. An estimator of μ is said to be unbiased if it equals μ on average over many hypothetical replications of the same experiment.

As above, we continue to conceptualize the expected value of a random causal effect, denoted as β, as a fixed, but unknown, number. Unbiasedness is then defined analogously: an estimator of β is unbiased if it equals β on average over many hypothetical replications of the same experiment. Efficiency is also defined analogously, as the variability of the estimator across these hypothetical replications. These are very important concepts that will serve as the basis for our studies of many of the problems of causal inference in chapters 4-6. The two boxes that follow provide formal definitions.

A Formal Analysis of Unbiasedness of Causal Estimates. In this box, we demonstrate the unbiasedness of the estimator of the causal effect parameter β from section 3.1. The notation and logic of these ideas closely parallel those from the formal definition of unbiasedness in the context of descriptive inference in section 2.7. The simple linear model with one explanatory and one dependent variable is as follows:39

$$E(Y_i) = X_i\beta, \qquad V(Y_i) = \sigma^2$$

Our estimate of β is simply the least squares regression estimate:

$$b = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2} \tag{3.7}$$

To determine whether b is an unbiased estimator of β, we need to take the expected value, averaging over hypothetical replications:

$$E(b) = E\left(\frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2}\right) = \frac{\sum_{i=1}^{n} X_i E(Y_i)}{\sum_{i=1}^{n} X_i^2} = \frac{\sum_{i=1}^{n} X_i^2\,\beta}{\sum_{i=1}^{n} X_i^2} = \beta \tag{3.8}$$

which proves that b is an unbiased estimator of β.
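A quick Monte Carlo check of equation (3.8), under assumed parameter values: holding X fixed and replicating Y many times, the average of b should be very close to β.

```python
import numpy as np

rng = np.random.default_rng(5)
beta, sigma = 0.10, 0.05
x = rng.integers(0, 2, 435).astype(float)  # fixed over replications

estimates = []
for _ in range(20_000):  # hypothetical replications of the same experiment
    y = x * beta + rng.normal(0, sigma, x.size)    # E(Y_i) = X_i * beta
    estimates.append((x * y).sum() / (x * x).sum())  # b from equation (3.7)

print(f"average of b over replications: {np.mean(estimates):.4f}")  # ~ 0.10
```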

A Formal Analysis of Efficiency. Here, we assess the efficiency of the standard estimator of the causal effect parameter β from section 3.1. We proved in equation (3.8) that this estimator is unbiased and now calculate its variance:

$$V(b) = V\left(\frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2}\right) = \frac{\sum_{i=1}^{n} X_i^2\, V(Y_i)}{\left(\sum_{i=1}^{n} X_i^2\right)^2} = \frac{\sigma^2}{\sum_{i=1}^{n} X_i^2} \tag{3.9}$$

Thus, the variance of this estimator is a function of two components. First, the more random each unit in our data (the larger σ² is), the more variable our estimator b will be; this should be no surprise. In addition, the larger the observed variance in the explanatory variable, the less variable will be our estimate b. In the extreme case of no variability in X, nothing can help us estimate the effect of changes in the explanatory variable on the dependent variable, and the formula predicts an infinite variance (complete uncertainty) in this instance. More generally, this component indicates that efficiency is greatest when we have evidence from a larger range of values of the explanatory variable. In general, then, it is best to evaluate our causal hypotheses in as many diverse situations as possible. One way to think of this latter point is to think about drawing a line with a ruler, two dots on a page, and a shaky hand. If the two dots are very close together (small variance of X), errors in the placement of the ruler will be much larger than if the dots are farther apart (the situation of a large variance in X).
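The ruler analogy can be checked against equation (3.9) directly. This sketch, with invented values, compares the sampling variability of b when the X values are bunched together with the case when they are spread apart.

```python
import numpy as np

rng = np.random.default_rng(6)
beta, sigma, reps = 1.0, 1.0, 20_000

def sampling_sd(x):
    # Standard deviation of b = sum(XY)/sum(X^2) over hypothetical replications.
    bs = [(x * (x * beta + rng.normal(0, sigma, x.size))).sum() / (x * x).sum()
          for _ in range(reps)]
    return np.std(bs)

x_close = np.array([-0.1, 0.1] * 10)  # dots close together: sum(X^2) = 0.2
x_far = np.array([-2.0, 2.0] * 10)    # dots far apart:      sum(X^2) = 80
print(f"sd of b, small variance in X: {sampling_sd(x_close):.3f}")  # ~ 2.24
print(f"sd of b, large variance in X: {sampling_sd(x_far):.3f}")    # ~ 0.11
# Equation (3.9) gives sqrt(1/0.2) = 2.236 and sqrt(1/80) = 0.112.
```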

3.5 RULES FOR CONSTRUCTING CAUSAL THEORIES.

Much sensible advice about improving qualitative research is precise, specific, and detailed; it involves a manageable and therefore narrow aspect of qualitative research. However, even in the midst of solving a host of individual problems, we must keep the big picture firmly in mind: each specific solution must help in solving the general causal inference problem at hand. Thus far in this chapter, we have provided a precise theoretical definition of a causal effect and discussed some of the issues involved in making causal inferences. We take a step back now and provide a broader overview of some rules regarding theory construction. As we discuss (and have discussed in section 1.2), improving theory does not end when data collection begins.

Causal theories are designed to show the causes of a phenomenon or set of phenomena. Whether originally conceived as deductive or inductive, any theory includes an interrelated set of causal hypotheses. Each hypothesis specifies a posited relationship between variables that creates observable implications: if the specified explanatory variables take on certain values, other specified values are predicted for the dependent variables. Testing or evaluating any causal hypothesis requires causal inference. The overall theory, of which the hypotheses are parts, should be internally consistent, or else hypotheses can be generated that contradict one another.

Theories and hypotheses that fit these definitions have an enormous range. In this section, we provide five rules that will help in formulating good theories, and we provide a discussion of each with examples.

3.5.1 Rule 1: Construct Falsifiable Theories.

By this first rule, we do not mean only that a "theory" incapable of being wrong is not a theory. We also mean that we should design theories so that they can be shown to be wrong as easily and quickly as possible. Obviously, we should not actually try to be wrong, but even an incorrect theory is better than a statement that is neither wrong nor right. The emphasis on falsifiable theories forces us to keep the right perspective on uncertainty and guarantees that we treat theories as tentative and not let them become dogma. We should always be prepared to reject theories in the face of sufficient scientific evidence against them. One question that should be asked about any theory (or of any hypothesis derived from the theory) is simply: what evidence would falsify it? This question should be asked of all theories and hypotheses but, above all, the researcher who poses the theory in the first place should ask it of his or her own theory.

Karl Popper is most closely identified with the idea of falsifiability (Popper 1968). In Popper's view, a fundamental asymmetry exists between confirming a theory (verification) and disconfirming it (falsification). The former is almost irrelevant, whereas the latter is the key to science. Popper believes that a theory, once stated, immediately becomes part of the body of accepted scientific knowledge. Since theories are general, and hypotheses specific, theories technically imply an infinite number of hypotheses. However, empirical tests can only be conducted on a finite number of hypotheses. In that sense, "theories are not verifiable" because we can never test all observable implications of a theory (Popper 1968:252). Each hypothesis tested may be shown to be consistent with the theory, but any number of consistent empirical results will not change our opinions since the theory remains accepted scientific knowledge. On the other hand, if even a single hypothesis is shown to be wrong, and thus inconsistent with the theory, the theory is falsified, and it is removed from our collection of scientific knowledge. "The passing of tests therefore makes not a jot of difference to the status of any hypothesis, though the failing of just one test may make a great deal of difference" (Miller 1988:22). Popper did not mean falsification to be a deterministic concept. He recognized that any empirical inference is to some extent uncertain (Popper 1982). In his discussion of disconfirmation, he wrote, "even if the asymmetry [between falsification and verification] is admitted, it is still impossible, for various reasons, that any theoretical system should ever be conclusively falsified" (Popper 1968:42).

In our view, Popper's ideas are fundamental for formulating theories. We should always design theories that are vulnerable to falsification. We should also learn from Popper's emphasis on the tentative nature of any theory. However, for evaluating existing social scientific theories, the asymmetry between verification and falsification is not as significant. Either one adds to our scientific knowledge. The question is less whether, in some general sense, a theory is false or not-virtually every interesting social science theory has at least one observable implication that appears wrong-than how much of the world the theory can help us explain. By Popper's rule, theories based on the assumption of rational choice would have been rejected long ago since they have been falsified in many specific instances. However, social scientists often choose to retain the assumption, suitably modified, because it provides considerable power in many kinds of research problems (see Cook and Levi 1990). The same point applies to virtually every other social science theory of interest. The process of trying to falsify theories in the social sciences is really one of searching for their bounds of applicability. If some observable implication indicates that the theory does not apply, we learn something; similarly, if the theory works, we learn something too.

For scientists (and especially for social scientists) evaluating properly formulated theories, Popper's fundamental asymmetry seems largely irrelevant. O'Hear (1989:43) made a similar point about the application of Popper's ideas to the physical sciences:

Popper always tends to speak in terms of explanations of universal theories. But once again, we have to insist that proposing and testing universal theories is only part of the aim of science. There may be no true universal theories, owing to conditions differing markedly through time and space; this is a possibility we cannot overlook. But even if this were so, science could still fulfil [sic] many of its aims in giving us knowledge and true predictions about conditions in and around our spatio-temporal niche.

Surely this same point applies even more strongly to the social sciences.

Furthermore, Popper's evaluation of theories does not fundamentally distinguish between a newly formulated theory and one that has withstood numerous empirical tests. When we are testing for the deterministic distinction between the truth or fiction of a universal theory (of which there exist no interesting examples), Popper's view is appropriate, but from our perspective of searching for the bounds of a theory's applicability, his view is less useful. As we have indicated many times in this book, we require all inferences about specific hypotheses to be made by stating a best guess (an estimate) and a measure of the uncertainty of this guess. Whether we discover that the inference is consistent with our theory or inconsistent, our conclusion will affect our degree of belief in the theory. Both consistency and inconsistency provide information about the truth of the theory and should affect the certainty of our beliefs.40

Consider the hypothesis that Democratic and Republican campaign strategies during American presidential elections have a small net effect on the election outcome. Numerous more specific hypotheses are implied by this one, such as that television commercials, radio commercials, and debates all have little effect on voters. Any test of the theory must really be a test of one of these hypotheses. One test of the theory has shown that forecasts of the outcome can be made very accurately with variables available only at the time of the conventions-and thus before the campaign (Gelman and King 1993). This test is consistent with the theory (if we can predict the election before the campaign, the campaign can hardly be said to have much of an impact), but it does not absolutely verify it. Some aspect of the campaign could have some small effect that accounts for some of the forecasting errors (and few researchers doubt that this is true). Moreover, the prediction could have been luck, or the campaign could have not included any innovative (and hence unpredictable) tactics during the years for which data were collected.

We could conduct numerous other tests by including variables in the forecasting model that measure aspects of the campaign, such as relative amounts of TV and radio time, speaking ability of the candidates, and judgments as to the outcomes of the debates. If all of these hypotheses show no effect, then Popper would say that our opinion is not changed in any interesting way: the theory that presidential campaigns have no effect is still standing. Indeed, if we did a thousand similar tests and all were consistent with the theory, the theory could still be wrong since we have not tried every one of the infinite number of possible variables measuring the campaign. So even with a lot of results consistent with the theory, it still might be true that presidential campaigns influence voter behavior.

However, if a single campaign event-such as substantial accusations of immoral behavior-is shown to have some effect on voters, the theory would be falsified. According to Popper, even though this theory was not conclusively falsified (which he recognized as impossible), we learn more from it than the thousand tests consistent with the theory.

To us, this is not the way social science is or should be conducted. After a thousand tests in favor and one against, even if the negative test seemed valid with a high degree of certainty, we would not drop the theory that campaigns have no effect. Instead, we might modify it to say perhaps that normal campaigns have no effect except when there is considerable evidence of immoral behavior by one of the candidates-but since this modification would make our theory more restrictive, we would need to evaluate it with a new set of data before being confident of its validity. The theory would still be very powerful, and we would know somewhat more about the bounds to which the theory applied with each passing empirical evaluation. Each test of a theory affects both the estimate of its validity and the uncertainty of that estimate; and it may also affect to what extent we wish the theory to apply.

In the previous discussion, we suggested an important approach to theory, as well as issued a caution. The approach we recommended is one of sensitivity to the contingent nature of theories and hypotheses. Below, we argue for seeking broad application for our theories and hypotheses. This is a useful research strategy, but we ought always to remember that theories in the social sciences are unlikely to be universal in their applicability. Those theories that are put forward as applying to everything, everywhere-some versions of Marxism and rational choice theory are examples of theories that have been put forward with claims of such universality-are either presented in a tautological manner (in which case they are neither true nor false) or in a way that allows empirical disconfirmation (in which case we will find that they make incorrect predictions). Most useful social science theories are valid under particular conditions (in election campaigns without strong evidence of immoral behavior by a candidate) or in particular settings (in industrialized but not less industrialized nations, in House but not Senate campaigns). We should always try to specify the bounds of applicability of the theory or hypothesis. The next step is to raise the question: Why do these bounds exist? What is it about Senate races that invalidates generalizations that are true for House races? What is it about industrialization that changes the causal effects? What variable is missing from our analysis which could produce a more generally applicable theory? By asking such questions, we move beyond the boundaries of our theory or hypothesis to show what factors need to be considered to expand its scope.

But a note of caution must be added. We have suggested that the process of evaluating theories and hypotheses is a flexible one: particular empirical tests neither confirm nor disconfirm them once and for all. When an empirical test is inconsistent with our theoretically based expectations, we do not immediately throw out the theory. We may do various things: We may conclude that the evidence may have been poor due to chance alone; we may adjust what we consider to be the range of applicability of a theory or hypothesis even if it does not hold in a particular case and, through that adjustment, maintain our acceptance of the theory or hypothesis. Science proceeds by such adjustments; but they can be dangerous. If we take them too far, we make our theories and hypotheses invulnerable to disconfirmation. The lesson is that we must be very careful in adapting theories to be consistent with new evidence. We must avoid stretching the theory beyond all plausibility by adding numerous exceptions and special cases.

If our study disconfirms some aspect of a theory, we may choose to retain the theory but add an exception. Such a procedure is acceptable as long as we recognize the fact that we are reducing the claims we make for the theory. The theory, though, is less valuable since it explains less; in our terminology, we have less leverage over the problem we seek to understand.41 Furthermore, such an approach may yield a "theory" that is merely a useless hodgepodge of various exceptions and exclusions. At some point we must be willing to discard theories and hypotheses entirely. Too many exceptions, and the theory should be rejected. Thus, by itself, parsimony, the normative preference for theories with fewer parts, is not generally applicable. All we need is our more general notion of maximizing leverage, from which the idea of parsimony can be fully derived when it is useful. The idea that science is largely a process of explaining many phenomena with just a few causal factors makes clear that theories with fewer parts are not necessarily better or worse. To maximize leverage, we should attempt to formulate theories that explain as much as possible with as little as possible. Sometimes this formulation is achieved via parsimony, but sometimes not. We can conceive of examples in which a slightly more complicated theory will explain vastly more of the world. In such a situation, we would surely use the nonparsimonious theory, since it maximizes leverage more than the more parsimonious theory.42