What Is This Thing Called Science - Part 7

Part 7

An important aspect of the Bayesian theory of science is that the calculations of prior and posterior probabilities always take place against a background of assumptions that are taken for granted, that is, assuming what Popper called background knowledge. So, for example, when it was suggested in the previous paragraph that P(e/h) takes the value 1 when e follows from h, it was taken for granted that h was to be taken in conjunction with the available background knowledge. We have seen in earlier chapters that theories need to be augmented by suitable auxiliary assumptions before they yield testable predictions. The Bayesians take these considerations on board. Throughout this discussion it is assumed that probabilities are calculated against a background of assumed knowledge.

It is important to clarify in what sense Bayes' theorem is indeed a theorem. Although we will not consider the details here, we note that there are some minimal assumptions about the nature of probability which taken together constitute the so-called "probability calculus". These assumptions are accepted by Bayesians and non-Bayesians alike. It can be shown that denying them has a range of undesirable consequences. It can be shown, for example, that a gambling system that violates the probability calculus is "irrational" in the sense that it makes it possible for wagers to be placed on all possible outcomes of a game, race or whatever in such a way that the participants on one or other side of the betting transaction will win whatever the outcome. (Systems of betting odds that allow this possibility are called Dutch Books. They violate the probability calculus.) Bayes' theorem can be derived from the premises that constitute the probability calculus. In that sense, the theorem in itself is uncontentious.
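The logic of a Dutch Book can be made concrete with a little arithmetic. The following is a minimal sketch (the outcomes and numbers are my own illustration, not from the text): an agent whose degrees of belief in two mutually exclusive, exhaustive outcomes sum to less than 1 can be sold a pair of bets, each fair by the agent's own lights, that together guarantee the agent a loss whatever happens.

```python
# Illustrative Dutch Book: an agent regards a price of p as fair for a
# ticket paying 1 unit if an outcome occurs, where p is the agent's
# degree of belief in that outcome. These beliefs violate the
# probability calculus: the two exclusive, exhaustive outcomes are
# assigned probabilities summing to 0.8 rather than 1.
beliefs = {"outcome_A": 0.4, "outcome_B": 0.4}

# An opponent buys one ticket on each outcome at the agent's own prices.
cost = sum(beliefs.values())

for winner in beliefs:
    payout = 1.0              # exactly one ticket pays, whatever happens
    print(f"{winner} occurs: opponent's profit = {payout - cost:+.2f}")

# The opponent profits by 0.20 on either outcome, so the agent loses
# come what may - the mark of an "irrational" system of betting odds.
```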

So far, we have introduced Bayes' theorem, and have tried to indicate that the way in which it prescribes that the probability of a hypothesis be changed in the light of evidence captures some straightforward intuitions about the bearing of evidence on theories. Now we must press the question of the interpretation of the probabilities involved more closely.

Subjective Bayesianism.

The Bayesians disagree among themselves on a fundamental question concerning the nature of the probabilities involved. On one side of the division we have the "objective" Bayesians. According to them, the probabilities represent probabilities that rational agents ought to subscribe to in the light of the objective situation. Let me try to indicate the gist of their position with an example from horse racing. Suppose we are confronted by a list of the runners in a horse race and we are given no information about the horses at all. Then it might be argued that on the basis of some "principle of indifference" the only rational way of ascribing probabilities to the likelihood of each horse winning is to distribute the probabilities equally among the runners. Once we have these "objective" prior probabilities to start with, then Bayes' theorem dictates how the probabilities are to be modified in the light of any evidence, and so the posterior probabilities that result are also those that a rational agent ought to accept. A major, and notorious, problem with this approach, at least in the domain of science, concerns how to ascribe objective prior probabilities to hypotheses. What seems to be necessary is that we list all the possible hypotheses in some domain and distribute probabilities among them, perhaps ascribing the same probability to each employing the principle of indifference. But where is such a list to come from? It might well be thought that the number of possible hypotheses in any domain is infinite, which would yield zero for the probability of each, and the Bayesian game cannot get started. All theories have zero probability and Popper wins the day. How is some finite list of hypotheses enabling some objective distribution of non-zero prior probabilities to be arrived at? My own view is that this problem is insuperable, and I also get the impression from the current literature that most Bayesians are themselves coming around to this point of view. So let us turn to "subjective" Bayesianism.

For the subjective Bayesian the probabilities to be handled by Bayes' theorem represent subjective degrees of belief. They argue that a consistent interpretation of probability theory can be developed on this basis, and, moreover, that it is an interpretation that can do full justice to science. Part of their rationale can be grasped by reference to the examples I invoked in the opening paragraph of this chapter. Whatever the strength of the arguments for attributing zero probability to all hypotheses and theories, it is simply not the case, argue the subjective Bayesians, that people in general and scientists in particular ascribe zero probabilities to well-confirmed theories. The fact that I pre-booked my trip to the mountains to observe Halley's comet suggests that they are right in my case at least. In their work, scientists take many laws for granted. The unquestioning use of the law of refraction of light by astronomers and Newton's laws by those involved in the space program demonstrates that they ascribe to those laws a probability close, if not equal, to unity. The subjective Bayesians simply take the degrees of belief in hypotheses that scientists as a matter of fact happen to have as the basis for the prior probabilities in their Bayesian calculations. In this way they escape Popper's strictures to the effect that the probability of all universal hypotheses must be zero.

Bayesianism makes a great deal of sense in the context of gambling. We have noted that adherence to the probability calculus, within which Bayes' theorem can be proved, is a sufficient condition to avoid Dutch Books. Bayesian approaches to science capitalise on this by drawing a close analogy between science and gambling systems. The degree of belief held by a scientist in a hypothesis is analogous to the odds on a particular horse winning a race that he or she considers to be fair. Here there is a possible source of ambiguity that needs to be addressed. If we stick to our analogy with horse racing, then the odds considered to be fair by punters can be taken as referring either to their private subjective degrees of belief or to their beliefs as expressed in practice in their betting behaviour. These are not necessarily the same thing. Punters can depart from the dictates of the odds they believe in by becoming flustered at the race-track or by losing their nerve when the system of odds they believe in warrants a particularly large bet. Not all Bayesians make the same choice between these alternatives when applying the Bayesian calculus to science. For example, Jon Dorling (1979) takes the probabilities to measure what is reflected in scientific practice and Howson and Urbach (1989) take them to measure subjective degrees of belief. A difficulty with the former stance is knowing what it is within scientific practice that is meant to correspond to betting behaviour. Identifying the probabilities with subjective degrees of belief, as Howson and Urbach do, at least has the advantage of making it clear what the probabilities refer to.

Attempting to understand science and scientific reasoning in terms of the subjective beliefs of scientists would seem to be a disappointing departure for those who seek an objective account of science. Howson and Urbach have an answer to that charge. They insist that the Bayesian theory constitutes an objective theory of scientific inference. That is, given a set of prior probabilities and some new evidence, Bayes' theorem dictates in an objective way what the new, posterior, probabilities must be in the light of that evidence. There is no difference in this respect between Bayesianism and deductive logic, because logic has nothing to say about the source of the propositions that constitute the premises of a deduction either. It simply dictates what follows from those propositions once they are given. The Bayesian defence can be taken a stage further. It can be argued that the beliefs of individual scientists, however much they might differ at the outset, can be made to converge given the appropriate input of evidence. It is easy to see in an informal way how this can come about. Suppose two scientists start out by disagreeing greatly about the probable truth of hypothesis h which predicts otherwise unexpected experimental outcome e. The one who attributes a high probability to h will regard e as less unlikely than the one who attributes a low probability to h. So P(e) will be high for the former and low for the latter. Suppose now that e is experimentally confirmed. Each scientist will have to adjust the probabilities for h by the factor P(e/h)/P(e). However, since we are assuming that e follows from h, P(e/h) is 1 and the scaling factor is 1/P(e). Consequently, the scientist who started with a low probability for h will scale up that probability by a larger factor than the scientist who started with a high probability for h. As more positive evidence comes in, the original doubter is forced to scale up the probability in such a way that it eventually approaches that of the already convinced scientist. In this kind of way, argue the Bayesians, widely differing subjective opinions can be brought into conformity in response to evidence in an objective way.
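The convergence argument is easy to check numerically. Here is a minimal sketch (the two priors and the likelihood P(e/not h) = 0.5 are my assumed inputs, not from the text): two scientists begin with very different priors for h, which entails each new piece of evidence, so P(e/h) = 1, and both update by Bayes' theorem as confirmations accumulate.

```python
# Two scientists update on repeated confirmations of hypothesis h.
# Since h entails the evidence, P(e/h) = 1; the chance of the evidence
# if h is false is assumed to be 0.5 for each new experiment.
P_E_GIVEN_NOT_H = 0.5

def update(prior):
    """One application of Bayes' theorem after e is observed."""
    p_e = 1.0 * prior + P_E_GIVEN_NOT_H * (1 - prior)  # total probability
    return prior / p_e       # P(h/e) = P(e/h).P(h)/P(e), with P(e/h) = 1

doubter, believer = 0.1, 0.9
for n in range(1, 11):
    doubter, believer = update(doubter), update(believer)
    print(f"after confirmation {n:2d}: doubter = {doubter:.3f}, believer = {believer:.3f}")

# The doubter's probability is scaled up by the larger factor each time,
# and both posteriors approach 1: initially divergent opinions converge.
```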

Applications of the Bayesian formula.

The preceding paragraph has given a strong foretaste of the kind of ways in which the Bayesians wish to capture and sanction typical modes of reasoning in science. In this section we will sample some more examples of Bayesianism in action.

In earlier chapters it was pointed out that there is a law of diminishing returns at work when testing a theory against experiment. Once a theory has been confirmed by an experiment, repeating that same experiment under the same circumstances will not be taken by scientists as confirming the theory to as high a degree as the first experiment did. This is readily accounted for by the Bayesian. If the theory T predicts the experimental result E then the probability P(E/T) is 1, so that the factor by which the probability of T is to be increased in the light of a positive result E is 1/P(E). Each time the experiment is successfully performed, the more likely the scientist will be to expect it to be performed successfully again the subsequent time. That is, P(E) will increase. Consequently, the probability of the theory being correct will increase by a smaller amount on each repetition.
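A short numerical sketch makes the diminishing returns visible (the starting probability of 0.5 for T, and the assumed chance of 0.3 that E would occur even if T were false, are my illustrative inputs):

```python
# Repeating the same experiment: T entails E, so P(E/T) = 1, and the
# boost to P(T) on each success is the factor 1/P(E). As confidence in
# T grows, so does the expectation P(E), and the boost shrinks to 1.
P_E_GIVEN_NOT_T = 0.3   # assumed chance of the result should T be false
p_t = 0.5               # assumed prior degree of belief in T

for run in range(1, 6):
    p_e = p_t + P_E_GIVEN_NOT_T * (1 - p_t)   # current expectation of E
    print(f"run {run}: P(E) = {p_e:.3f}, boost 1/P(E) = {1 / p_e:.3f}")
    p_t = p_t / p_e                           # Bayesian update on success

print(f"final P(T) = {p_t:.3f}")
# Each successive run raises P(E), so each boost is smaller than the last.
```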

Other points in favour of the Bayesian approach can be made in the light of historical examples. Indeed, I suggest that it is the engagement by the Bayesians with historical cases in science that has been a key reason for the rising fortunes of their approach in recent years, a trend begun by Jon Dorling (1979). In our discussion of Lakatos's methodology we noted that according to that methodology it is the confirmations of a program that are important rather than the apparent falsifications, which can be blamed on the assumptions in the protective belt rather than on the hard core. The Bayesians claim to be able to capture the rationale for this strategy. Let us see how they do it, by looking at a historical example utilised by Howson and Urbach (1989, pp. 97-102).

The example concerns a hypothesis put forward by William Prout in 1815. Prout, impressed by the fact that the atomic weights of the chemical elements relative to the atomic weight of hydrogen are in general close to whole numbers, conjectured that atoms of the elements are made up of whole numbers of hydrogen atoms. That is, Prout saw hydrogen atoms as playing the role of elementary building blocks. The question at issue is what the rational response was for Prout and his followers to the finding that the atomic weight of chlorine relative to hydrogen (as measured in 1815) was 35.83, that is, not a whole number. The Bayesian strategy is to assign probabilities that reflect the prior probabilities that Prout and his followers might well have assigned to their theory together with relevant aspects of background knowledge, and then use Bayes' theorem to calculate how these probabilities change in light of the discovery of the problematic evidence, namely the non-integral value for the atomic weight of chlorine. Howson and Urbach attempt to show that when this is done the result is that the probability of Prout's hypothesis falls just a little, whereas the probability of the relevant measurements being accurate falls dramatically. In light of this it seems quite reasonable for Prout to have retained his hypothesis (the hard core) and to have put the blame on some aspect of the measuring process (the protective belt). It would seem that a clear rationale has been given for what in Lakatos's methodology appeared as "methodological decisions" that were not given any grounding. What is more, it would seem that Howson and Urbach, who are following the lead of Dorling here, have given a general solution to the so-called "Duhem-Quine problem". Confronted with the problem of which part of a web of assumptions to blame for an apparent falsification, the Bayesian answer is to feed in the appropriate prior probabilities and calculate the posterior probabilities. These will show which assumptions slump to a low probability, and consequently which assumptions should be dropped to maximise the chances of future success.

I will not go through the details of the calculations in the Prout case, or any of the other examples that Bayesians have given, but I will say enough to at least give the flavour of the way in which they proceed. The effect of the evidence e, the non-integral atomic weight of chlorine, on the probability to be assigned to Prout's hypothesis, h, is to be judged in the context of the available background knowledge, a. The most relevant aspect of the background knowledge is the confidence to be placed in the available techniques for measuring atomic weights and the degree of purity of the chemicals involved. Estimates need to be made of the prior probabilities of h, a and e. Howson and Urbach suggest a value of 0.9 for P(h), basing their estimate on historical evidence to the effect that the Proutians were very convinced of the truth of their hypothesis. They place P(a) somewhat lower at 0.6, on the grounds that chemists were aware of the problem of impurities, and that there were variations in the results of different measurements of the atomic weight of particular elements. The probability P(e) is assessed on the assumption that the alternative to h is a random distribution of atomic weights, so, for instance, P(e/not h & a) is ascribed a probability of 0.01 on the grounds that, if the atomic weight of chlorine is randomly distributed over a unit interval, it would have a one in a hundred chance of being 35.83. These probability estimates, and a few others like them, are fed into Bayes' theorem to yield posterior probabilities, P(h/e) and P(a/e), for h and a. The result is 0.878 for the former and 0.073 for the latter. Note that the probability for h, Prout's hypothesis, has fallen only a small amount from the original 0.9, whereas the probability of a, the assumption that the measurements are reliable, has fallen dramatically from 0.6 to 0.073. A reasonable response for the Proutians, conclude Howson and Urbach, was to retain their hypothesis and doubt the measurements. They point out that nothing much hinges on the absolute value of the numbers that are fed into the calculation so long as they are of the right kind of order to reflect the attitudes of the Proutians as reflected in the historical literature.
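The quoted posteriors can be reproduced with a few lines of arithmetic. In the sketch below, the priors P(h) = 0.9 and P(a) = 0.6 and the likelihood P(e/not h & a) = 0.01 are taken from the text; the remaining likelihoods, P(e/h & a) = 0, P(e/h & not a) = 0.02 and P(e/not h & not a) = 0.01, are my assumptions standing in for the "few others like them", chosen to be of the kind Howson and Urbach describe, with h and a treated as independent.

```python
# Reconstruction of the Prout calculation. Priors and P(e/not-h & a)
# are from the text; the other likelihoods are assumed values.
p_h, p_a = 0.9, 0.6       # priors for Prout's hypothesis and for accuracy
like = {                   # P(e given h?, a?) for the four combinations
    (True, True): 0.00,    # whole-number theory + accurate measurement
                           # could not yield 35.83
    (False, True): 0.01,   # quoted: random weight, 1-in-100 chance of 35.83
    (True, False): 0.02,   # assumed
    (False, False): 0.01,  # assumed
}

def p_joint(h, a):         # prior of each h/a combination (independence)
    return (p_h if h else 1 - p_h) * (p_a if a else 1 - p_a)

p_e = sum(like[h, a] * p_joint(h, a) for h in (True, False) for a in (True, False))

p_h_given_e = sum(like[True, a] * p_joint(True, a) for a in (True, False)) / p_e
p_a_given_e = sum(like[h, True] * p_joint(h, True) for h in (True, False)) / p_e

print(f"P(h/e) = {p_h_given_e:.3f}")   # ~0.878: barely down from 0.9
print(f"P(a/e) = {p_a_given_e:.3f}")   # ~0.073: collapses from 0.6
```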

The Bayesian approach can be used to mount a criticism of some of the standard accounts of the undesirability of ad hoc hypotheses and related issues. Earlier in this book I proposed the idea, following Popper, that ad hoc hypotheses are undesirable because they are not testable independently of the evidence that led to their formulation. A related idea is that evidence that is used to construct a theory cannot be used again as evidence for that theory. From the Bayesian point of view, although these notions sometimes yield appropriate answers concerning how well theories are confirmed by evidence, they also go astray, and, what is more, the rationale underlying them is misconceived. The Bayesians attempt to do better in the following kinds of ways.

Bayesians agree with the widely held view that a theory is better confirmed by a variety of kinds of evidence than by evidence of a particular kind. There is a straightforward Bayesian rationale that explains why this should be so. The point is that there are diminishing returns from efforts to confirm a theory by a single kind of evidence. This follows from the fact that each time the theory is confirmed by that kind of evidence, then the probability expressing the degree of belief that it will do so in the future gradually increases. By contrast, the prior probability of a theory being confirmed by some new kind of evidence may be quite low. In such cases, feeding the results of such a confirmation, once it occurs, into the Bayesian formula leads to a significant increase in the probability ascribed to the theory. So the significance of independent evidence is not in dispute. Nevertheless, Howson and Urbach urge that, from the Bayesian point of view, if hypotheses are to be dismissed as ad hoc, the absence of independent testability is not the right reason for doing so. What is more, they deny that data used in the construction of a theory cannot be used to confirm it.

A major difficulty with the attempt to rule out ad hoc hypotheses by the demand for independent testability is that it is too weak, and admits hypotheses in a way that at least clashes with our intuitions. For instance, let us consider the attempt by Galileo's rival to retain his assumption that the moon is spherical in the face of Galileo's sightings of its mountains and craters by proposing the existence of a transparent, crystalline substance enclosing the observable moon. This adjustment cannot be ruled out by the independent testability criterion because it was independently testable, as evidenced by the fact that it has been refuted by the lack of interference from any such crystalline spheres experienced during the various moon landings. Greg Bamford (1993) has raised this, and a range of other difficulties, with a wide range of attempts to define the notion of ad hocness by philosophers in the Popperian tradition, and suggests that they are attempting to define a technical notion for what is in effect nothing more than a common sense idea. Although Bamford's critique is not from a Bayesian point of view, the response of Howson and Urbach is similar, insofar as their view is that ad hoc hypotheses are rejected simply because they are considered implausible, and are credited with a low probability because of this. Suppose a theory t has run into trouble with some problematic evidence and is modified by adding assumption a, so that the new theory is (t & a). Then it is a straightforward result of probability theory that P(t & a) cannot be greater than P(a). From a Bayesian point of view, then, the modified theory will be given a low probability simply on the grounds that P(a) is low. The theory of Galileo's rival could be rejected to the extent that his suggestion was implausible. There is nothing more to it, and nothing else needed.

Let us now turn to the case of the use of data to construct a theory and the denial that that data can be considered to support it. Howson and Urbach (1989, pp. 275-80) give counterexamples. Consider an urn containing counters, and imagine that we begin with the assumption that all of the counters are white and none of them coloured. Suppose we now draw counters 1,000 times, replacing the counter and shaking the urn after each draw, and that the result is that 495 of the drawn counters are white. We now adjust our hypothesis to be that the urn contains white and coloured counters in equal numbers. Is this adjusted hypothesis supported by the evidence used to arrive at the revised, equal numbers, hypothesis? Howson and Urbach suggest, reasonably, that it is, and show why this is so on Bayesian grounds. The crucial factor that leads to the probability of the equal numbers hypothesis increasing as a result of the experiment that drew 495 white counters is the probability of drawing that number if the equal numbers hypothesis is false. Once it is agreed that that probability is small, the result that the experiment confirms the equal numbers hypothesis follows straightforwardly from the Bayesian calculus, even though the evidence was used in the construction of the hypothesis.
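A rough calculation, on modelling assumptions of my own, shows how the Bayesian result comes out. Take h to be the equal-numbers hypothesis, give it a modest prior, and spread the remaining belief uniformly over the other possible proportions of white counters; the likelihood of drawing exactly 495 whites in 1,000 trials is then far higher on h than on its negation, so the posterior for h rises even though the evidence prompted h's construction.

```python
# Urn example: does the evidence used to construct the equal-numbers
# hypothesis also confirm it? The prior for h and the uniform model
# for "not h" are my assumptions for illustration.
from math import comb

n, k = 1000, 495
c = comb(n, k)
p_e_given_h = c * 0.5 ** n                  # binomial chance of 495 whites, ~0.024

# "Not h": belief spread uniformly over the other proportions of white.
thetas = [i / 1000 for i in range(1001) if i != 500]
p_e_given_not_h = sum(c * t ** k * (1 - t) ** (n - k) for t in thetas) / len(thetas)

p_h = 0.1                                   # assumed prior for equal numbers
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
p_h_given_e = p_e_given_h * p_h / p_e

print(f"P(e/h)     = {p_e_given_h:.5f}")     # ~0.024
print(f"P(e/not h) = {p_e_given_not_h:.5f}") # ~0.001
print(f"P(h) = {p_h} -> P(h/e) = {p_h_given_e:.3f}")  # rises to about 0.73
```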

There is a standard criticism often levelled at the Bayesian approach that does strike at some versions of it, but I think the version defended by Howson and Urbach can counter it. To utilise Bayes' theorem it is necessary to be able to evaluate P(e), the prior probability of some evidence that is being considered. In a context where hypothesis h is being considered, it is convenient to write P(e) as P(e/h).P(h) + P(e/not h).P(not h), a straightforward identity in probability theory. The Bayesian needs to be able to estimate the probability of the evidence assuming the hypothesis is true, which may well be unity if the evidence follows from the hypothesis, but also the probability of the evidence should the hypothesis be false. It is this latter factor that is the problematic one. It would appear that it is necessary to estimate the likelihood of the evidence in the light of all hypotheses other than h. This is seen as a major obstacle, because no particular scientist can be in a position to know all possible alternatives to h, especially if, as some have suggested, this must include all hypotheses not yet invented. The response open to Howson and Urbach is to insist that the probabilities in their Bayesian calculus represent personal probabilities, that is, the probabilities that individuals, as a matter of fact, attribute to various propositions. The value of the probability of some evidence being true in the light of alternatives to h will be decided on by a scientist in the light of what that scientist happens to know (which will certainly exclude hypotheses not yet invented). So, for instance, when dealing with the Prout case, Howson and Urbach take the only alternative to Prout's hypothesis to be the hypothesis that atomic weights are randomly distributed, on the basis of historical evidence to the effect that that is what the Proutians believed to be the alternative. It is the thoroughgoing nature of their move to subjective probabilities that makes it possible for Howson and Urbach to avoid the particular problem raised here.

In my portrayal of the elements of the Bayesian analysis of science, I have concentrated mainly on the position outlined by Howson and Urbach because it seems to me to be the one most free of inconsistencies. Because of the way in which probabilities are interpreted in terms of the degrees of belief actually held by scientists, their system enables non-zero probabilities to be attributed to theories and hypotheses, gives a precise account of how the probabilities are to be modified in the light of evidence, and is able to give a rationale for what many take to be key features of scientific method. Howson and Urbach embellish their system with historical case studies.

Critique of subjective Bayesianism.

As we have seen, subjective Bayesianism, the view that consistently understands probabilities as the degrees of belief actually held by scientists, has the advantage that it is able to avoid many of the problems that beset alternative Bayesian accounts that seek objective probabilities of some kind. For many, to embrace subjective probabilities is to pay too high a price for the luxury of being able to attribute probabilities to theories. Once we take probabilities as subjective degrees of belief to the extent that Howson and Urbach, for example, urge that we do, then a range of unfortunate consequences follow.

The Bayesian calculus is portrayed as an objective mode of inference that serves to transform prior probabilities into posterior probabilities in the light of given evidence. Once we see things in this way, it follows that any disagreements in science between proponents of rival research programs, paradigms or whatever, reflected in the (posterior) beliefs of scientists, must have their source in the prior probabilities held by the scientists, since the evidence is taken as given and the inference is considered to be objective. But the prior probabilities are themselves totally subjective and not subject to critical analysis. They simply reflect the various degrees of belief each individual scientist happens to have. Consequently, those of us who raise questions about the relative merits of competing theories and about the sense in which science can be said to progress will not have our questions answered by the subjective Bayesian, unless we are satisfied with an answer that refers to the beliefs that individual scientists just happen to have started out with.

If subjective Bayesianism is the key to understanding science and its history, then one of the most important sources of information that we need access to in order to acquire that understanding is the degrees of belief that scientists actually do or did hold. (The other source of information is the evidence, which is discussed below.) So, for instance, an understanding of the superiority of the wave theory over the particle theory of light will require some knowledge of the degrees of belief that Fresnel and Poisson, for instance, brought to the debate in the early 1820s. There are two problems here. One is the problem of gaining access to a knowledge of these private degrees of belief. (Recall that Howson and Urbach distinguish between private beliefs and actions and insist that it is the former with which their theory deals, so we cannot infer the beliefs of scientists from what they do, or even write.) The second problem is the implausibility of the idea that we need to gain access to these private beliefs in order to grasp the sense in which, say, the wave theory of light was an improvement on its predecessor. The problem is intensified when we focus on the degree of complexity of modern science, and the extent to which it involves collaborative work. (Recall my comparison with workers constructing a cathedral in chapter 8.) An extreme, and telling, example is provided by Peter Galison's (1997) account of the nature of the work in current fundamental particle physics, where very abstruse mathematical theories are brought to bear on the world via experimental work that involves elaborate computer techniques and instrumentation that requires state-of-the-art engineering for its operation. In situations like this there is no single person who grasps all aspects of this complex work. The theoretical physicist, the computer programmer, the mechanical engineer and the experimental physicist all have their separate skills which are brought to bear on a collaborative enterprise. If the progressiveness of this enterprise is to be understood in terms of degrees of belief, then whose degrees of belief do we choose, and why?

The extent to which degrees of belief are dependent on prior probabilities in Howson and Urbach's analysis is the source of another problem. It would seem that, provided a scientist believes strongly enough in his or her theory to begin with (and there is nothing in subjective Bayesianism to prevent degrees of belief as strong as one might wish), then this belief cannot be shaken by any evidence to the contrary, however strong or extensive it might be. This point is in fact illustrated by the Prout study, the very study that Howson and Urbach use to support their position. Recall that in that study we assume that the Proutians began with a prior probability of 0.9 for their theory that atomic weights are whole-number multiples of the atomic weight of hydrogen and a prior probability of 0.6 for the assumption that atomic weight measurements are reasonably accurate reflections of actual atomic weights. The posterior probabilities, calculated in the light of the 35.83 value obtained for chlorine, were 0.878 for Prout's theory and 0.073 for the assumption that the experiments are reliable. So the Proutians were right to stick to their theory and reject the evidence. I point out here that the original incentive behind Prout's hypothesis was the near integral values of a range of atomic weights other than chlorine, measured by the very techniques which the Proutians have come to regard as so unreliable that they warrant a probability as low as 0.073! Does this not show that if scientists are dogmatic enough to begin with they can offset any adverse evidence? Insofar as it does, there is no way that the subjective Bayesian can identify such activity as bad scientific practice. The prior probabilities cannot be judged. They must be taken as simply given. As Howson and Urbach (1989, p. 273) themselves stress, they are "under no obligation to legislate concerning the methods people adopt for assigning prior probabilities".

Bayesians seem to have a counter to the Popperian claim that the probability of all theories must be zero, insofar as they identify probabilities with the degrees of belief that scientists happen, as a matter of fact, to possess. However, the Bayesian position is not that simple. For it is necessary for the Bayesians to ascribe probabilities that are counterfactual, and so cannot be simply identified with degrees of belief actually held. Let us take the problem of how past evidence is to count for a theory as an example. How can the observations of Mercury's orbit be taken as confirmation of Einstein's theory of general relativity, given that the observations preceded the theory by a number of decades? To calculate the probability of Einstein's theory in the light of this evidence the subjective Bayesian is required, among other things, to provide a measure for the probability an Einstein supporter would have given to Mercury's orbit precessing in the way that it does without a knowledge of Einstein's theory. That probability is not a measure of the degree of belief that a scientist actually has but a measure of a degree of belief they would have had if they did not know what they in fact do know. The status of these degrees of belief, and the problem of how one is to evaluate them, pose serious problems, to put it mildly.

Let us now turn to the nature of "evidence" as it figures in subjective Bayesianism. We have treated the evidence as a given, something that is fed into Bayes' theorem to convert prior probabilities to posterior probabilities. However, as the discussion of the early chapters of this book should have made clear, evidence in science is far from being straightforwardly given. The stand taken by Howson and Urbach (1989, p. 272) is explicit and totally in keeping with their overall approach.

The Bayesian theory we are proposing is a theory of inference from data; we say nothing about whether it is correct to accept the data, or even whether your commitment to the data is absolute. It may not be, and you may be foolish to repose in it the confidence you actually do. The Bayesian theory of support is a theory of how the acceptance as true of some evidential statement affects your belief in some hypothesis. How you come to accept the truth of the evidence and whether you are correct in accepting it as true are matters which, from the point of view of the theory, are simply irrelevant.

Surely this is a totally unacceptable position for those who purport to be writing a book on scientific reasoning. For is it not the case that we seek an account of what counts as appropriate evidence in science? Certainly a scientist will respond to some evidential claim, not by asking the scientist making the claim how strongly he or she believes it, but by seeking information on the nature of the experiment that yielded the evidence, what precautions were taken, how errors were estimated and so on. A good theory of scientific method will surely be required to give an account of the circumstances under which evidence can be regarded as adequate, and be in a position to pinpoint standards that empirical work in science should live up to. Certainly experimental scientists have plenty of ways of rejecting shoddy work, and not by appealing to subjective degrees of belief.

Especially when they are responding to criticism, Howson and Urbach stress the extent to which both the prior probabilities and the evidence which need to be fed into Bayes' theorem are subjective degrees of belief about which the subjective Bayesian has nothing to say. But to what extent can what remains of their position be called a theory of scientific method? All that remains is a theorem of the probability calculus. Suppose we concede to Howson and Urbach that this theorem, as interpreted by them, is indeed a theorem with a status akin to deductive logic. Then this generous concession serves to bring out the limitation of their position. Their theory of scientific method tells us as much about science as the observation that science adheres to the dictates of deductive logic. The vast majority, at least, of philosophers of science would have no problem accepting that science takes deductive logic for granted, but would wish to be told much more.

Further reading.

Dorling (1979) was an influential paper that set subjective Bayesianism on its modern course, and Howson and Urbach (1989) is a sustained and unabashed case for it. Horwich (1982) is another attempt to understand science in terms of subjective probability. Rosenkrantz (1977) is an attempt to develop a Bayesian account of science involving objective probabilities. Earman (1992) is a critical, but technical, defence of the Bayesian program. Mayo (1996) contains a sustained critique of Bayesianism.

CHAPTER 13:

The new experimentalism.

Introduction.

If we regard the Bayesian account of scientific inference as a failure, we still have not provided much by way of a characterisation of what it is that is distinctive about scientific knowledge. Popper posed problems for positivism and inductivism by stressing the theory-dependence of observation and the extent to which theories always transcend, and so can never be derived from, the evidence. Popper's account of science was based on the idea that theories are those that survive the severest tests. However, his account was unable to give clear guidance on when a theory, rather than some element of background knowledge, should be held responsible for a failed test, and was unable to say something sufficiently positive about theories that happen to have survived tests. The subsequent attempts that we discussed all involved taking the idea of theory-dependence further than Popper did. Lakatos introduced research programs, and saw them retained or rejected according to conventional decisions - decisions, for example, to blame auxiliary assumptions rather than hard-core principles for apparent falsifications. However, he was unable to give grounds for those decisions, and in any case they were too weak to specify when it was time to abandon a research program in favour of another. Kuhn introduced paradigms rather than research programs, thus introducing a degree of paradigm-dependence in science that was more far-reaching than Popper's theory-dependence, so much so that Kuhn was even worse off than Lakatos in giving a clear answer to the question of the sense in which a paradigm could be said to be an improvement on the one it replaced. Feyerabend can be seen as taking the theory-dependence movement to its extreme, giving up on the idea of special methods and standards for science altogether, and joining Kuhn in the portrayal of rival theories as incommensurable. The Bayesians can also be seen as part of what I am calling the theory-dependence tradition. For them the background theoretical assumptions that inform the judgments about the merits of scientific theories are brought in by way of the prior probabilities.

For one group of philosophers, the range of problems that beset contemporary philosophy of science are to be confronted by tackling the move towards radical theory-dependence at its source. Although they do not wish to return to the positivist idea that the senses provide an unproblematic basis for science, they do seek a relatively secure basis for science, not in observation but in experiment. I shall follow Robert Ackermann (1989) and refer to this recent trend as "the new experimentalism". According to its proponents, experiment can, in the words of Ian Hacking (1983, p. vii), "have a life of its own" independent of large-scale theory. It is argued that experimentalists have a range of practical strategies for establishing the reality of experimental effects without needing recourse to large-scale theory. What is more, if scientific progress is seen as the steady build-up of the stock of experimental knowledge, then the idea of cumulative progress in science can be reinstated and is not threatened by claims to the effect that there are scientific revolutions involving large-scale theory change.

Experiment with a life of its own.

We begin this section with a historical story, drawing heavily on Gooding (1990). Late in the summer of 1820 reports reached Britain of Oersted's finding that the magnetic effect of a current-carrying wire in some way circulates around the wire. Faraday undertook experimental work to clarify what this claim amounted to and to develop it further. Within a few months he had constructed what was, in effect, a primitive electric motor. A cylindrical glass tube was sealed by corks, top and bottom. A wire running through the centre of the top cork into the cylinder ended in a hook from which a second wire hung vertically. Its lower end was free to rotate around the tip of a soft iron cylinder protruding into the base of the cylinder via the bottom cork. Electrical contact between the lower tip of the dangling wire and the iron core was maintained via a pool of mercury resting on the lower cork. To activate this "motor", one pole of a bar magnet was held adjacent to the end of the iron core emerging from the lower cork, while a conducting wire connected the iron core to the wire emerging from the top cork via an electric cell. The ensuing current caused the lower tip of the dangling wire to rotate around the magnetised iron core, maintaining contact with the mercury as it did so. Faraday promptly sent a sample of this device to his rivals around Europe, complete with instructions on how to make it work. He pointed out to them that they could reverse the direction of the rotation either by reversing the connections to the battery or by reversing the magnet.

Is it useful or appropriate to regard this accomplishment of Faraday's as theory-dependent and fallible? It can be said to be theory-dependent in a very weak sense. Faraday's rivals on the Continent would not have been able to follow his instructions if they did not know what a magnet, mercury and an electric cell were. But this amounts to no more than a refutation of the extreme empiricist idea that facts must be established directly by the entry of sensory data into a mind that otherwise knows nothing. Nobody need deny the claim that someone who cannot tell the difference between a magnet and a carrot is not in a position to appreciate what counts as an established fact in electromagnetism. It is surely injudicious to use the term "theory" in such a general sense that "carrots are not magnets" becomes a theory. What is more, construing all talk as "theory-dependent" does not help get to grips with the genuine differences between the likes of Faraday and Ampere. Faraday, as is well known, sought to understand electric and magnetic phenomena in terms of lines of force emanating from electrically charged bodies and magnets and filling the space around them, while theorists on the Continent thought of electric fluids residing in insulators and flowing through conductors, with elements of fluid acting on each other at a distance. These were the theories at stake, and the appreciation of Faraday's motor effect was not "theory-dependent" in the sense that an appreciation of it depended on the acceptance of or familiarity with some version of one of the rival theories. Within electromagnetism at the time, Faraday's motor constituted an experimentally established, theory-neutral effect which all electromagnetic theories were obliged to take account of.

Nor is it helpful to regard Faraday's motor effect as fallible. It is true that Faraday's motors sometimes do not work, because the magnet is too weak, or because the wire is immersed so far into the mercury that the latter offers too much resistance to rotation, or whatever. Consequently, the statement "all wires situated in an experimental arrangement meeting Faraday's description rotate" is false. But this simply indicates that attempting to capture the essence of Faraday's discovery with universal statements of this kind is inappropriate. Faraday discovered a new experimental effect, demonstrated it by constructing a version of his device that did work, and gave instructions to his rivals that enabled them to build devices that worked too. The odd failure is neither surprising nor relevant. The theoretical explanation of Faraday's motor that would be accepted today differs from that offered by both Faraday and Ampere in significant respects. But it remains the case that Faraday's motors usually work. It is difficult to comprehend how future advances in theory could somehow lead to the conclusion that electric motors don't work (although they might well be rendered obsolete by future discoveries of yet other experimental effects). Looked at in this way, experimental effects that can be produced in a controlled way are not fallible; they are here for keeps. What is more, if we understand progress in science in terms of the accumulation of such effects, then we have a theory-independent understanding of its growth.

A second example supports further this way of looking at things. Jed Buchwald's (1989) detailed study of the experimental career of Heinrich Hertz indicates the extent to which Hertz aimed to produce novel experimental effects. Some of his claims to have done so did not meet with general acceptance. It is not difficult to appreciate why. Hertz had learnt his electromagnetism through Helmholtz and saw things in terms of Helmholtz's theoretical framework, which was just one of the several theoretical approaches to electromagnetism at the time (the chief alternatives being those of Weber and Maxwell). That the experimental findings of Hertz constituted novel effects could only be appreciated and defended if the fine details of the theoretical interpretation Hertz brought to his experiments were appreciated and defended. These results were highly theory-dependent, and this, a new experimentalist might well argue, is precisely why they were not generally accepted as constituting novel effects. Things were quite otherwise once Hertz had produced his electric waves. That there were such waves could be demonstrated in a way that was independent of which general theory was subscribed to. Hertz was able to exhibit this new effect in a controlled way. He set up standing waves and showed that small spark detectors showed maximum sparking at the antinodes and no sparking at the nodes of these waves. This was by no means easily achieved, nor were the results easily reproduced, as Buchwald found when he tried it. But I am not claiming the experiments were easy. I am simply claiming that the fact that the experiments demonstrated the existence of a new experimentally produced phenomenon could be appreciated in a way that did not rely on recourse to one or other of the competing electromagnetic theories, a claim borne out by the rapidity with which Hertz's waves were accepted by all camps.

The production of controlled experimental effects can be accomplished and appreciated independently of high-level theory, then. In a similar vein, the new experimentalist can point to a range of strategies available to experimenters for establishing their claims that do not involve appeal to high-level theory. Let us consider, for example, how an experimentalist might argue that a particular observation by way of an instrument represents something real rather than an artifact. Ian Hacking's (1983, pp. 186-209) stories concerning the use of microscopes illustrate the point well. A miniature grid, with labelled squares, is etched on a piece of glass which is then photographically reduced to such an extent that the grid becomes invisible. The reduced grid is viewed through a microscope that reveals the grid, complete with labelled squares. This already is a strong indication that the microscope magnifies, and magnifies reliably - an argument, incidentally, that does not rely on a theory of how the microscope works. We now reflect on a biologist who is using an electron microscope to view red blood platelets mounted on our grid. (Here Hacking is reporting an actual sequence of events reported to him by a scientist.) Some dense bodies are observable within the cell. The scientist wonders if the bodies are present in the blood or are artifacts of the instrument. (He suspects the latter.) He notes which of the labelled squares on the grid contain these dense bodies. Next he views his sample through a fluorescence microscope. The same bodies appear once again, in the same locations on the grid. Can there be any doubt that what is being observed represents bodies in the blood rather than artifacts? All that is required to render this argument persuasive is the knowledge that the two microscopes work on quite different physical principles, so that the chance of both of them producing identical artifacts can be recognised as highly improbable. The argument does not require detailed theoretical knowledge of the workings of either instrument.

Deborah Mayo on severe experimental testing.

Deborah Mayo (1996) is a philosopher of science who has attempted to capture the implications of the new experimentalism in a philosophically rigorous way. Mayo focuses on the detailed way in which claims are validated by experiment, and is concerned with identifying just what claims are borne out and how. A key idea underlying her treatment is that a claim can only be said to be supported by experiment if the various ways in which the claim could be at fault have been investigated and eliminated. A claim can only be said to be borne out by experiment if it has been severely tested by experiment, and a severe test of a claim, as usefully construed by Mayo, must be such that the claim would be unlikely to pass it if it were false.

Her idea can be illustrated by some simple examples. Suppose Snell's law of refraction of light is tested by some very rough experiments in which very large margins of error are attributed to the measurements of angles of incidence and refraction, and suppose that the results are shown to be compatible with the law within those margins of error. Has the law been supported by experiments that have severely tested it? From Mayo's perspective the answer is "no" because, owing to the roughness of the measurements, the law of refraction would be quite likely to pass this test even if it were false and some other law differing not too much from Snell's law true. An exercise I carried out in my schoolteaching days serves to drive this point home. My students had conducted some not very careful experiments to test Snell's law. I then presented them with some alternative laws of refraction that had been suggested in Antiquity and medieval times, prior to the discovery of Snell's law, and invited the students to test them with the measurements they had used to test Snell's law. Because of the wide margins of error they had attributed to their measurements, all of these alternative laws passed the test. This clearly brings out the point that the experiments in question did not constitute a severe test of Snell's law. That law would have passed the test even if it were false and one of the historical alternatives true.
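The point can be mimicked in a few lines of code. In the sketch below, the refractive index of 1.5, the angles, the degree of sloppiness, and the crude linear "ancient" alternative r = i/1.5 are all my stand-ins for the historical laws the students tested: with generous enough error bars, measurements generated from Snell's law are "passed" by the rival law as well, so passing tells us little.

```python
# Rough refraction measurements with generous error bars: both Snell's
# law and a crude linear alternative pass, so the test is not severe.
# All numbers (index 1.5, +/-3 degree reading errors, +/-8 degree
# margins) are illustrative assumptions.
import math
import random

random.seed(1)
MARGIN = 8.0                                  # accepted error bar, degrees

def snell(i_deg):                             # sin i = 1.5 sin r
    return math.degrees(math.asin(math.sin(math.radians(i_deg)) / 1.5))

def linear(i_deg):                            # "ancient" proportional law
    return i_deg / 1.5

incidences = [10, 20, 30, 40, 50, 60]
# "Measured" refractions: true (Snell) values plus sloppy reading error
measured = [snell(i) + random.uniform(-3, 3) for i in incidences]

for name, law in [("Snell's law", snell), ("linear law", linear)]:
    fits = all(abs(m - law(i)) <= MARGIN for i, m in zip(incidences, measured))
    print(f"{name} passes within +/-{MARGIN} degrees: {fits}")
# Both print True: an experiment this rough cannot severely test Snell's law.
```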

A second example further illustrates the rationale behind Mayo's position. I had two cups of coffee this morning and this afternoon I have a headache. Is the claim "my morning coffee caused me to have a headache" thereby confirmed? Mayo's position captures the reason why the answer is "no". Before the claim can be said to have been severely tested, and so confirmed, we must eliminate the various ways in which the claim could be in error. Perhaps my headache is due to the particularly strong Vietnamese beer I drank last night, to the fact that I got up too early, or to the fact that I am finding this section particularly difficult to write, and so on. If some causal connection between coffee drinking and headaches is to be established then it will be necessary to conduct controlled experiments that will serve to eliminate other possible causes. We must seek to establish results that would be most unlikely to occur unless coffee does indeed cause headaches. An experiment constitutes support for a claim only if possible sources of error have been eliminated, and so the claim would be unlikely to pass the test unless it were true. This simple idea serves to capture some common intuitions about experimental reasoning in a neat way and is also extended by Mayo to offer some fresh insights.

Let us consider the so-called "tacking paradox", which I illustrate with an example. Let us imagine Newton's theory, T, has been confirmed by carefully observing the motion of a comet, with care being taken to eliminate sources of error due to attraction from nearby planets, refraction in the earth's atmosphere and so on. Suppose that we now construct theory T' by tacking a statement such as "emeralds are green" onto Newton's theory. Is T' confirmed by the observations of the comet? If we hold the view that a prediction, p, confirms a theory if p follows from the theory and is confirmed by experiment, then T' (and a vast number of similarly constructed theories) is confirmed by the observations in question, counter to our intuitions. Hence the "tacking paradox". However, T' is not confirmed from Mayo's point of view and the "paradox" is dissolved. Given our assumptions about the elimination of possible sources of error, we can say that the orbit of the comet would be unlikely to have conformed to the Newtonian prediction unless Newton's theory were true. The same cannot be said about T' because the likelihood of the comet conforming to the Newtonian prediction would be totally unaltered if some emeralds were blue and hence T' false. T' is not confirmed by the experiment in question because that experiment does not probe the various ways in which "emeralds are green" might be false. Observations of comets can severely test T but not T'.

Mayo extends this line of reasoning to less trivial cases. She is keen to keep theoretical speculation in check by identifying theoretical conclusions that go further beyond the experimental evidence than is warranted. Her analysis of Eddington's test of Einstein's prediction of the bending of light in a gravitational field illustrates the point.

Eddington took advantage of an eclipse of the sun to observe the relative positions of stars in a situation where the light from them passed close to the sun on its passage to earth. He compared these relative positions with those observed later in the year, when the stars were no longer closely aligned with the sun. A measurable difference was detected. By looking at the details of the eclipse experiments Mayo is able to argue that Einstein's law of gravity, which is a consequence of his general theory of relativity, was confirmed by them, but the general theory of relativity itself was not. Let us see how she does so.

If the results of the eclipse experiments are to be taken as confirming the general theory of relativity, then it must be possible to argue that those results would be most unlikely to occur if the general theory is false. We must be able to eliminate erroneous links between the general theory and the results. This could not be done in the case in question because there is, as a matter of fact, a whole class of theories of space-time of which Einstein's theory is only one, all of which predict Einstein's law of gravity and hence the results of the eclipse experiments. If one of this class of theories other than Einstein's were true, and Einstein's false, exactly the same results of the eclipse experiments would be expected. Consequently, those experiments did not constitute a severe test of Einstein's general theory. They did not serve to distinguish between it and known alternatives. To claim that the eclipse experiments supported Einstein's general theory of relativity is to go further beyond the experimental evidence than is warranted.

The situation is different when we consider the more restricted claim that the eclipse experiments confirmed Einstein's law of gravity. The observations certainly were in conformity with that law, but before it is legitimate to take this as evidence for the law, we must eliminate other possible causes of the conformity. It is only then that we can say that the observed displacements would not have occurred unless Einstein's law is true. Mayo shows in some detail how alternatives to Einstein's law, including Newtonian alternatives arising from an inverse square law attraction between the sun and photons presumed to have mass, were considered and eliminated. Einstein's law of gravity was severely tested by the eclipse experiments in a way that the general theory of relativity was not.

The new experimentalists are generally concerned to capture a domain of experimental knowledge that can be reliably established independent of high-level theory. Mayo's position meshes well with that aspiration. From her perspective, experimental laws can be confirmed by severely testing them along the lines discussed above. The growth of scientific knowledge is to be understood as the accumulation and extension of such laws.

Learning from error and triggering revolutions.

Experimental results confirm a claim when they can be argued to be free from error, and when the results would be unlikely if the claim were false. However, there is more to Mayo's focus on the importance of experimental error than this. She is concerned with how well-conducted experiments enable us to learn from error. Looked at from this point of view, an experiment that serves to detect an error in some previously accepted assertion serves a positive as well as a negative function. That is, it not only serves as a falsification of the assertion, but also positively identifies an effect not previously known. The positive role of error detection in science is well illustrated by Mayo's reformulation of Kuhn's notion of normal science.

Let us recall our account, in chapter 8, of the conflicting answers given by Popper and Kuhn to the question of why astrology fails to qualify as a science. According to Popper, astrology is not a science because it is unfalsifiable. Kuhn points out that this is inadequate because astrology was (and is) falsifiable. In the sixteenth and seventeenth centuries, when astrology was "respectable", astrologers did make testable predictions, many of which turned out to be false. Scientific theories make predictions that turn out to be false too. The difference, according to Kuhn, is that science is in a position to learn constructively from the "falsifications", whereas astrology was not. For Kuhn, there exists in normal science a puzzle-solving tradition that astrology lacked. There is more to science than the falsification of theories. There is also the way in which falsifications are constructively overcome. It is ironic, from this point of view, that Popper, who at times characterised his own approach with the slogan "we learn from our mistakes", failed precisely because his negative, falsificationist account did not provide an adequate, positive account of how science learns from mistakes (falsifications).

Mayo sides with Kuhn here, identifying normal science with experimentation. Let us note some examples of the positive role played by error detection. The observation of the problematic features of Uranus's orbit posed problems for Newtonian theory in conjunction with the background knowledge of the time. But the positive side of the problem was the extent to which the source of the trouble could be traced, leading to the discovery of Neptune in the way we have already described. Another episode we have mentioned before concerns Hertz's experiments on cathode rays, which led him to conclude that they are not deflected by an electric field. J. J. Thomson was able to show that he was in error, in part by appreciating the extent to which the rays ionise the residual gas in discharge tubes, leading to a build-up of charged ions on electrodes and the formation of electric fields. By achieving lower pressures in his tubes and arranging his electrodes more appropriately, Thomson detected the influence of electric fields on cathode rays that Hertz had missed. But he had also learnt something about new effects concerning ionisation and the build-up of space charge. In the context of the deflection experiments these constituted impediments to be removed. However, they also turned out to be important in their own right. The ionisation of gases by the passage of charged particles through them was to be fundamental for the study of charged particles in cloud chambers. The experimentalist's detailed knowledge of the effects at work in an apparatus puts him or her in a position to be able to learn from error.

Mayo does more than simply translate Kuhn's notion of normal science into experimental practice. She points to the way in which the facility of experiment to detect and accommodate error can prove sufficient to trigger or contribute to a scientific revolution, a decidedly un-Kuhnian thesis. Mayo's best example concerns the experiments on Brownian motion conducted by Jean Perrin towards the end of the first decade of this century. Perrin's detailed, ingenious, down-to-earth observations of the motions of Brownian particles established beyond reasonable doubt that their motion was random. This, together with observations of the variation of the density of the distribution of particles with height, enabled Perrin to show as conclusively as one could wish that the motion of the particles violates the second law of thermodynamics as well as conforming to detailed predictions of the kinetic theory. You can't get much more revolutionary than that. A similar story could be told about the way in which experimental investigations of black body radiation, radioactive decay and the photoelectric effect, for instance, forced an abandonment of classical physics and constituted important elements of the new quantum theory in the early decades of the twentieth century.
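The height-distribution observation deserves to be spelt out, since it carried the quantitative force of Perrin's argument. In modern notation (not Perrin's own), the kinetic theory predicts that at equilibrium the number density of Brownian particles suspended in a fluid falls off exponentially with height h:

    n(h) = n(0) exp(-m'gh/kT)

where m' is the mass of a particle corrected for the buoyancy of the fluid, g the gravitational acceleration, k Boltzmann's constant and T the temperature. Since k is the gas constant divided by Avogadro's number, fitting the measured fall-off to this formula allowed Perrin to estimate Avogadro's number itself, which is why such down-to-earth observations could bear so directly on the atomic hypothesis.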

Implicit in the new experimentalist's approach is the denial that experimental results are invariably "theory" or "paradigm" dependent to the extent that they cannot be appealed to in order to adjudicate between theories. The reasonableness of this stems from the focus on experimental practice, on how instruments are used, errors eliminated, cross-checks devised and specimens manipulated. It is the extent to which this experimental life is sustained in a way that is independent of speculative theory that enables the products of that life to act as major constraints on theory. Scientific revolutions can be "rational" to the extent that they are forced on us by experimental results. The extremes of the theory- or paradigm-dominated views of science have lost touch with, and cannot make sense of, one of its most distinctive components, experimentation.

The new experimentalism in perspective.

The new experimentalists have shown how experimental results can be substantiated and experimental effects produced by an array of strategies involving practical interventions, cross-checking and error control and elimination in a way that can be, and typically is, independent of high-level theory. As a consequence, they are able to give an account of progress in science that construes it as the accumulation of experimental knowledge. Adopting the idea that the best theories are those that survive the severest tests, and understanding a severe experimental test of a claim as one that the claim is likely to fail if it is false, the new experimentalists can show how experiment can bear on the comparison of radically different theories, and also how experiment can serve to trigger scientific revolutions. Careful attention to the details of experiments and to exactly what they do establish serves to keep theorising in check, and helps to distinguish between what has been substantiated by experiment and what is speculative.
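The idea of a test that a claim is likely to fail if it is false can be made concrete with a small numerical sketch. The following fragment of Python is an invented illustration, not an example drawn from Mayo's work: suppose a claim is counted as passing when at least k of n independent trials show the predicted effect, and suppose that, were the claim false, each trial would show the effect only by chance. The probability of a passing result under that supposition measures the severity of the test.

    from math import comb

    def pass_probability(n, k, p):
        # Probability of at least k "successes" in n independent trials,
        # each succeeding with probability p (the binomial tail).
        return sum(comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range(k, n + 1))

    # Hypothetical figures: 20 trials, the claim passing if 16 or more agree.
    n, k = 20, 16
    # Were the claim false, suppose agreement would occur by chance (p = 0.5).
    print(pass_probability(n, k, 0.5))  # about 0.006: such a pass would be
                                        # very unlikely if the claim were
                                        # false, so the test is severe

Note that on this way of putting things severity is a property of the testing procedure, not a probability attached to the claim itself, which is precisely what distinguishes Mayo's error statistics from the Bayesian use of probability.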

There is no doubt that the new experimentalism has brought philosophy of science down to earth in a valuable way, and that it stands as a useful corrective to some of the excesses of the theory-dominated approach. However, I suggest it would be a mistake to regard it as the complete answer to our question about the character of science. Experiment is not so independent of theory as the emphasis of the previous sections of this chapter might suggest. The healthy and informative focus on the life of experiment should not blind us to the fact that theory has an important life too.

The new experimentalists are right to insist that to see every experiment as an attempt to answer a question posed by theory is a mistake that underestimates the extent to which experiment can have a life of its own. Galileo didn't have a theory about Jupiter's moons to test when he turned his telescope skywards, and, ever since then, many novel phenomena have been discovered by exploiting the opportunities opened up by new instruments or technologies. On the other hand, it remains the case that theory often guides experimental work and has pointed the way towards the discovery of novel phenomena. After all, it was a prediction of Einstein's general theory of relativity that motivated Eddington's eclipse expeditions, and it was Einstein's extension of the kinetic theory of gases that led Perrin to investigate Brownian motion in the way that he did. In a similar vein, it was fundamental theoretical issues concerning whether the rate of change of the polarisation of dielectric media should have magnetic effects like a conduction current that put Hertz onto the experimental path that culminated in the production of radio waves, and Arago's discovery of the bright spot at the centre of a disc's shadow resulted from a direct test of Fresnel's wave theory of light.

Whether or not theory sometimes guides the experimentalist in the right direction, the new experimentalists are keen to capture a sense in which experimental knowledge can be vindicated in a way that is independent of high-level theory. Certainly Deborah Mayo has given a detailed and convincing account of how experimental results can be reliably established using an array of error-eliminating techniques and error statistics. However, as soon as the need arises to attach significance to experimental results that extends beyond the experimental situations in which they were produced, reference to theory needs to be made.

Mayo endeavours to show how error statistics can be applied to carefully controlled experiments to support the conclusion that experiments of that type produce specified results with a specified, high degree of probability. Recorded experimental results are treated as a sample of all the possible results that might be achieved by experiments of that type, and error statistics can be applied to attribute probabilities to the population on the basis of the sample. A basic issue here is the question of what counts as an experiment of the same type. All experiments will differ from one another in some respects insofar as, for example, they are conducted at different times, in different laboratories, using different instruments and so on. The general answer is that the experiments must be similar in relevant respects. However, judgments about what counts as relevant are made by drawing on current knowledge, and so are subject to change when that knowledge is improved. Imagine, for example, Galileo conduc