III. Probability, Confirmation and Induction





Section 14: Introduction to Learning from Experience:
In Chapter I it was argued that generalisations should not be justified by induction but by confirmation and support; that probability theory is very helpful in setting up a theory of confirmation; and that there are a number of important and intuitive principles of confirmation which are always used and which can be proved with the help of probability theory. These principles were stated to be of four kinds and may be summarised as:

Confirmation: The probability of a theory increases as its consequences are verified.
Support: The probability of a theory increases as relevant circumstances are verified.
Competition: The probability of a theory increases as its competing theories are falsified.
Undermining: The probability of a theory decreases as its assumptions are falsified.

That we can prove these principles, and in considerably more detail than they are stated here, shall now be shown, using the mathematical results of Chapter II. These proofs are also mathematical, though I shall state the results both in verbal terms and in formulas. Therefore the fact that we have not yet decided what we mean by "the probability of a proposition", other than that it should satisfy the axioms and entailed mathematics of probability theory, is not important yet. (Note 7)

Section 15: Confirmation: In the present chapter I shall use T for theories and P for their predictions, and I shall require that a prediction is logically entailed by the theory it is said to be a prediction of. As a first result we have, by T3:

(15) No theory has a probability higher than that of its most improbable consequence.
       The formula is: If T |-P, then pr(T) <= pr(P).

This gives a means to gauge the probability of a theory by the probability of its consequences, and it shows that, however plausible a theory may sound, it is not probable if it entails an improbable proposition.

For the next result we use T8, which has it that the probabilities of a theory and its predictions satisfy the identity pr(T)pr(P/T) = pr(P)pr(T/P). T8 is the central theorem in this chapter, as the reader shall see. If T |- P, T8 together with T13 entails that pr(T)=pr(P)pr(T/P), which can be stated in words as:

(16) The probability of a theory is proportional to the probability of its predictions.
       The formula is: If T |- P, then pr(T)=pr(P)pr(T/P).

For as long as pr(T/P) remains constant (because P has not been verified or falsified), pr(T) varies with pr(P): If the latter grows, so does the former; and if pr(P) decreases, so does pr(T).
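By way of illustration, (15) and (16) can be checked on a small finite model. What follows is only a sketch: the three possible cases and their weights are hypothetical numbers chosen for the example, and the helper `pr` is not part of the text.

```python
from fractions import Fraction

# Hypothetical weights for the possible cases (T, P). The case T & ~P is
# left out, because T |- P gives it probability 0.
weights = {
    (True, True): Fraction(1, 10),
    (False, True): Fraction(3, 10),
    (False, False): Fraction(6, 10),
}

def pr(event):
    """Probability of an event, given as a function of the truth values (t, p)."""
    return sum(w for case, w in weights.items() if event(*case))

pr_T = pr(lambda t, p: t)
pr_P = pr(lambda t, p: p)
pr_T_given_P = pr(lambda t, p: t and p) / pr_P

assert pr_T <= pr_P                   # (15)
assert pr_T == pr_P * pr_T_given_P    # (16)
print(pr_T, pr_P, pr_T_given_P)       # 1/10 2/5 1/4
```

Exact fractions are used so that the identity in (16) can be tested with equality rather than with a rounding tolerance.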

Now (16) is a sharper version of (15), and neither says anything about confirmation by itself. So suppose P is verified. It follows from the earlier pr(T) = pr(P)pr(T/P) that pr(T/P) = pr(T):pr(P), and if pr(T) > 0 and pr(P) < 1, it follows that pr(T/P) > pr(T). In words:

(17) The probability of a theory increases as its consequences are verified, and does so proportionally to the improbability of its verified consequences.
       The formula is: If T |- P, pr(T/P)= pr(T):pr(P).

This shows how we can come to attach great confidence to a theory which we initially gave a low probability: If several times in succession an improbable prediction of a theory is verified, its probability rapidly increases. Or else the theory may entail many rather probable consequences, all of which are verified. In either case our confidence increases substantially: In the former case because of verifying a few improbable predictions; and in the latter case because of verifying many predictions each of which slightly increased the last result. Incidentally, (15) shows that the probability of a theory cannot increase beyond 1, and (16) and (17) together show that it is wise to concentrate on the least probable predictions of a theory.
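The effect of repeated confirmation can be made concrete with a short calculation. This is a sketch under stated assumptions: the initial probability 1/100 and the three predictions of probability 1/4 are hypothetical, each pr(P) is taken to be the probability of the next prediction given everything verified so far, and the function name `confirm` is introduced only for this example.

```python
from fractions import Fraction

def confirm(pr_T, pr_P):
    """Return pr(T/P) = pr(T):pr(P) for a verified prediction P entailed by T."""
    if not (0 < pr_P <= 1):
        raise ValueError("pr(P) must lie in (0, 1]")
    if pr_T > pr_P:
        raise ValueError("T |- P requires pr(T) <= pr(P), by (15)")
    return pr_T / pr_P

# Hypothetical numbers: a theory that starts out very improbable, and three
# improbable predictions that are verified one after the other.
pr_T = Fraction(1, 100)
history = [pr_T]
for pr_P in [Fraction(1, 4)] * 3:
    pr_T = confirm(pr_T, pr_P)
    history.append(pr_T)

print([str(p) for p in history])  # ['1/100', '1/25', '4/25', '16/25']
```

Three verified predictions of probability 1/4 each multiply the initial probability by 64, which is the rapid increase that (17) describes.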

Section 16: Undermining: Suppose we have a theory T that we derived from a more general theory G, i.e. G |- T. This also covers the case when T is a mere prediction. Now suppose that it happens we can falsify G, by the usual means of deriving a false consequence. What can we say about the probability of T, to which G served as a possible ground?

Well, by T12 it follows from G |- T that pr(G&~T)=0. By T5 we have pr(T) = pr(T&G)+pr(T&~G), i.e. pr(T&~G) = pr(T)-pr(T&G). The same T5 applied to G gives pr(G) = pr(G&T)+pr(G&~T) = pr(G&T) (as pr(G&~T)=0), so that pr(T&~G) = pr(T)-pr(G). Dividing both sides by pr(~G) = 1-pr(G), which requires pr(G) < 1, yields pr(T/~G) = (pr(T)-pr(G)):(1-pr(G)). In words:

(18) The probability of a proposition (theory or prediction) T is decreased if a possible ground G for it is falsified.
The formula is: If G |- T, then pr(T/~G) = (pr(T)-pr(G)):(1-pr(G)).

Of course, it follows from G |- T that pr(G) <= pr(T), so that the numerator on the right-hand side cannot become negative. It also follows that, for a given pr(T), the larger pr(G) was, the smaller pr(T/~G) will be. This is not immediately obvious, perhaps. It can be seen as follows: As pr(T) < 1, put pr(T) = 1-x, and put pr(G)=g. Then the right-hand side of (18) translates as ((1-x)-g):(1-g). This is the same as ((1-g)-x):(1-g), which in turn is the same as 1-(x:(1-g)). Now it can be seen that as pr(G) is larger, 1-g = 1-pr(G) is smaller, and therefore x:(1-g) larger, and in result pr(T/~G) smaller, qed. Incidentally, this also shows that if pr(T)=1, so that x=0, falsifying G does not decrease pr(T), since then pr(T/~G)=1 for every ground G with pr(G) < 1; and if pr(G)=1, then pr(~G)=0 and pr(T/~G) does not exist.
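The dependence of pr(T/~G) on pr(G) can also be seen numerically. The following is a minimal sketch, assuming a hypothetical pr(T) = 4/5 and three hypothetical values for pr(G); the function name `undermined` is introduced only for this example.

```python
from fractions import Fraction

def undermined(pr_T, pr_G):
    """Return pr(T/~G) = (pr(T)-pr(G)):(1-pr(G)), for G |- T and pr(G) < 1."""
    if not (0 <= pr_G <= pr_T <= 1):
        raise ValueError("G |- T requires 0 <= pr(G) <= pr(T) <= 1")
    if pr_G == 1:
        raise ValueError("pr(~G) = 0, so pr(T/~G) does not exist")
    return (pr_T - pr_G) / (1 - pr_G)

# Hypothetical numbers: one proposition T, three possible grounds of
# increasing probability.
pr_T = Fraction(4, 5)
grounds = [Fraction(1, 10), Fraction(2, 5), Fraction(7, 10)]
results = [undermined(pr_T, g) for g in grounds]

print([str(r) for r in results])  # ['7/9', '2/3', '1/3']
```

Each result is below the original 4/5, and the results fall as pr(G) rises, in agreement with (18) and the argument just given.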

Section 17: Competition: An argument similar to the one I have just given applies in the case of two competing theories T1 and T2. If these are real competitors they cannot both be true, whence pr(T1&T2)=0. Therefore pr(T1)=pr(T1&~T2) by T5, which works out by way of T8 and T1 as pr(T1) = pr(~T2)pr(T1/~T2) = (1-pr(T2))pr(T1/~T2), i.e. pr(T1/~T2) = pr(T1):(1-pr(T2)), provided pr(T2) < 1. Clearly, the greater pr(T2) is, the smaller the denominator on the right, and so the larger pr(T1/~T2), and conversely, which is to say:

(19) The probability of a theory is increased if a competing theory is falsified, the more so the larger the probability of the competitor.
The formula is: If ~(T1&T2), then pr(T1/~T2) = pr(T1):(1-pr(T2)) = pr(T1):pr(~T2).

So if we falsify a theory with a high probability, the probabilities of its competitors are greatly increased, and if we falsify a theory with a small probability, the probabilities of its competitors are but little increased.
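A small calculation shows the size of the effect. This is a sketch with hypothetical numbers: pr(T1) = 1/5, and three alternative values for the probability of the refuted competitor T2. The function name is introduced only for this example.

```python
from fractions import Fraction

def after_refuting_competitor(pr_T1, pr_T2):
    """Return pr(T1/~T2) = pr(T1):(1-pr(T2)) for mutually exclusive T1 and T2."""
    if pr_T1 + pr_T2 > 1:
        raise ValueError("exclusive theories require pr(T1) + pr(T2) <= 1")
    if pr_T2 == 1:
        raise ValueError("pr(~T2) = 0, so pr(T1/~T2) does not exist")
    return pr_T1 / (1 - pr_T2)

# Hypothetical numbers: the same theory T1 against a weak, a middling and a
# strong competitor.
pr_T1 = Fraction(1, 5)
competitors = [Fraction(1, 10), Fraction(1, 2), Fraction(3, 4)]
results = [after_refuting_competitor(pr_T1, c) for c in competitors]

print([str(r) for r in results])  # ['2/9', '2/5', '4/5']
```

Refuting the weak competitor raises pr(T1) only from 1/5 to 2/9, while refuting the strong one raises it to 4/5.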

Section 18: Support: Finally, suppose that if T is true it does not follow that Q is true, but it does follow - deductively, and consequently with some probabilistic premise in the assumptions of T - that Q is more probable if T is true. Usually this comes about through deriving a prediction P from T, and knowing that, in actual fact, P and Q depend to some extent on one another. We shall say that, then, P and Q are relevant to each other, or also that they are (partial) conditions for each other.

This is to say the same as: T makes a positive difference to Q, which is to say that Q is more probable if T is true than if T is not true, or in symbols pr(Q/T) > pr(Q/~T). In chapter 2 we saw that this amounted to: T is positively relevant to Q. As above, we shall say that Q in such a case is a condition for T. It follows that pr(T/Q) > pr(T) i.e.

(20) The probability of a theory is increased if a positively relevant condition is verified and the probability of a theory is decreased if a negatively relevant condition is verified.
The formulas are: If QprelT, then pr(T/Q) > pr(T), and if QnrelT, then pr(T/Q) < pr(T). If QrelT, pr(T/Q) depends normally on pr(Q/~T): The more this differs from pr(Q/T), the more pr(T/Q) is altered.

In court, this is the case of circumstantial evidence: If a hypothesis T, when true, would make a certain circumstance (or condition, or - possible - fact) Q more probable, the verification of Q makes T more probable. It is just the same with negative circumstantial evidence: If verified it makes the hypothesis less probable. Of course, confirmation is a special case of support: When the support the hypothesis gives to the condition is maximal, because the hypothesis entails the condition as a consequence.

Now let's prove the last statement of (20), that pr(T/Q) depends normally on pr(Q/~T). What we have using only T1, Def 1 and T5 is

(21) pr(T/Q) = pr(T&Q):pr(Q) = (pr(T)pr(Q/T)):(pr(T)pr(Q/T)+(1-pr(T))pr(Q/~T))

Now if we suppose pr(T) and pr(Q/T) fixed, since together they amount to pr(T&Q), and as this is anyway natural until we learn more about them, it follows that pr(T/Q) depends solely on pr(Q/~T), i.e. the probability of the circumstances on the denial of the hypothesis they partially depend on. This makes sense: If circumstance Q contributes to the probability of a theory T, the extent of its contribution depends on the probability of Q when T is not true. This is in its own right an important result because it shows we must always consider the denials of our hypotheses, since they are relevant. Indeed, we can see that the smaller pr(Q/~T) gets, the smaller becomes the second term in the denominator, and so the closer the whole fraction, and thus pr(T/Q), gets to 1.

It is interesting to note what is the case if TirrQ: Then pr(Q/T) = pr(Q/~T), and so it may be factored out in the denominator of the above expansion of pr(T/Q). Factored out it drops away against the same term in the numerator, and we get pr(T/Q) = pr(T):(pr(T)+(1-pr(T))) = pr(T). And this is just a new statement of the irrelevance we started with.
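The behaviour of (21) can be tabulated for a few values of pr(Q/~T). This is a minimal sketch: pr(T) = 1/5 and pr(Q/T) = 4/5 are hypothetical and held fixed, and the function name `posterior` is introduced only for this example.

```python
from fractions import Fraction

def posterior(pr_T, pr_Q_given_T, pr_Q_given_not_T):
    """Return pr(T/Q) according to (21)."""
    numerator = pr_T * pr_Q_given_T
    denominator = numerator + (1 - pr_T) * pr_Q_given_not_T
    if denominator == 0:
        raise ValueError("pr(Q) = 0, so pr(T/Q) does not exist")
    return numerator / denominator

# Hypothetical fixed values for pr(T) and pr(Q/T); only pr(Q/~T) varies.
pr_T = Fraction(1, 5)
pr_Q_given_T = Fraction(4, 5)

irrelevant = posterior(pr_T, pr_Q_given_T, Fraction(4, 5))      # pr(Q/~T) = pr(Q/T)
weak_support = posterior(pr_T, pr_Q_given_T, Fraction(2, 5))
strong_support = posterior(pr_T, pr_Q_given_T, Fraction(1, 10))
negative = posterior(pr_T, pr_Q_given_T, Fraction(1))           # pr(Q/~T) > pr(Q/T)

print(irrelevant, weak_support, strong_support, negative)  # 1/5 1/3 2/3 1/6
```

With pr(Q/~T) equal to pr(Q/T) the probability of T stays at 1/5; it rises as pr(Q/~T) falls below pr(Q/T), and it drops below 1/5 when pr(Q/~T) exceeds pr(Q/T).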

Section 19: Probability and induction: The foregoing paragraphs show

  • how we can increase the probabilities of our theories by verifying their consequences;

  • how our theories lose in probability if a possible ground for them is falsified;

  • how our theories gain in probability if their competitors are refuted;

  • how our theories are supported by verifying positively relevant conditions and infirmed by verifying negatively relevant conditions; and

  • how we can attribute probabilities to theories.

Since all this applies to any sort of consistent theory, including generalisations, it gives us a way of solving the old problem of induction:

We need not justify generalisations by an inductive principle such as (1) in Chapter I, for we can justify them by confirmation etc., and this is done by arguments that are deductively valid, provided one assumes probability theory and adopts an interpretation of probability that is adequate. To find such an interpretation will be a problem for the next chapter, but otherwise the above provides a complete solution to the old problem of induction.

It is this:

1. The answer to the old problem of induction is that we do not seek to prove generalisations, for indeed, as shown in Chapter I, such attempted proofs are deductively invalid and beg the question they attempt to prove.
2. Instead we assume the generalisations we believe we need to explain facts in our experience, and show deductively that these assumptions do entail the facts they are supposed to explain.
3. We proceed to deduce further consequences from these assumptions and use these to test our theories, namely by supporting or refuting them.

This is a neat answer, and it could be given only with the rise of probability, and was first seen, it seems, by Bayes, in the 18th Century, who saw how theories could be supported in probability theory by using the probabilistic theorem pr(T/P) > pr(T) if pr(P/T) > pr(P).

This is deductively valid in probability theory, as shown in Chapter II, unlike the corresponding principle of deductive logic "From if T then P and P infer T", which is known as the fallacy of affirming the consequent.

There remains a problem, for the new problem of induction now arises in a probabilistic form:

Suppose once more that T |- P. It follows that pr(T/P) >= pr(T), and so it seems we can use confirmation. But now suppose there is a matter of fact Q such that P&Q. Is it then the case that pr(T/P&Q) = pr(T/P), so that we can abstract from and disregard Q, or not?

But by what rational argument can we prove that for an arbitrary Q? That is, by what rational argument can we prove, for arbitrary rational theories T and their otherwise arbitrary predictions P, which are verified or falsified in the context of some arbitrary circumstance Q, that TirrQ/P?
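That the question is a real one can be seen from a finite model in which T |- P holds and yet Q is not irrelevant to T given P. What follows is only a sketch: the six possible cases and their weights are hypothetical numbers chosen for the example, and the helper `pr` is not part of the text.

```python
from fractions import Fraction

# Hypothetical weights for the possible cases (T, P, Q). Cases with T & ~P
# are left out, because T |- P gives them probability 0.
weights = {
    (True, True, True): Fraction(1, 20),
    (True, True, False): Fraction(3, 20),
    (False, True, True): Fraction(5, 20),
    (False, True, False): Fraction(1, 20),
    (False, False, True): Fraction(5, 20),
    (False, False, False): Fraction(5, 20),
}

def pr(event, given=lambda t, p, q: True):
    """Conditional probability of `event` given `given` in the finite model."""
    joint = sum(w for case, w in weights.items() if event(*case) and given(*case))
    condition = sum(w for case, w in weights.items() if given(*case))
    return joint / condition

pr_T = pr(lambda t, p, q: t)
pr_T_given_P = pr(lambda t, p, q: t, given=lambda t, p, q: p)
pr_T_given_PQ = pr(lambda t, p, q: t, given=lambda t, p, q: p and q)

print(pr_T, pr_T_given_P, pr_T_given_PQ)  # 1/5 2/5 1/6
```

In this model verifying P raises the probability of T from 1/5 to 2/5, but verifying P in the circumstance Q lowers it to 1/6, so that Q cannot be disregarded here.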

Before answering this question, which is the probabilistic version of the new problem of induction, we first have to find a clear interpretation of probability; and to have a better idea what theories are.








 Colophon: Written in 1980-83, lectured about in 1989, but not previously published.
