II. The mathematics of probability.





Section 10: Introduction to probability: There are a number of different axiomatizations of probability, and also at least three different interpretations of any given axiomatization. Before considering the different interpretations I shall give what has become in this century the standard mathematical axiomatization of probability theory, namely Kolmogorov's. For the finite case Kolmogorov proposed three axioms, which may be stated as follows:

Suppose that $ is a set of propositions P, Q, R etc. and that this set is closed under negation, conjunction and disjunction, which is to say that whenever (P e $) and (Q e $), so are ~P, (P&Q) and (PVQ). Now we introduce pr(.) as a function that maps the propositions in $ into the real numbers and satisfies the following three axioms:

A1. For all P e $ the probability of P, written as pr(P), is some non-negative real number.
A2. If P is logically valid, pr(P)=1.
A3. If ~(P&Q) is logically valid, pr(PVQ)=pr(P)+pr(Q).
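The axioms can be tried out on a small finite model. The following Python sketch is only an illustration under stated assumptions: it represents a proposition as the set of "worlds" in which it is true, and the four worlds, their weights, and the names `weight`, `ALL` and `pr` are all hypothetical choices that do not come from the text.

```python
from fractions import Fraction

# Hypothetical model: four worlds, named after the truth values of P and Q.
# The weights are made up for illustration; they only need to be
# non-negative and to sum to 1.
weight = {"PQ": Fraction(3, 10), "P~Q": Fraction(2, 10),
          "~PQ": Fraction(1, 10), "~P~Q": Fraction(4, 10)}
ALL = frozenset(weight)

def pr(prop):
    """Probability of a proposition, given as the set of worlds where it is true."""
    return sum((weight[w] for w in prop), Fraction(0))

P = frozenset({"PQ", "P~Q"})
Q = frozenset({"PQ", "~PQ"})
not_P, not_Q = ALL - P, ALL - Q      # negation is the complement

# A1: probabilities are non-negative.
a1_holds = all(pr(x) >= 0 for x in (P, Q, not_P, P & Q, P | Q))
# A2: the logically valid (PV~P) is true in every world and gets probability 1.
a2_holds = (P | not_P) == ALL and pr(P | not_P) == 1
# A3: (P&Q) and (P&~Q) exclude each other, so their probabilities add.
left, right = P & Q, P & not_Q
a3_holds = not (left & right) and pr(left | right) == pr(left) + pr(right)

print(a1_holds, a2_holds, a3_holds)
```

In this representation conjunction is set intersection (`&`) and disjunction is set union (`|`), and `Fraction` keeps the arithmetic exact so that the equalities can be tested directly.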

In the present formulation I have chosen to attribute probabilities to propositions. This is not necessary, for probabilities may be attributed to sets as well. Since the mathematics is the same, and propositions are sets (of letters, words or of ideas, as one prefers), and since the present reading is more convenient, I have chosen to say that probabilities apply to propositions. There is more to be said on this choice, but as this properly belongs to the interpretations of probability, I shall return to this issue there.

Section 11: Basic theorems: Irrespective of the axiomatization or interpretation of probability, there are a number of important theorems which we shall need - just as we need laws like (a+b)=(b+a) for counting, irrespective of axioms used to prove them or of what we choose to count. The advantage and use of axioms is that one can use them to prove the theorems one needs - and having given a valid proof one knows that any objection against the theorem must be directed against the axioms, for the theorem was proved to follow from them. So what we shall do first is to derive some useful theorems. First, then, there is

T1. pr(~P)=1-pr(P).

This shows how we can find the probability of ~P from pr(P). T1 is proved by noting that pr(PV~P)=pr(P)+pr(~P) by A3, since ~(P&~P) is logically valid, and also pr(PV~P)=1 by A2, since (PV~P) is logically valid. It should be noted that here and in the rest of the chapter I merely indicate the proofs, so that the reader can do the rest. Next there is

T2. 0 <= pr(P) <= 1.

which says that all probabilities are in the interval from 0 to 1 inclusive. That pr(~P) is not less than 0 follows from A1. Now if pr(P) were to exceed 1, pr(~P) would be less than 0 by T1, which is a contradiction. So it follows that pr(P) does not exceed 1, and T2 now follows by A1.

T3. If P |= Q, then pr(P) <= pr (Q).

This says that if P logically entails Q, then pr(P) is not larger than pr(Q). It can be proved by noting that if P does logically entail Q, then ~(P&~Q) is logically valid, and so A3 entails pr(PV~Q) = pr(P)+pr(~Q). By T2 the left side is <= 1, and so pr(P)+pr(~Q) <= 1, from which the theorem follows upon substituting 1-pr(Q) for pr(~Q) by T1. T3 immediately entails

T4. If P is logically equivalent to Q, then pr(P)=pr(Q),

as logical equivalence amounts to: P |= Q and Q |= P. So logical equivalents have the same probability. This is a very important theorem, and is used all the time. Thus we can use it to prove the following expansion of any proposition P:

T5. pr(P)=pr(P&Q) + pr(P&~Q), for arbitrary P and Q in $.

It is noteworthy that this expansion can be repeated: pr(P & Q) + pr(P&~Q) = pr(P&Q&R)+ pr(P&Q&~R)+pr(P&~Q&R)+pr(P&~Q&~R), and repeated ad lib.

T5 is proved by noting that T4 entails that pr(P)=pr((P&Q)V(P&~Q)), which turns into T5 upon noting that ~((P&Q)&(P&~Q)) is logically valid, and applying A3. From T5 we can conveniently prove

T6. pr(PVQ)=pr(P)+pr(Q)-pr(P&Q)

which extends A3, and shows how we may calculate the probability of any disjunction. To prove this, we first note that by T4 pr(PVQ) = pr((P&Q) V (P&~Q) V (~P&Q)) = pr(P&Q) + pr(P&~Q) + pr(~P&Q) by A3. Since by T5 pr(Q) = pr(P&Q) + pr(~P&Q), it follows that adding pr(P)+pr(Q) and subtracting once their common term pr(P&Q) yields pr(PVQ) i.e. T6. If we now combine T5 and T6 we get

T7. pr(P&Q) <= pr(P) <= pr(PVQ)

that is: The probability of a conjunction is not larger than the probability of any of its conjuncts, and the probability of a disjunction is not smaller than the probability of any of its disjuncts.
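As a numerical check of T1 and T5 to T7, here is a minimal Python sketch. It assumes a hypothetical finite model in which a proposition is the set of worlds where it is true; the weights and the names `weight`, `ALL` and `pr` are illustrative assumptions, not part of the text.

```python
from fractions import Fraction

# Hypothetical four-world model with made-up weights that sum to 1.
weight = {"PQ": Fraction(3, 10), "P~Q": Fraction(2, 10),
          "~PQ": Fraction(1, 10), "~P~Q": Fraction(4, 10)}
ALL = frozenset(weight)

def pr(prop):
    """Probability of a proposition, given as the set of worlds where it is true."""
    return sum((weight[w] for w in prop), Fraction(0))

P = frozenset({"PQ", "P~Q"})
Q = frozenset({"PQ", "~PQ"})
not_P, not_Q = ALL - P, ALL - Q

t1 = pr(not_P) == 1 - pr(P)                    # T1
t5 = pr(P) == pr(P & Q) + pr(P & not_Q)        # T5
t6 = pr(P | Q) == pr(P) + pr(Q) - pr(P & Q)    # T6
t7 = pr(P & Q) <= pr(P) <= pr(P | Q)           # T7

print(t1, t5, t6, t7)
```

A check of this kind is of course no proof: it only shows that the theorems hold in one model, whereas the derivations above show that they hold in every model that satisfies A1 to A3.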

Section 12: Basic conditional theorems: Most probabilities are not, as they were in this chapter so far, absolute, but are conditional: Rather than saying "the probability of Q = x" we usually introduce a condition and say, "the probability of Q, if P is true, = y". This idea, that of the probability of a proposition Q given that one or more propositions P1, P2 etc. are true is formalised by the following important definition:

Definition 1 : pr(Q/P) = pr(P&Q):pr(P)

That is: The conditional probability of Q, given or assumed that P is true, equals the probability that (P&Q) is true, divided by the probability that (P) is true. NB, as this fact has important implications for the interpretation and application of probability theory: A conditional probability is defined in terms of absolute probabilities, and therefore we need absolute probabilities to establish conditional ones.
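Definition 1 translates directly into a small function. The sketch below assumes a hypothetical finite model (propositions as sets of worlds, with made-up weights); the names `weight`, `pr` and `cond` are illustrative assumptions. It also makes visible that the conditional probability is computed from two absolute ones.

```python
from fractions import Fraction

# Hypothetical four-world model with made-up weights that sum to 1.
weight = {"PQ": Fraction(3, 10), "P~Q": Fraction(2, 10),
          "~PQ": Fraction(1, 10), "~P~Q": Fraction(4, 10)}

def pr(prop):
    """Absolute probability of a proposition (a set of worlds)."""
    return sum((weight[w] for w in prop), Fraction(0))

def cond(q, p):
    """pr(Q/P) as in Definition 1: pr(P&Q) divided by pr(P)."""
    if pr(p) == 0:
        raise ZeroDivisionError("pr(Q/P) is undefined when pr(P) = 0")
    return pr(p & q) / pr(p)

P = frozenset({"PQ", "P~Q"})
Q = frozenset({"PQ", "~PQ"})

q_given_p = cond(Q, P)    # (3/10) : (1/2)
p_given_q = cond(P, Q)    # (3/10) : (2/5)
print(q_given_p, p_given_q)
```

The explicit test for pr(P) = 0 is a design choice of this sketch: it turns the division by zero into a clearly labelled error rather than leaving it implicit.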

Definition 1 has many applications, and many of these turn on the fact that it also provides an implicit definition of pr(P&Q), namely as pr(P)pr(Q/P) (simply by multiplying both sides of Def 1 by pr(P)). Consequently, we have as a theorem (if pr(P)>0 and pr(Q)>0)

T8. pr(P&Q)=pr(P)pr(Q/P)=pr(Q)pr(P/Q)

The second equality is, of course, also an application of Def 1, and T8 accordingly says that the probability of a conjunction equals the probability of one conjunct times the probability of the other given that the one is true. Another consequence of Def 1 is

T9. pr(Q/P)+pr(~Q/P)=1

which results from T5 and Def 1 upon division by pr(P), and says that the probability of Q if P plus the probability of ~Q if P equals 1. Of course, this admits of a statement like T1:

T10. pr(Q/P)=1-pr(~Q/P)

which shows that conditional probabilities are like unconditional ones. A theorem to the same effect, which parallels T2, is

T11. 0 <= pr(Q/P) <= 1.

That 0 <= pr(Q/P) follows from Def 1, because the components of a conditional are both >=0 by A1; and that pr(Q/P)<=1 is equivalent to pr(P&Q) <= pr(P), which holds by T7. A theorem in the vein of T3 is

T12. If P |= Q, then pr(P&~Q)=0

This is proved by noting that if P |= Q holds, then ~(P&~Q) is logically valid, which, by A3, entails that pr(PV~Q)=pr(P)+pr(~Q). As by T6 pr(PV~Q)=pr(P)+pr(~Q)-pr(P&~Q), it follows that pr(P&~Q)=0 if P |= Q. From this it easily follows that

T13. If P |= Q, then pr (Q/P)=1 provided pr(P)>0

which is to say that if Q is a logical consequence of P, the probability of Q if P is true is 1. The proviso is interesting, for it denies the possibility of inferring Q from a logical contradiction or known falsehood. This means that the def: P |= Q =df pr(Q/P)=1 strengthens the logical "|=" by adding that proviso. T13 immediately follows from T5, T12 and Def 1.
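The conditional theorems T8 to T13 can be checked in the same spirit. This is a minimal sketch, assuming a hypothetical finite model in which propositions are sets of worlds and entailment is the subset relation; the weights and the names `weight`, `pr`, `cond` and `A` are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical four-world model with made-up weights that sum to 1.
weight = {"PQ": Fraction(3, 10), "P~Q": Fraction(2, 10),
          "~PQ": Fraction(1, 10), "~P~Q": Fraction(4, 10)}
ALL = frozenset(weight)

def pr(prop):
    return sum((weight[w] for w in prop), Fraction(0))

def cond(q, p):
    """pr(Q/P) by Def 1; refuses a condition of probability 0."""
    if pr(p) == 0:
        raise ZeroDivisionError("pr(Q/P) is undefined when pr(P) = 0")
    return pr(p & q) / pr(p)

P = frozenset({"PQ", "P~Q"})
Q = frozenset({"PQ", "~PQ"})
not_Q = ALL - Q

t8 = pr(P & Q) == pr(P) * cond(Q, P) == pr(Q) * cond(P, Q)
t9 = cond(Q, P) + cond(not_Q, P) == 1
t11 = 0 <= cond(Q, P) <= 1

# A = (P&Q) entails Q: every world where A is true is a world where Q is true.
A = P & Q
t12 = A <= Q and pr(A & not_Q) == 0
t13 = cond(Q, A) == 1

# The proviso of T13: a contradiction (the empty set of worlds) entails Q,
# but pr(Q/contradiction) is not defined.
try:
    cond(Q, frozenset())
    proviso_enforced = False
except ZeroDivisionError:
    proviso_enforced = True

print(t8, t9, t11, t12, t13, proviso_enforced)
```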

Def 1 may, of course, list any finite number of premises, as in pr(Q/P1&....&Pn) = pr(Q&P1&....&Pn):pr(P1&....&Pn). Such long conjunctions admit of a theorem like T8:

T14. pr(P1&.....&Pn)=
pr(P1)pr(P2/P1)pr(P3/P1&P2) .........pr(Pn/P1&.....&Pn-1)

This says that the probability that n propositions are true equals the probability that the first (in any convenient order) is true times the probability that the second is true if the first is true times the probability that the third is true if the first and the second are true etc. The pattern of proof can be seen by noting that for n=3 pr(P1)pr(P2/P1)pr(P3/P1&P2) = pr(P1&P2)pr(P3/P1&P2) = pr(P3&P2&P1) because the denominators successively drop out by Def 1. That the premises can be taken in any order is a consequence of T4: Conjuncts taken in any order are equivalent to the same conjuncts in any other order.
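For n=3 the product rule of T14 can be verified numerically, in two different orders of the premises. The sketch assumes a hypothetical model of eight worlds, one for each truth assignment to P1, P2 and P3, with weights 1/36, 2/36, ..., 8/36 chosen purely for illustration; the names `worlds`, `weight`, `pr` and `cond` are likewise assumptions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical model: a world is a triple of truth values for (P1, P2, P3).
worlds = list(product([True, False], repeat=3))
# Made-up weights 1/36 ... 8/36; they sum to 1 because 1+2+...+8 = 36.
weight = {w: Fraction(i + 1, 36) for i, w in enumerate(worlds)}

def pr(prop):
    return sum((weight[w] for w in prop), Fraction(0))

def cond(q, p):
    """pr(Q/P) by Def 1; all conditions used below have positive probability."""
    return pr(p & q) / pr(p)

# Each Pi is the set of worlds in which the i-th atom is true.
P1, P2, P3 = (frozenset(w for w in worlds if w[i]) for i in range(3))

lhs = pr(P1 & P2 & P3)
rhs = pr(P1) * cond(P2, P1) * cond(P3, P1 & P2)
other_order = pr(P3) * cond(P1, P3) * cond(P2, P3 & P1)

print(lhs == rhs == other_order)
```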

T11 and T13, together with T9 and T10, show that conditional probabilities are probabilities. We need just one further theorem:

T15. If R |= ~(P&Q), then pr(PVQ/R) = pr(P/R)+pr(Q/R)

which parallels A3. It is easily proved by noting that pr(PVQ/R) = (pr(P&R)+pr(Q&R)-pr(P&Q&R)):pr(R) by Def 1, T4 and T6, and that pr(P&Q&R)=0 by T12 and T4 on the hypothesis. The conclusion then follows by Def 1.
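T15 can be illustrated with a model in which P and Q can be true together, but never when R is true. The sketch below is only an illustration: the seven worlds, their weights and the names `worlds`, `weight`, `pr` and `cond` are hypothetical, and entailment is read as the subset relation between sets of worlds.

```python
from fractions import Fraction
from itertools import product

# Hypothetical model: a world is a triple of truth values for (R, P, Q).
# The world in which R, P and Q are all true is left out, so that
# R entails ~(P&Q) in this model.
worlds = [w for w in product([True, False], repeat=3) if w != (True, True, True)]
# Made-up weights 1/28 ... 7/28; they sum to 1 because 1+2+...+7 = 28.
weight = {w: Fraction(i + 1, 28) for i, w in enumerate(worlds)}

def pr(prop):
    return sum((weight[w] for w in prop), Fraction(0))

def cond(q, p):
    return pr(p & q) / pr(p)

R, P, Q = (frozenset(w for w in worlds if w[i]) for i in range(3))

hypothesis = not (R & P & Q)                          # R |= ~(P&Q)
t15 = cond(P | Q, R) == cond(P, R) + cond(Q, R)       # additivity given R
unconditional = pr(P | Q) == pr(P) + pr(Q)            # fails: P&Q is possible

print(hypothesis, t15, unconditional)
```

The last line is included to show that the additivity here really depends on the condition R: without it, P and Q overlap and only T6 applies.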

Section 13: Irrelevance: A second important concept which now can be defined is that of irrelevance. Two propositions P and Q are said to be - probabilistically - irrelevant, abbreviated PirrQ if the following is true:

Def 2: PirrQ iff pr(P&Q)=pr(P)pr(Q)

Evidently, irrelevance is symmetric:

T16. PirrQ iff QirrP

But there are more interesting results. Let's call a logically valid statement a tautology and a logically false statement a contradiction. Then we can say:

T17. Any proposition is irrelevant to any tautology and to any contradiction.

Note that this entails that tautologies are also mutually irrelevant. To prove T17, first suppose that P is a tautology. By A2 pr(P)=1. Since tautologies are logically entailed by any proposition, Q |= P, and so pr(Q&~P)=0 by T12. Consequently, it follows that pr(Q)=pr(Q&P) by T5, and so pr(P).pr(Q)=1.pr(Q&P)= pr(P&Q) and we have irrelevance. Next, suppose (P) is a contradiction. If so, ~(P) is a tautology, and so pr(P)=0 by A2 and T1. By T7 pr(P&Q) <= pr(P) and as by A1 all probabilities are >= 0, it follows that pr(P&Q)=0. But then pr(P)pr(Q)=0.pr(Q)= 0=pr(P&Q), and again we have irrelevance.
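Def 2 and T17 are easy to try out. The following sketch assumes a hypothetical finite model in which a tautology is the set of all worlds and a contradiction is the empty set; the weights and the names `weight`, `pr`, `irr`, `TAUT` and `CONTRA` are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical four-world model with made-up weights that sum to 1.
weight = {"PQ": Fraction(3, 10), "P~Q": Fraction(2, 10),
          "~PQ": Fraction(1, 10), "~P~Q": Fraction(4, 10)}

def pr(prop):
    return sum((weight[w] for w in prop), Fraction(0))

def irr(p, q):
    """Def 2: P and Q are irrelevant iff pr(P&Q) = pr(P)pr(Q)."""
    return pr(p & q) == pr(p) * pr(q)

P = frozenset({"PQ", "P~Q"})
Q = frozenset({"PQ", "~PQ"})
TAUT = frozenset(weight)     # true in every world
CONTRA = frozenset()         # true in no world

t16 = irr(P, Q) == irr(Q, P)                      # symmetry
t17 = all(irr(x, TAUT) and irr(x, CONTRA) for x in (P, Q, TAUT, CONTRA))
p_q_irrelevant = irr(P, Q)   # with these weights P and Q are not irrelevant

print(t16, t17, p_q_irrelevant)
```

Because Def 2 uses a product rather than a quotient, `irr` needs no proviso about probabilities being greater than 0, which is why it also applies to contradictions.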

Def 2 is often stated in two other forms, which are both slightly less general, as they require respectively that pr(P)>0 or that pr(P)>0 and pr(~P)>0, in both cases to prevent division by 0. Both alternative definitions depend on Def 1, and the first is given by

T18. If pr(P)>0, then PirrQ iff pr(Q/P)=pr(Q).

This is an immediate consequence of Defs 1 and 2. It states clearly the important property that irrelevance signifies: If P is irrelevant to Q, the fact that P is true does not alter anything about the probability that Q is true - and conversely, by T16, supposing that pr(Q) > 0 as well. So irrelevance of one proposition to another is always mutual, and means that the truth of the one makes no difference to the probability of the truth of the other.

This can again be stated in yet another form, with once again a slightly strengthened premise, for now it is required that both pr(P) and pr(~P) are > 0:

T19. If 0 < pr(P) < 1, then PirrQ iff pr(Q/P)=pr(Q/~P)

Suppose the hypothesis, which may be taken as meaning that P is an empirical proposition, is true. T19 may now be proved by noting the following: pr(Q/P)=pr(Q/~P) iff pr(Q&P):pr(P) = pr(Q&~P): (1-pr(P)) iff pr(Q&P) - pr(P)pr(Q&P) = pr(P)pr(Q&~P) iff pr(Q&P) = pr(P)(pr(Q&P)+pr(Q&~P)) iff pr(Q&P) = pr(P)pr(Q).

Another important property of irrelevance is that if P and Q are irrelevant, then so are their denials:

T20. PirrQ iff (~P)irrQ iff Pirr(~Q) iff (~P)irr(~Q).

This too can be proved by noting some series of equivalences that yield irrelevance. First consider pr(P&~Q), assuming PirrQ. Then pr(P&~Q) = pr(P)-pr(P&Q) = pr(P)-pr(P)pr(Q) = pr(P)(1-pr(Q))= pr(P)pr(~Q). So Pirr(~Q) if PirrQ. The converse can be proved by running the argument in reverse order, and so PirrQ iff Pirr(~Q). The other equivalences are proved similarly.
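The three forms of irrelevance (Def 2, T18 and T19) and the equivalences of T20 can be seen to agree in a model where P and Q are irrelevant by construction. This is a sketch under stated assumptions: the weights are products of the made-up values pr(P)=1/3 and pr(Q)=1/4, and the names `weight`, `pr`, `cond` and `irr` are hypothetical.

```python
from fractions import Fraction

# Hypothetical model in which P and Q are irrelevant by construction:
# each weight is the product of pr(P) or pr(~P) with pr(Q) or pr(~Q),
# taking pr(P) = 1/3 and pr(Q) = 1/4 as made-up values.
weight = {"PQ": Fraction(1, 12), "P~Q": Fraction(3, 12),
          "~PQ": Fraction(2, 12), "~P~Q": Fraction(6, 12)}
ALL = frozenset(weight)

def pr(prop):
    return sum((weight[w] for w in prop), Fraction(0))

def cond(q, p):
    return pr(p & q) / pr(p)

def irr(p, q):
    """Def 2: pr(P&Q) = pr(P)pr(Q)."""
    return pr(p & q) == pr(p) * pr(q)

P = frozenset({"PQ", "P~Q"})
Q = frozenset({"PQ", "~PQ"})
not_P, not_Q = ALL - P, ALL - Q

def2 = irr(P, Q)
t18 = cond(Q, P) == pr(Q)                 # needs pr(P) > 0
t19 = cond(Q, P) == cond(Q, not_P)        # needs 0 < pr(P) < 1
t20 = irr(not_P, Q) and irr(P, not_Q) and irr(not_P, not_Q)

print(def2, t18, t19, t20)
```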

Finally, the concept of irrelevance, which so far has been used in an unconditional form, may be given a conditional form, when we want to say that P and Q are irrelevant if T is true:

Def 3: PirrQ/T iff pr(Q/T&P) = pr(Q/T)

This says that the probability that Q is true if T is true is just the same as when T and P are both true - i.e. P's truth makes no difference to Q's probability, if T is true. It should be noted that Def 3 requires that pr(T&P) > 0 (which makes pr(T) > 0), but that on this condition T18 shows that Def 3 is just a simple extension of Def 2. And as with Def 2 there is symmetry:

T21. PirrQ/T iff QirrP/T.

For suppose PirrQ/T. By Def 3 pr(Q/T&P)=pr(Q/T) iff pr(Q&T&P):pr(T&P)=pr(Q&T):pr(T) by Def 1. This is so iff pr(Q&T&P):pr(Q&T) = pr(T&P):pr(T) iff pr(P/Q&T)=pr(P/T) iff QirrP/T by Def 3.

And this conditional irrelevance of P and Q given T holds not only in case P is true, but also in case P is false. That is:

T22. PirrQ/T iff (~P)irrQ/T.

For suppose PirrQ/T, i.e. pr(Q/T&P) = pr(Q/T). By Def 1 this is equivalent to pr(Q&T&P):pr(T&P) = pr(Q&T):pr(T) iff pr(Q&T&P) = pr(T&P)pr(Q&T):pr(T). Now pr(Q&T&P) = pr(Q&T)-pr(Q&T&~P), and so we obtain the equivalent pr(Q&T&~P) = pr(Q&T)-(pr(T&P)pr(Q&T):pr(T)) = pr(Q&T)(1-(pr(T&P):pr(T))) = pr(Q&T)((pr(T)-pr(T&P)) : pr(T)) = pr(Q&T)(pr(T&~P):pr(T)), from which we finally obtain, as equivalent to PirrQ/T, pr(Q&T&~P):pr(T&~P) = pr(Q&T) : pr(T), which is by Defs 1 and 3 the same as (~P)irrQ/T. Qed.

And finally T21 and T22 yield the same result for conditional irrelevance as for irrelevance:

T23. PirrQ/T iff QirrP/T
                    iff (~P)irrQ/T
                    iff Pirr(~Q)/T
                    iff (~P)irr(~Q)/T

The proof is: The first line is T21, the second T22. The third results thus: By T22 with P and Q interchanged, QirrP/T iff (~Q)irrP/T, whence PirrQ/T iff Pirr(~Q)/T by T21. The fourth results from the second by substituting (~Q) for Q. Qed.
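Conditional irrelevance can hold where unconditional irrelevance fails, and a small model shows both this and the equivalences of T23. The sketch is an illustration only: the six weighted worlds, in which P and Q are irrelevant if T is true and always agree if T is false, and the names `weight`, `pr`, `cond`, `irr` and `irr_given` are hypothetical.

```python
from fractions import Fraction

# Hypothetical model: a world is a triple of truth values for (T, P, Q).
# If T is true, P and Q are irrelevant (pr(P/T) = 1/2, pr(Q/T) = 1/3);
# if T is false, P and Q are always both true or both false.
weight = {
    (True, True, True): Fraction(1, 12), (True, True, False): Fraction(2, 12),
    (True, False, True): Fraction(1, 12), (True, False, False): Fraction(2, 12),
    (False, True, True): Fraction(3, 12), (False, False, False): Fraction(3, 12),
}
ALL = frozenset(weight)

def pr(prop):
    return sum((weight[w] for w in prop), Fraction(0))

def cond(q, p):
    return pr(p & q) / pr(p)

def irr(p, q):
    """Def 2: unconditional irrelevance."""
    return pr(p & q) == pr(p) * pr(q)

def irr_given(p, q, t):
    """Def 3: PirrQ/T iff pr(Q/T&P) = pr(Q/T); assumes pr(T&P) > 0."""
    return cond(q, t & p) == cond(q, t)

T, P, Q = (frozenset(w for w in ALL if w[i]) for i in range(3))
not_P, not_Q = ALL - P, ALL - Q

t23 = (irr_given(P, Q, T) and irr_given(Q, P, T) and irr_given(not_P, Q, T)
       and irr_given(P, not_Q, T) and irr_given(not_P, not_Q, T))
unconditional = irr(P, Q)    # fails here: pr(P&Q) = 1/3 but pr(P)pr(Q) = 5/24

print(t23, unconditional)
```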

So much, for the moment, for the mathematics of probability. Now let's apply what we have established.








Colophon: Written in 1980-83, lectured about in 1989, but not previously published.
