Classical Probability Theory
and Learning by Experience

 



 

Introduction  

On this page I shall present a version of standard classical probability theory that I shall call CPT. Just as in the case of CPL in chapter 2, it is based on standard elementary algebra and English, and it has all the standard theorems of probability theory - on a slightly different, new basis.

Here is an outline and preview with links:

Sections

1. A new set of axioms for probability (*)
2. Kolmogorov's standard axioms for probability are derived
3. Over 20 standard theorems of probability theory:
    A. Basic unconditional theorems
    B. Basic conditional theorems
    C. Basic theorems about irrelevance
4. How probability theory explains learning from experience
 
       Confirmation
       Undermining
       Competition
       Support
5. The utility of probability theory

You can take the first for granted and go straight to the second or third, while the fourth is the most interesting and involves some fundamental applications of the basic ideas of probability theory to reasoning and to learning from experience. Something much like this and much more is in G. Polya's two volumes on Plausible Reasoning.

In particular, in section 4 it will be shown that there are a number of important and intuitive principles of confirmation which are always used by people reasoning about matters of fact and which can be proved with the help of probability theory.

These principles are of four kinds and may be summarised as follows:

Confirmation:

The probability of a theory increases as its consequences are verified.

Support:

The probability of a theory increases as relevant circumstances are verified.

Competition:

The probability of a theory increases as its competing theories are falsified.

Undermining:

The probability of a theory decreases as its assumptions are falsified.


These principles are then proved on the basis of what was established in the earlier sections.

 

All reasoning and mathematics in what follows is elementary, but some knowledge of propositional logic is presupposed, even if it is not strictly necessary, since most formulas are (initially) given English readings.


1. New axioms for probability theory  


What I shall provide is a set of three new axioms that imply the standard axioms for probability of Kolmogorov. These new axioms make it easier than Kolmogorov's to join probability theory to propositional logic, and they are more elementary and simpler than his in several respects - as shall be shown.
(*)

Here are the axioms, where all that is assumed about "pr(A)" is that it is equal to some number and is read as "the probability of A". This means that one must add syntactical rules to that effect, as follows:


Notation:

    "pr(A)" is "the probability of A"

    "|-TA" is "A is a theorem of theory T"


CPT-Syntaxis

     As for CPL plus:

  • CPT-pr() : If "A" is a proposition of CPL and "a" any number between 0 and 1 inclusive,
                   "pr(A)=a" is proposition of CPT.
  • CPT-|-T : If "A" is a proposition of CPL and "T" a name for a set of statements of CPL,
                  "|-T A" is a proposition of CPT.

This means that CPT is syntactically an extension of CPL: "pr(A)" refines [A] in that (as we shall prove) 0 <= pr(A) <= 1.

The notation "|-T" is introduced to facilitate the link to Kolmogorov's statement and to have a convenient abbreviation for "A is a theorem of theory T". Introducing it is not necessary, for "[A]=1 holds in theory T" or "pr(A)=1 holds in theory T" are taken to mean the same. Also it is noteworthy that it does not follow that one can iterate either "|-T" as in "|-T(|-TA)" or iterate "pr()" as in "pr(pr(A)=a)=b".

Now the semantical axioms for CPT are:



 

If A and B are any propositions in CPL:              Alternatively expressed:

AxA.    (|- A) --> pr(A)=1                           [A]=1 --> pr(A)=1

AxB.    (|- (A --> B)) --> pr(A) <= pr(B)            [A-->B]=1 --> pr(A) <= pr(B)

AxC.    pr(A)=pr(A&B)+pr(A&~B)                       pr(A)=pr(A&B)+pr(A&~B)


Here "|- A" formalizes the notion that "A is a theorem in the presumed theory", where "a theory" is "a set of assumptions added to the axioms of logic" and the "theorems of the theory" are all statements that can be deduced from the theory by inference rules of a presumed logic, such as CPL.

Note that in what follows the reference to a theory T is abstracted from (though in any application a theory is what one will want to find the logical consequences of), and that therefore, while [A]=1 iff |- A is useful, the notation "|-T A" makes reference to an item that "[A]=1" does not (though it could easily be added).

Also, it is noteworthy that mere factual truth of A is not sufficient to make the hypotheses of AxA and AxB true: Indeed, what one normally wants is an assurance (and so a proof) that a given theory T does logically imply or fail to imply a certain proposition P - after which one has an external check on theory T, by finding whether the proposition P is in fact true or false.

I abstract from reference to theories to simplify and eliminate clutter, but it is useful to state a version with such references and to provide readings, among other things because this shows how neatly the axioms tie PT to PL in the present formulations:


 

If A and B are any propositions in CPL:              Alternatively expressed:

AxA.    (|-T A) --> pr(A|T)=1                        [AT]=1 --> pr(A|T)=1

AxB.    (|-T (A --> B)) --> pr(A|T) <= pr(B|T)       [AT-->BT]=1 --> pr(A|T) <= pr(B|T)

AxC.    pr(A|T)=pr(A&B|T)+pr(A&~B|T)                 pr(A|T)=pr(A&B|T)+pr(A&~B|T)

Here is a reading with the optional references to a supposed theory T (a sequence of statements of CPL) left out:


 

CPT-axioms in words:

AxA:    A is a theorem only if the probability of A is 1.

AxB:    (A only if B) is a theorem only if the probability of A is less than or equal to the probability of B.

AxC:    The probability of A is the sum of the probabilities of (A and B) and of (A and not-B).
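To make the axioms concrete, here is a minimal computational sketch in Python (not part of the original exposition) of a finite possible-worlds model in which pr() is the sum of world-weights. The two atoms and the particular weights are merely illustrative assumptions:

    from itertools import product

    # Two atomic propositions A and B give four "worlds" (truth-value pairs).
    # The weights are arbitrary illustrative numbers that sum to 1.
    worlds = list(product([True, False], repeat=2))          # pairs (A, B)
    weight = dict(zip(worlds, [0.30, 0.20, 0.40, 0.10]))

    def pr(prop):
        """pr of a proposition, given as a predicate on (A, B) worlds."""
        return sum(w for world, w in weight.items() if prop(*world))

    A          = lambda a, b: a
    A_and_B    = lambda a, b: a and b
    A_and_notB = lambda a, b: a and not b

    # AxA: a theorem (here the tautology A V ~A) has probability 1.
    assert abs(pr(lambda a, b: a or not a) - 1.0) < 1e-9

    # AxB: since |- A --> (A V B), pr(A) <= pr(A V B).
    assert pr(A) <= pr(lambda a, b: a or b)

    # AxC: pr(A) = pr(A & B) + pr(A & ~B).
    assert abs(pr(A) - (pr(A_and_B) + pr(A_and_notB))) < 1e-9

    print("AxA, AxB and AxC hold in this model")

Any assignment of non-negative weights summing to 1 would do equally well; the point is only that the axioms are simple arithmetical constraints on such an assignment.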

It is from the formal statement of these axioms - dropping the references to T - that we shall now derive Kolmogorov's axioms, which also do not explicitly refer to a theory that may be used in the hypotheses of its axioms.


2. The proof of the standard Kolmogorov axioms for probability theory:

These standard Kolmogorov axioms for probability are normally stated in such terms as:

 

Kolmogorov axioms for probability theory:

 

Suppose that $ is a set of propositions P, Q, R etc. and that this set is closed under negation, conjunction and disjunction, which is to say that whenever (P e $) and (Q e $), so are ~P, (P&Q) and (PVQ). Now we introduce pr(.) as a function that maps the propositions in $ into the real numbers in a way that satisfies the following three axioms:

 

 

A1.    For all P e $ the probability of P, written as pr(P), is some non-negative real number.

A2.    If P is logically valid, pr(P)=1.

A3.    If ~(P&Q) is logically valid, pr(PVQ)=pr(P)+pr(Q).


In fact, we don't need the initial statement, since we simply presume CPL, which does meet the specifications of the initial statement. What we do need are proofs of A1, A2 and A3. Here they are.

First, there is the fundamental theorem that permits inferences from logical equivalences to probabilities:


T*1:   |- (A iff B) --> pr(A)=pr(B)
       Equivalent propositions have the same probability

(1)    |- (A iff B) --> |- (A --> B) --> pr(A) <= pr(B)        AxB
(2)    |- (A iff B) --> |- (B --> A) --> pr(B) <= pr(A)        AxB
(3)    |- (A iff B) --> pr(A) <= pr(B) & pr(B) <= pr(A)        (1), (2)
(4)    |- (A iff B) --> pr(A) = pr(B)                          (3), Algebra

Next, it is proved that contradictions have probability 0:

T*2:   pr(A&~A)=0
       Contradictory propositions have zero probability

(1)    pr(A)=pr(A&A)+pr(A&~A)        AxC
(2)    pr(A)=pr(A&A)                 T*1 with |- A iff (A&A)
(3)    pr(A)=pr(A)+pr(A&~A)          (1), (2)
(4)    pr(A&~A)=0                    (3), Algebra


It is often helpful to have in propositional logic two special constants, such as Taut (from "tautology") and Contrad (from "contradiction"). These are defined as: Taut iff AV~A and Contrad iff A&~A. Taking this for granted:

T*3:   0 <= pr(A) <= 1
       Probabilities are between 0 and 1 inclusive

(1)    |- A --> pr(A)=1              AxA
(2)    pr(Taut)=1                    (1) and |- Taut
(3)    |- A --> Taut                 Logic
(4)    pr(A) <= pr(Taut)             (3), AxB
(5)    pr(A) <= 1                    (2), (4)
(6)    pr(Contrad)=0                 T*2
(7)    |- (Contrad --> A)            Logic
(8)    pr(Contrad) <= pr(A)          (7), AxB
(9)    0 <= pr(A)                    (6), (8)
(10)   0 <= pr(A) <= 1               (5), (9)

Next, we need to prove the probabilistic theorem for denial. We do it in two steps:

T*4:   pr(AV~A)=pr(A)+pr(~A)
       The probability of a disjunction of exclusives is the sum of the probabilities of its disjuncts

(1)    pr(AV~A)=pr((AV~A)&A)+pr((AV~A)&~A)     AxC
(2)    pr(A)=pr((AV~A)&A)                      T*1, as |- ((AV~A)&A) iff A
(3)    pr(~A)=pr((AV~A)&~A)                    T*1, as |- ((AV~A)&~A) iff ~A
(4)    pr(AV~A)=pr(A)+pr(~A)                   (1), (2), (3)

 

T*5:   pr(~A)=1-pr(A)
       The probability of a denial is the complementary probability

(1)    pr(AV~A)=pr(A)+pr(~A)         T*4
(2)    1=pr(A)+pr(~A)                AxA since |- AV~A
(3)    pr(~A)=1-pr(A)                (2), Algebra


Next, we have this parallel to AxA:

T*6:   |- ~A --> pr(A)=0
       Provable non-truths have zero probability

(1)    |- ~A                 Assumption
(2)    pr(~A)=1              (1), AxA
(3)    1-pr(A)=1             (2), T*5
(4)    pr(A)=0               (3), Algebra

The main point of T*6 and AxA is that if one can prove A (or ~A), then it follows that pr(A)=1 (or pr(A)=0 if |- ~A). This is normally important when comparing the supposed truths and non-truths one can logically infer from a theory with what the facts are: if one can prove that |-T A while in fact one finds ~A, one has thereby learned that the assumptions of theory T cannot all be true, provided the proof of |-T A contained no mistakes in reasoning. (Incidentally, this shows one should not define "|-T A" as "Nec A", with "Nec" the modality of necessary truth: that would amount to the presumption that T is true.)

Next, we need a theorem that serves as a lemma to the next theorem, but that needs a remark itself. The theorem is:

T*7:   pr(A&B)+pr(A&~B)+pr(~A&B)+pr(~A&~B)=1
       Full disjunctive probabilistic sum of two factors

(1)    pr(A)+pr(~A)=1                                  T*5
(2)    pr(A&B)+pr(A&~B)+pr(~A&B)+pr(~A&~B)=1           (1), AxC

The promised remark is that T*7 differs essentially from the similar theorem in CPL without the probabilities: In CPL, [A&B]+[A&~B]+[~A&B]+[~A&~B]=1 holds and implies that precisely one of the four alternatives is true. In PT, pr(A&B)+pr(A&~B)+pr(~A&B)+pr(~A&~B)=1 holds, but normally none of the four alternatives is by itself provably true, and normally several or all of the alternatives will have a probability between 0 and 1 (conforming to T*3).

Indeed, a very interesting aspect of PT is that it assigns numerical measures to all alternatives the underlying logic can distinguish, regardless of whether these alternatives are true or have ever been true. And part of the interest is that there normally are far more logically possible alternatives than logically provable alternatives.

To finish the proof that CPT indeed implies all of Kolmogorov's axioms for PT, we need to derive his A3:

T*8:   |- ~(A&B) --> pr(AVB)=pr(A)+pr(B)
       Conditional sums

(1)    |- ~(A&B)                               Assumption
(2)    pr(A&B)=0                               (1), T*6
(3)    pr(A)=pr(A&~B)                          (2), AxC
(4)    pr(B)=pr(~A&B)                          (2), AxC, T*1
(5)    pr(AVB)=1-pr(~A&~B)                     T*5, T*1 with |- (~(~A&~B)) iff (AVB)
(6)           =pr(A&B)+pr(A&~B)+pr(~A&B)       T*7
(7)           =pr(A&~B)+pr(~A&B)               (2), (6)
(8)           =pr(A)+pr(B)                     (3), (4), (7)


I have now proved all of Kolmogorov's axioms for the finite case: A1 follows from T*3; A2 is AxA; and A3 is T*8.
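As a quick computational illustration (a sketch of my own, with weights that are merely assumed), the three Kolmogorov axioms can be checked in the same kind of finite possible-worlds model as before:

    from itertools import product

    worlds = list(product([True, False], repeat=2))           # pairs (P, Q)
    weight = dict(zip(worlds, [0.25, 0.35, 0.15, 0.25]))      # illustrative, sums to 1

    def pr(prop):
        return sum(w for world, w in weight.items() if prop(*world))

    P    = lambda p, q: p
    notP = lambda p, q: not p

    # A1: probabilities are non-negative (spot-checked on a few propositions).
    for prop in (P, notP, lambda p, q: q, lambda p, q: p and q):
        assert pr(prop) >= 0

    # A2: a logically valid proposition has probability 1.
    assert abs(pr(lambda p, q: p or not p) - 1.0) < 1e-9

    # A3: ~(P & ~P) is valid, so pr(P V ~P) = pr(P) + pr(~P).
    assert abs(pr(lambda p, q: p or not p) - (pr(P) + pr(notP))) < 1e-9

    print("A1, A2 and A3 hold in this model")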


3. Some fundamental theorems of CPT

Irrespective of the axiomatization or interpretation of probability, there are a number of important theorems which we shall need - just as we need laws like (a+b)=(b+a) for counting, irrespective of axioms used to prove them or of what we choose to count. The advantage and use of axioms is that one can use them to prove the theorems one needs - and having given a valid proof one knows that any objection against the theorem must be directed against the axioms, for the theorem was proved to follow from them. So what we shall do first is to derive some useful theorems.

A. Basic unconditional theorems

First, then, there is a group of theorems that the reader may derive from Kolmogorov's axioms (from which they do follow) and that I derived above from my axioms:


T1     pr(~P)=1-pr(P)                                            T*5
T2     0 <= pr(P) <= 1                                           T*3
T3     If P |- Q, then pr(P) <= pr(Q)                            AxB
T4     If P is logically equivalent to Q, then pr(P)=pr(Q)       T*1
T5     pr(P)=pr(P&Q)+pr(P&~Q)                                    AxC
T6     pr(PVQ)=pr(P)+pr(Q)-pr(P&Q)                               T*7, T5


These were all proved above. We only add

T7     pr(P&Q) <= pr(P) <= pr(PVQ)


That is: the probability of a conjunction is not larger than the probability of any of its conjuncts, and the probability of a disjunction is not smaller than the probability of any of its disjuncts. It follows from T5 and T6, or from AxB and logic.
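As a small worked illustration of T1, T6 and T7 (the numbers are merely assumed for the example):

    # Assumed illustrative values for two propositions P and Q.
    pr_P, pr_Q, pr_PandQ = 0.6, 0.5, 0.3

    # T6: pr(P V Q) = pr(P) + pr(Q) - pr(P & Q)
    pr_PorQ = pr_P + pr_Q - pr_PandQ
    print("pr(PVQ) =", round(pr_PorQ, 4))    # 0.8

    # T7: pr(P & Q) <= pr(P) <= pr(P V Q)
    assert pr_PandQ <= pr_P <= pr_PorQ

    # T1: pr(~P) = 1 - pr(P)
    print("pr(~P) =", round(1 - pr_P, 4))    # 0.4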

In what follows I'll state and prove the most important theorems of elementary finite probability theory, firstly because I have never seen this done properly in one paper, secondly because it seems to me one of the cornerstones of human reasoning, and thirdly to be able to show how we can learn from experience using probability theory. (The last subject starts in section 4. It deserves to be better known than it is, for it could help to defuse, refute or ridicule much improbable nonsense that people believe in.)

In what follows, proofs that refer to axioms refer to Kolmogorov's. Readers thoroughly familiar with elementary probability theory may choose to skip the rest of this chapter, but are advised to read the last sections.


B. Basic conditional theorems  

Most probabilities are not, as they were in this chapter so far, absolute, but are conditional: Rather than saying "the probability of Q = x" we usually introduce a condition and say, "the probability of Q, if P is true, = y". This idea, that of the probability of a proposition Q given that one or more propositions P1, P2 etc. are true is formalised by the following important definition:

Definition 1     pr(Q|P) = pr(P&Q):pr(P)

That is: The conditional probability of Q, given or assumed that P is true, equals the probability that (P&Q) is true, divided by the probability that P is true. NB: this fact has important implications for the interpretation and application of probability theory, since a conditional probability is defined in terms of absolute probabilities, and therefore we need absolute probabilities to establish conditional ones.
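For instance, here is a minimal sketch of Def 1 in Python (the numbers are assumed for the example only): if pr(P)=0.4 and pr(P&Q)=0.1, then pr(Q|P)=0.25.

    def conditional(pr_P_and_Q, pr_P):
        """Def 1: pr(Q|P) = pr(P&Q) : pr(P), defined only when pr(P) > 0."""
        if pr_P <= 0:
            raise ValueError("pr(Q|P) is undefined when pr(P) = 0")
        return pr_P_and_Q / pr_P

    # Assumed absolute probabilities for the example.
    print(conditional(0.1, 0.4))   # 0.25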

Definition 1 has many applications, and many of these turn on the fact that it also provides an implicit definition of pr(P&Q), namely as pr(P)pr(Q|P) (simply by multiplying both sides of Def 1 by pr(P)). Consequently, we have as a theorem (if pr(P)>0 and pr(Q)>0)


T8     pr(P&Q)=pr(P)pr(Q|P)=pr(Q)pr(P|Q)

The second equality is, of course, also an application of Def 1, and T8 accordingly says that the probability of a conjunction equals the probability of one conjunct times the probability of the other given that the one is true. Another consequence of Def 1 is

T9     pr(Q|P)+pr(~Q|P)=1

which results from T5 and Def 1 upon division by pr(P), and says that the probability of Q if P plus the probability of ~Q if P equals 1. Of course, this admits of a statement like T1:

T10    pr(Q|P)=1-pr(~Q|P)

which shows that conditional probabilities are like unconditional ones. A theorem to the same effect, which parallels T2, is

T11    0 <= pr(Q|P) <= 1

That 0 <= pr(Q|P) follows from Def 1, because both numerator and denominator are >= 0 by A1; and that pr(Q|P) <= 1 is equivalent to pr(P&Q) <= pr(P), which holds by T7. A theorem in the vein of T4 is


T12    If P |- Q, then pr(P&~Q)=0

This is proved by noting that if P |- Q holds, then so does ~(P&~Q), which, by A3, entails that pr(PV~Q)=pr(P)+pr(~Q). As by T6 pr(PV~Q)=pr(P)+pr(~Q)-pr(P&~Q), it follows pr(P&~Q)=0 if P |- Q. From this it easily follows that

T13    If P |- Q, then pr(Q|P)=1, provided pr(P)>0

which is to say that if Q is a logical consequence of P, the probability of Q if P is true is 1. The proviso is interesting, for it denies the possibility of inferring Q from a logical contradiction or known falsehood. This means that the definition P |- Q =df pr(Q|P)=1 strengthens the logical "|-" by adding that proviso. T13 immediately follows from T5, T12 and Def 1.

Def 1 may, of course, list any finite number of premises, as in pr(Q|P1&....&Pn) = pr(Q&P1&....&Pn): pr(P1&....&Pn). Such long conjunctions admit of a theorem like T8:

T14    pr(P1&...&Pn)=pr(P1)pr(P2|P1)pr(P3|P1&P2)...pr(Pn|P1&...&Pn-1)

This says that the probability that n propositions are true equals the probability that the first (in any convenient order) is true, times the probability that the second is true if the first is true, times the probability that the third is true if the first and the second are true, and so on. The pattern of proof can be seen by noting that for n=3 pr(P1)pr(P2|P1)pr(P3|P1&P2) = pr(P1&P2)pr(P3|P1&P2) = pr(P3&P2&P1), because the denominators successively drop out by Def 1. That the premises can be taken in any order is a consequence of T4: Conjuncts taken in any order are equivalent to the same conjuncts in any other order.
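A tiny numerical sketch of T14 for n=3, with conditional probabilities that are merely assumed:

    # T14 (chain rule) for three propositions:
    # pr(P1 & P2 & P3) = pr(P1) * pr(P2|P1) * pr(P3|P1&P2)
    pr_P1            = 0.8    # assumed
    pr_P2_given_P1   = 0.5    # assumed
    pr_P3_given_P1P2 = 0.25   # assumed

    print(pr_P1 * pr_P2_given_P1 * pr_P3_given_P1P2)   # 0.1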

T11 and T13, together with T9 and T10, show that conditional probabilities are probabilities; we need just one further theorem:


T15    If R |- ~(P&Q), then pr(PVQ|R) = pr(P|R)+pr(Q|R)

which parallels A3. It is easily proved by noting that pr(PVQ|R) = (pr(P&R)+pr(Q&R)-pr(P&Q&R)):pr(R) by Def 1, T4 and T6, and that pr(P&Q&R)=0 by T12 and T4 on the hypothesis. The conclusion then follows by Def 1.


C. Basic theorems about irrelevance

A second important concept which now can be defined is that of irrelevance. Two propositions P and Q are said to be - probabilistically - irrelevant, abbreviated PirrQ, if the following is true:

Def 2    PirrQ iff pr(P&Q)=pr(P)pr(Q)

Evidently, irrelevance is symmetric:

T16    PirrQ iff QirrP

But there are more interesting results. Let's call a logically valid statement a tautology and a logically false statement a contradiction. Then we can say:

T17    Any proposition is irrelevant to any tautology and to any contradiction.

Note that this entails that tautologies are also mutually irrelevant. To prove T17, first suppose that P is a tautology. By A2 pr(P)=1. Since tautologies are logically entailed by any proposition, Q |- P, and so pr(Q&~P)=0 by T12. Consequently, it follows that pr(Q)=pr(Q&P) by T5, and so pr(P)pr(Q) = 1.pr(Q&P) = pr(P&Q), and we have irrelevance. Next, suppose P is a contradiction. If so, ~P is a tautology, and so pr(P)=0 by A2 and T1. By T7 pr(P&Q) <= pr(P), and as by A1 all probabilities are >= 0, it follows that pr(P&Q)=0. But then pr(P)pr(Q) = 0.pr(Q) = 0 = pr(P&Q), and again we have irrelevance.

Def 2 is often stated in two other forms, which are both slightly less general, as they require respectively that pr(P)>0 or that pr(P)>0 and pr(~P)>0, in both cases to prevent division by 0. Both alternative definitions depend on Def 1, and the first is given by

T18    If pr(P)>0, then PirrQ iff pr(Q|P)=pr(Q)

This is an immediate consequence of Defs 1 and 2. It states clearly the important property that irrelevance signifies: If P is irrelevant to Q, the fact that P is true does not alter anything about the probability that Q is true - and conversely, by T16, supposing that Q is not a contradiction. So irrelevance of one proposition to another is always mutual, and means that the truth of the one makes no difference to the probability of the truth of the other.

This can again be stated in yet another form, with once again a slightly strengthened premise, for now it is required that both pr(P) and pr(~P) are > 0:

T19    If 0 < pr(P) < 1, then PirrQ iff pr(Q|P)=pr(Q|~P)

Suppose the hypothesis, which may be taken as meaning that P is an empirical proposition, is true. T19 may now be proved by noting the following: pr(Q|P)=pr(Q|~P) iff pr(Q&P):pr(P) = pr(Q&~P):(1-pr(P)) iff pr(Q&P) - pr(P)pr(Q&P) = pr(P)pr(Q&~P) iff pr(Q&P) = pr(P)(pr(Q&P)+pr(Q&~P)) iff pr(Q&P) = pr(P)pr(Q).
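The agreement of Def 2, T18 and T19 can also be checked mechanically; the sketch below builds a product distribution over two atoms (so P and Q are irrelevant by construction), with pr(P)=0.3 and pr(Q)=0.6 as assumed values:

    from itertools import product

    pP, pQ = 0.3, 0.6   # assumed marginal probabilities
    weight = {(p, q): (pP if p else 1 - pP) * (pQ if q else 1 - pQ)
              for p, q in product([True, False], repeat=2)}

    def pr(prop):
        return sum(w for world, w in weight.items() if prop(*world))

    def cond(prop, given):
        return pr(lambda p, q: prop(p, q) and given(p, q)) / pr(given)

    P    = lambda p, q: p
    notP = lambda p, q: not p
    Q    = lambda p, q: q
    close = lambda x, y: abs(x - y) < 1e-9

    assert close(pr(lambda p, q: p and q), pr(P) * pr(Q))   # Def 2: PirrQ
    assert close(cond(Q, P), pr(Q))                         # T18
    assert close(cond(Q, P), cond(Q, notP))                 # T19
    print("Def 2, T18 and T19 agree on this example")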

Another important property of irrelevance is that if P and Q are irrelevant, then so are their denials:

T20    PirrQ iff (~P)irrQ iff Pirr(~Q) iff (~P)irr(~Q).

This too can be proved by noting a series of equivalences that yield irrelevance. First consider pr(P&~Q), assuming PirrQ. Then pr(P&~Q) = pr(P)-pr(P&Q) = pr(P)-pr(P)pr(Q) = pr(P)(1-pr(Q)) = pr(P)pr(~Q). So Pirr(~Q) if PirrQ. The converse can be proved by running the argument in reverse order, and so PirrQ iff Pirr(~Q). The other equivalences are proved similarly.

Finally, the concept of irrelevance, which so far has been used in an unconditional form, may be given a conditional form, when we want to say that P and Q are irrelevant if T is true:

Def 3    PirrQ|T iff pr(Q|T&P) = pr(Q|T)

This says that the probability that Q is true if T is true is just the same as when T and P are both true - i.e. P's truth makes no difference to Q's probability, if T is true. It should be noted that Def 3 requires that pr(T&P) > 0 (which makes pr(T) > 0), but that on this condition T19 shows that Def 3 is just a simple extension of Def 2. And as with Def 2 there is symmetry:

T21    PirrQ|T iff QirrP|T

For suppose PirrQ|T. By Def 3 pr(Q|T&P)=pr(Q|T) iff pr(Q&T&P):pr(T&P)=pr(Q&T):pr(T) by Def 1. This is so iff pr(Q&T&P):pr(Q&T) = pr(T&P):pr(T) iff pr(P|Q&T)=pr(P|T) iff QirrP|T by Def 3.

And this conditional irrelevance of Q to P if T holds not only in case P is true, but also in case P is false. That is:

T22    PirrQ|T iff (~P)irrQ|T

For suppose PirrQ|T, i.e. pr(Q|T&P) = pr(Q|T). By Def 1 this is equivalent to pr(Q&T&P):pr(T&P) = pr(Q&T):pr(T) iff pr(Q&T&P) = pr(T&P)pr(Q&T):pr(T). Now pr(Q&T&P) = pr(Q&T)-pr(Q&T&~P), and so we obtain the equivalent pr(Q&T&~P) = pr(Q&T)-(pr(T&P)pr(Q&T):pr(T)) = pr(Q&T)(1-(pr(T&P):pr(T))) = pr(Q&T)((pr(T)-pr(T&P)):pr(T)) = pr(Q&T)(pr(T&~P):pr(T)), from which we finally obtain, as equivalent to PirrQ|T, pr(Q&T&~P):pr(T&~P) = pr(Q&T):pr(T), which is by Def 3 the same as (~P)irrQ|T. Qed.

And finally T21 and T22 yield the same result for conditional irrelevance as for irrelevance:

T23    PirrQ|T  iff  QirrP|T              T21
(1)             iff  (~P)irrQ|T           T22
(2)             iff  Pirr(~Q)|T           T21, T22, (1)
(3)             iff  (~P)irr(~Q)|T        (2)

The proof is: the first equivalence is T21 and (1) is T22. (2) results thus: by T21, PirrQ|T iff QirrP|T; by T22, with Q in the role of P, this holds iff (~Q)irrP|T; and by T21 again, iff Pirr(~Q)|T. (3) results from (2) by T22, now with (~Q) in the role of Q. Qed.
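Here is a small mechanical check of Def 3, T21 and T22 in a three-atom possible-worlds model; the weights are chosen (as an assumption for the example) so that P and Q are independent given T, while the weights given ~T are arbitrary:

    from itertools import product

    pT = 0.5   # assumed
    w = {}
    for p, q, t in product([True, False], repeat=3):
        if t:    # given T: pr(P|T)=0.5 and pr(Q|T)=0.4, independently (assumed)
            w[(p, q, t)] = pT * 0.5 * (0.4 if q else 0.6)
        else:    # given ~T: some other, arbitrary joint weights (assumed)
            w[(p, q, t)] = (1 - pT) * {(True, True): 0.1, (True, False): 0.3,
                                       (False, True): 0.2, (False, False): 0.4}[(p, q)]

    def pr(prop):
        return sum(wt for world, wt in w.items() if prop(*world))

    def cond(prop, given):
        return pr(lambda p, q, t: prop(p, q, t) and given(p, q, t)) / pr(given)

    P, Q, T = (lambda p, q, t: p), (lambda p, q, t: q), (lambda p, q, t: t)
    T_and_P    = lambda p, q, t: t and p
    T_and_Q    = lambda p, q, t: t and q
    T_and_notP = lambda p, q, t: t and not p
    close = lambda x, y: abs(x - y) < 1e-9

    assert close(cond(Q, T_and_P), cond(Q, T))      # PirrQ|T      (Def 3)
    assert close(cond(P, T_and_Q), cond(P, T))      # QirrP|T      (T21)
    assert close(cond(Q, T_and_notP), cond(Q, T))   # (~P)irrQ|T   (T22)
    print("T21 and T22 hold on this example")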

 

 

   

4. How probability theory explains learning from experience:

So much for the mathematics of probability for the moment. Now let's apply what we have established.

There are a number of important and intuitive principles of confirmation which are always used and which can be proved with the help of probability theory. These principles were stated in the introduction to be of four kinds, and may be summarised as:

Confirmation:

The probability of a theory increases as its consequences are verified.

Support:

The probability of a theory increases as relevant circumstances are verified.

Competition:

The probability of a theory increases as its competing theories are falsified.

Undermining:

The probability of a theory decreases as its assumptions are falsified.


That we can prove these principles, and in considerably more detail than they are stated here, shall now be shown, using the above results.


 

Confirmation

In the present section I shall use T for theories and P for their predictions, and I shall require that a prediction is logically entailed by the theory it is said to be a prediction of. As a first result we have, by T3

T24    No theory has a probability higher than its most improbable consequences.

The formula is: If T |- P, then pr(T) <= pr(P).

This gives a means to gauge the probability of a theory by the probability of its consequences, and it shows that, however plausible a theory may sound, it is not probable if it entails an improbable proposition.

For the next result we use T8, which has it that the simultaneous truth of a theory and its predictions satisfies the identity pr(T)pr(P|T) = pr(P)pr(T|P). T8 is the central theorem in this chapter, as the reader shall see. It follows that if T |- P, T13 entails that pr(T)=pr(P)pr(T|P), which can be stated in words as:

T25    The probability of a theory is proportional to the probability of its predictions.

The formula is: If T |- P, then pr(T)=pr(P)pr(T|P).

For as long as pr(T|P) remains constant (because P has not been verified or falsified), pr(T) varies with pr(P): If the latter grows, so does the former; and if pr(P) decreases, so does pr(T).

Now T25 is a sharper version of T24, and neither says anything about confirmation by itself. So suppose P is verified. It follows from the earlier pr(T) = pr(P)pr(T|P) that pr(T|P) = pr(T):pr(P), and since pr(P) < 1 (and pr(T) > 0), it follows that pr(T|P) > pr(T). In words:

T26    The probability of a theory increases as its consequences are verified, and does so proportionally to the improbability of its verified consequences.

The formula is: If T |- P, then pr(T|P) = pr(T):pr(P).

This shows how we can come to attach great confidence to a theory which we initially gave a low probability: If several times in succession an improbable prediction of a theory is verified, its probability rapidly increases. Or else the theory may entail many rather probable consequences, all of which are verified. In either case our confidence increases substantially: In the former case because of verifying a few improbable predictions; and in the latter case because of verifying many predictions each of which slightly increased the last result. Incidentally, T24 shows that the probability of a theory cannot increase beyond 1, and T25 and T26 together show that it is wise to concentrate on the least probable predictions of a theory.
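As a small numerical sketch of T26 (all numbers merely assumed): verifying one quite improbable prediction raises pr(T) a lot, and verifying several predictions in succession compounds, dividing each time by the probability the new prediction still had given the ones already verified.

    # T26: if T |- P and P is verified, pr(T|P) = pr(T) : pr(P).
    pr_T = 0.1                    # assumed prior probability of the theory

    # One improbable prediction (pr(P) = 0.2, assumed) multiplies pr(T) by 5:
    print(pr_T / 0.2)             # 0.5

    # Several mildly improbable predictions compound; by T14 we divide each time
    # by the probability of the new prediction given those already verified:
    for pr_next in (0.8, 0.8, 0.8):
        pr_T = pr_T / pr_next
    print(round(pr_T, 4))         # 0.1953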

Undermining
 

Suppose we have a theory T that we derived from a more general theory G, i.e. G |- T. This also covers the case when T is a mere prediction. Now suppose that it happens we can falsify G, by the usual means of deriving a false consequence. What can we say about the probability of T, to which G served as a possible ground?

Well, by T12 it follows from G |- T that pr(G&~T)=0, and so by T5 we have pr(T) = pr(T&G)+pr(T&~G), i.e. pr(T&~G) = pr(T)-pr(T&G), which by the same T5 applied to pr(G) = pr(G&T)+pr(G&~T) = pr(G&T) (as pr(G&~T)=0) yields pr(T&~G) = pr(T)-pr(G). Dividing both sides by pr(~G)=1-pr(G) yields pr(T|~G) = (pr(T)-pr(G)):(1-pr(G)). In words:

T27    The probability of a proposition (theory or prediction) T is decreased if a possible ground G for it is falsified.

The formula is: If G |- T, then pr(T|~G) = (pr(T)-pr(G)):(1-pr(G)).

Of course, it follows from G |- T that pr(G) <= pr(T), so that the numerator on the right side cannot become negative. It also follows that the larger pr(G) was, the smaller pr(T|~G) will be. This is not immediately obvious, perhaps. It can be seen as follows: As pr(T) < 1, put pr(T) = 1-x, and put pr(G)=g. Then the right side of T27's formula translates as: ((1-x)-g):(1-g). This is the same as ((1-g)-x):(1-g), which in turn is the same as 1-(x:(1-g)). Now it can be seen that as pr(G) is larger, 1-g = 1-pr(G) is smaller, and therefore x:(1-g) larger, and as a result pr(T|~G) smaller, qed. Incidentally, this also shows that if pr(T)=1, so that x=0, the argument does not apply - but then also ALL possible grounds for T must have had a probability of 1, and then pr(T|~G) does not exist.
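A small numerical sketch of T27 (with assumed values satisfying pr(G) <= pr(T), as G |- T requires) shows the effect: losing a probable ground hurts much more than losing an improbable one.

    # T27: if G |- T and G is falsified, pr(T|~G) = (pr(T) - pr(G)) : (1 - pr(G)).
    def after_losing_ground(pr_T, pr_G):
        return (pr_T - pr_G) / (1 - pr_G)

    # Assumed values: pr(T) = 0.6 throughout.
    print(round(after_losing_ground(0.6, 0.5), 4))   # 0.2    (a probable ground is lost)
    print(round(after_losing_ground(0.6, 0.1), 4))   # 0.5556 (an improbable ground is lost)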


Competition

An argument similar to the one I have just given applies in the case of two competing theories T1 and T2. If these are real competitors they cannot both be true, whence pr(T1&T2)=0. Therefore pr(T1)=pr(T1&~T2) by T5, which works out by way of T8 and T1 as pr(T1) = pr(~T2)pr(T1|~T2) = (1-pr(T2))pr(T1|~T2), i.e. pr(T1|~T2) = pr(T1):(1-pr(T2)). Clearly, the greater pr(T2) is, the smaller the denominator on the right, and so the larger pr(T1|~T2), and conversely, which is to say:

T28    The probability of a theory is increased if a competing theory is falsified, the more so the larger the probability of the competitor.

The formula is: If ~(T1&T2), then pr(T1|~T2) = pr(T1):(1-pr(T2))

So if we falsify a theory with a high probability, the probabilities of its competitors are greatly increased, and if we falsify a theory with a small probability, the probabilities of its competitors are but little increased.
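The corresponding numerical sketch for T28 (assumed values, with pr(T1)+pr(T2) <= 1 so that the two can really exclude one another):

    # T28: if T1 and T2 cannot both be true, pr(T1|~T2) = pr(T1) : (1 - pr(T2)).
    def after_refuting_rival(pr_T1, pr_T2):
        return pr_T1 / (1 - pr_T2)

    # Assumed values: pr(T1) = 0.2 throughout.
    print(round(after_refuting_rival(0.2, 0.7), 4))   # 0.6667 (a probable rival is refuted)
    print(round(after_refuting_rival(0.2, 0.1), 4))   # 0.2222 (an improbable rival is refuted)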


Support

 

Finally, suppose that if T is true it does not follow that Q is true, but it does follow - deductively, and consequently with some probabilistic premise in the assumptions of T - that Q is more probable if T is true. Usually this comes about through deriving a prediction P from T, and knowing that, in actual fact, P and Q depend to some extent on one another. We shall then say that P and Q are relevant to each other, or also that they are (partial) conditions for each other.

This is to say the same as: T makes a positive difference to Q, which is to say that Q is more probable if T is true than if T is not true, or in symbols pr(Q|T) > pr(Q|~T). In chapter 2 we saw that this amounted to: T is positively relevant to Q. As above, we shall say that Q in such a case is a condition for T. It follows that pr(T|Q) > pr(T) i.e.

T29    The probability of a theory is increased if a positively relevant condition is verified, and the probability of a theory is decreased if a negatively relevant condition is verified.

The formulas are: If QprelT, then pr(T|Q)>pr(T), and if QnrelT, then pr(T|Q) < pr(T). If QrelT, pr(T|Q) depends normally on pr(Q|~T): The greater this is, the more pr(T|Q) is altered.

In court, this is the case of circumstantial evidence: If a hypothesis T, when true, would make a certain circumstance (or condition, or possible fact) Q more probable, the verification of Q makes T more probable. It is just the same with negative circumstantial evidence: If verified, it makes the hypothesis less probable. Of course, confirmation is a special case of support: the case in which the support the hypothesis gives to the condition is maximal, because the hypothesis entails the condition as a consequence.

Now let's prove the last statement of T29, that pr(T|Q) depends normally on pr(Q|~T). What we have using only T1, Def 1 and T5 is

T30    pr(T|Q) = pr(T&Q):pr(Q) = (pr(T)pr(Q|T)):(pr(T)pr(Q|T)+(1-pr(T))pr(Q|~T))

Now if we suppose pr(T) and pr(Q|T) fixed, since together they amount to pr(T&Q), and as this is anyway natural until we learn more about them, it follows that pr(T|Q) depends solely on pr(Q|~T), i.e. the probability of the circumstance on the denial of the hypothesis it partially depends on. This makes sense: If circumstance Q contributes to the probability of a theory T, the extent of its contribution depends on the probability of Q when T is not true.

This is in its own right an important result, because it shows we must always consider the denials of our hypotheses, since they are relevant. Indeed, we can see that the smaller pr(Q|~T) gets, the smaller becomes the second term in the denominator, and so the closer the whole fraction, and thus pr(T|Q), gets to 1.

It is interesting to note what is the case if TirrQ: Then pr(Q|T) = pr(Q|~T), and so it may be factored out in the denominator of the above expansion of pr(T|Q). Factored out, it drops away against the same term in the numerator, and we get pr(T|Q) = pr(T):(pr(T)+(1-pr(T))) = pr(T). And this is just a new statement of the irrelevance we started with.
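T30 is easy to compute with; the sketch below (all numbers assumed) fixes pr(T) and pr(Q|T) and varies pr(Q|~T), showing both the dependence just described and the irrelevance case in which the result collapses back to pr(T).

    # T30: pr(T|Q) = pr(T)pr(Q|T) : (pr(T)pr(Q|T) + (1 - pr(T))pr(Q|~T)).
    def support(pr_T, pr_Q_given_T, pr_Q_given_notT):
        num = pr_T * pr_Q_given_T
        return num / (num + (1 - pr_T) * pr_Q_given_notT)

    pr_T, pr_Q_given_T = 0.3, 0.8          # assumed and held fixed
    for pr_Q_given_notT in (0.8, 0.4, 0.1, 0.01):
        print(pr_Q_given_notT, round(support(pr_T, pr_Q_given_T, pr_Q_given_notT), 4))
    # 0.8  -> 0.3     (pr(Q|~T) = pr(Q|T): irrelevance, so pr(T|Q) = pr(T))
    # 0.01 -> 0.9717  (the smaller pr(Q|~T), the closer pr(T|Q) gets to 1)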

 


5. The utility of probability theory

The foregoing sections show:

- how we can increase the probabilities of our theories by verifying their consequences;

- how the probabilities of our theories decrease when their grounds are undermined;

- how our theories gain in probability if their competitors are refuted;

- how our theories are supported by verifying positively relevant conditions and infirmed by verifying negatively relevant conditions.

This holds for both Kolmogorov's axioms and CPT. They also share the same weakness: It has not been laid down how to apply formulas in which "pr(A)" occurs to actual facts, and in particular it has not been laid down what "a" must be in "pr(A)=a", except that 0<=a<=1, that if A is a provable falsehood in a theory its probability given that theory is 0, and that if A is a provable theorem in a theory its probability given that theory is 1.

This is an important theoretical lack, but in practice it is often met by rules that do succeed in uniquely assigning probabilities to statements in ways that satisfy the axioms of probability.

 

Maarten Maartensz

Colophon: This is a small revision of the earlier file first uploaded May 1, 2009. It can be taken as replacing the earlier one, and therefore has the same name.


 

(*) They might also be taken as axiomatic for (rational) degrees of belief.

Note that the first two axioms are about truths and inferences in standard logic, and that both are implications only.