Introduction
In this page, I shall present a version of standard classical probability theory that I shall call CPT. Just as in the case of CPL in chapter 2, it is based on standard elementary algebra and English, and it puts all the standard theorems of standard probability theory on a slightly different new basis.
Here is an outline and preview with links:
Sections
1. A new set of axioms for probability (*)
2. Kolmogorov's standard axioms for probability are derived
3. Over 20 standard theorems of probability theory:
A. Basic unconditional theorems
B. Basic conditional theorems
C. Basic theorems about irrelevance
4. How probability theory explains learning from experience
Confirmation
Undermining
Competition
Support
5. The utility of probability theory
You can take the first section for granted and go straight to the second or third, while the fourth is the most interesting and involves some fundamental applications of the basic ideas of probability theory to reasoning and to learning from experience. Something much like this, and much more, is in G. Polya's two volumes on Plausible Reasoning.
In particular, in section 4 it will be shown that there are a number of important and intuitive principles of confirmation which are always used by people reasoning about matters of fact, and which can be proved with the help of probability theory.
These principles are of four kinds and may be summarised as:
Confirmation:

The probability of a theory increases as its consequences are verified.

Support:

The probability of a theory increases as relevant circumstances are verified.

Competition:

The probability of a theory increases as its competing theories are falsified.

Undermining:

The probability of a theory decreases as its assumptions are falsified.

These principles are then proved on the basis of what was established in the earlier sections.
All reasoning and mathematics in what follows is elementary, but some knowledge of propositional logic is presupposed, if not strictly necessary, since most formulas are (initially) given English readings.
1. New axioms for probability theory
What I shall provide is a set of three new axioms that imply the standard axioms for probability of Kolmogorov. These new axioms make it easier to join probability theory to propositional logic than Kolmogorov's axioms, and are more elementary and simpler than his in several respects, as shall be shown. (*)
Here are the axioms, where all that is assumed about "pr(A)" is that it is equal to some number and read as "the probability of A". This means that one must add syntactical rules to the following effect:
Notation:

"pr(A)" is "the probability of A"

"_{T}A" is "A is a theorem of theory T"

CPT Syntax:

As for CPL plus:

 CPTpr() : If "A" is a proposition of CPL and "a" any number between 0 and 1 inclusive,
"pr(A)=a" is a proposition of CPT.

 CPT⊢_{T} : If "A" is a proposition of CPL and "T" a name for a set of statements of CPL,
"⊢_{T}A" is a proposition of CPT.

This means that CPT is syntactically an extension of CPL: "pr(A)" refines [A] in that (as we shall prove) 0 <= pr(A) <= 1.
The notation "⊢_{T}" is introduced to facilitate the link to Kolmogorov's statement and to have a convenient abbreviation for "A is a theorem of theory T". Introducing it is not necessary, for "[A]=1 holds in theory T" or "pr(A)=1 holds in theory T" are taken to mean the same. Also, it is noteworthy that it does not follow that one can iterate either "⊢_{T}", as in "⊢_{T}(⊢_{T}A)", or iterate "pr()", as in "pr(pr(A)=a)=b".
Now the semantical axioms for CPT are:

If A and B are any propositions in CPL:

Alternatively expressed:

AxA.

(⊢A) > pr(A)=1

[A]=1 > pr(A)=1

AxB.

(⊢(A > B)) > pr(A) <= pr(B)

[A>B]=1 > pr(A) <= pr(B)

AxC.

pr(A)=pr(A&B)+pr(A&~B)

pr(A)=pr(A&B)+pr(A&~B)

Here " A" formalizes the notion that "A is a theorem in the presumed theory", where "a theory" is "a set of assumptions added to the axioms of logic" and the "theorems of the theory" are all statements that can be deduced from the theory by inference rules of a presumed logic, such as CPL.
Note that in what follows the reference to a theory T is abstracted from (though in any application this will be what one wants to find logical consequences from), and that therefore, while "[A]=1 iff ⊢A" is useful, the notation "⊢_{T}A" makes a reference to a theory that "[A]=1" doesn't (though it could easily be added).
Also, it is noteworthy that mere factual truth of A is not sufficient to make the hypotheses of AxA and AxB true: Indeed, what one normally wants is an assurance (and so a proof) that a given theory T does logically imply or fail to imply a certain proposition P, after which one has an external check on theory T, by finding out whether the proposition P is in fact true or false.
I abstract from references to theories to simplify and eliminate clutter, but it is useful to state a version with such references and provide readings, among other reasons because this shows how neatly the axioms tie PT to PL in the present formulations:

If A and B are any propositions in CPL:

Alternatively expressed:

AxA.

(⊢_{T}A) > pr(A|T)=1

[A|T]=1 > pr(A|T)=1

AxB.

(⊢_{T}(A > B)) > pr(A|T) <= pr(B|T)

[A|T > B|T]=1 > pr(A|T) <= pr(B|T)

AxC.

pr(A|T)=pr(A&B|T)+pr(A&~B|T)

pr(A|T)=pr(A&B|T)+pr(A&~B|T)

Here is a reading, with the various optional references to a supposed theory T (a sequence of statements of CPL) left out:

CPTaxioms in words:

AxA:

A is a theorem only if the probability of A is 1.

AxB:

(A only if B) is a theorem only if the probability of A is less than or equal to the probability of B.

AxC:

The probability of A is the sum of the probabilities of (A and B) and of (A and not-B).

It is from the formal statement of these axioms, dropping references to T, that we shall now derive Kolmogorov's axioms, which also do not explicitly refer to a theory that may be used in the hypotheses of its axioms.
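Before turning to the derivation, the axioms can be spot-checked in a small numeric model. The sketch below is my own illustration, not part of the text: propositions are predicates over the four truth-combinations of two atoms, pr() is the summed weight of the worlds where a proposition holds, and the particular weights are an arbitrary assumption.

```python
# A minimal numeric model of AxA-AxC (illustrative; the distribution over
# the four alternatives A&B, A&~B, ~A&B, ~A&~B is arbitrary).
weights = {(True, True): 0.3, (True, False): 0.2,
           (False, True): 0.4, (False, False): 0.1}

def pr(prop):
    """Probability of a proposition, given as a predicate on (a, b) worlds."""
    return sum(w for world, w in weights.items() if prop(*world))

A = lambda a, b: a
B = lambda a, b: b

# AxA: a theorem (true in every world) has probability 1.
taut = lambda a, b: a or not a
assert abs(pr(taut) - 1.0) < 1e-9

# AxB: if A > B holds in every world, pr(A) <= pr(B); here A&B implies A.
conj = lambda a, b: a and b
assert pr(conj) <= pr(A)

# AxC: pr(A) = pr(A&B) + pr(A&~B).
assert abs(pr(A) - (pr(lambda a, b: a and b) + pr(lambda a, b: a and not b))) < 1e-9
```

Any other nonnegative weighting summing to 1 would satisfy the same three checks, which is the point of the axioms being purely formal.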
2. The proof of the standard Kolmogorov axioms for probability theory:
These standard Kolmogorov axioms for probability are normally stated in such terms as:

Kolmogorov axioms for probability theory:


Suppose that $ is a set of propositions P, Q, R etc. and that this set is closed for negation, conjunction and disjunction, which is to say that whenever (P e $) and (Q e $), so are ~P, (P&Q) and (PVQ). Now we introduce pr(.) as a function that maps the propositions in $ into the real numbers in the following way, that is, satisfying the following three axioms:



A1.

For all P e $ the probability of P, written as pr(P), is some nonnegative real number.

A2.

If P is logically valid, pr(P)=1.

A3.

If ~(P&Q) is logically valid, pr(PVQ)=pr(P)+pr(Q).

In fact, we don't need the initial statement, since we simply presume CPL, which does meet the specifications of the initial statement. What we do need is proofs of A1, A2 and A3. Here they come.
First, there is the fundamental theorem that permits inferences from logical equivalences to probabilities:
T*1:

⊢(A iff B) > pr(A)=pr(B)

Equivalent propositions have the same probability

(1)

⊢(A iff B) > ⊢(A > B) > pr(A) <= pr(B)

AxB

(2)

⊢(A iff B) > ⊢(B > A) > pr(B) <= pr(A)

AxB

(3)

⊢(A iff B) > pr(A) <= pr(B) & pr(B) <= pr(A)

(1), (2)

(4)

⊢(A iff B) > pr(A) = pr(B)

(3), Algebra

Next, it is proved that contradictions have probability 0:
T*2

pr(A&~A)=0

Contradictory propositions have zero probability

(1)

pr(A)=pr(A&A)+pr(A&~A)

AxC

(2)

pr(A)=pr(A&A)

T*1 with ⊢(A iff (A&A))

(3)

=pr(A)+pr(A&~A)

(1), (2)

(4)

pr(A&~A)=0

(3), Algebra

It is often helpful to have in propositional logic two special constants, such as Taut (from "tautology") and Contrad (from "contradiction"). These are defined as: Taut iff (AV~A) and Contrad iff (A&~A). Taking this for granted:
T*3

0 <= pr(A) <= 1

Probabilities are between 0 and 1 inclusive

(1)

⊢A > pr(A)=1

AxA

(2)

 pr(Taut)=1

(1) and ⊢Taut

(3)

⊢(A > Taut)

Logic

(4)

pr(A) <= pr(Taut)

(3), AxB

(5)

pr(A) <= 1

(2), (4)

(6)

pr(Contrad)=0

T*2

(7)

⊢(Contrad > A)

Logic

(8)

pr(Contrad) <= pr(A)

(7), AxB

(9)

0 <= pr(A)

(6), (8)

(10)

0 <= pr(A) <= 1

(5), (9)

Next, we need to prove the probabilistic theorem for denial. We do it in two steps:
T*4

pr(AV~A)=pr(A)+pr(~A)

Probability of disjunction of exclusives is sum of probability of factors

(1)

pr(AV~A)=pr((AV~A)&A)+pr((AV~A)&~A)

AxC

(2)

pr(A)=pr((AV~A)&A)

T*1, as ⊢(((AV~A)&A) iff A)

(3)

pr(~A) = pr((AV~A)&~A)

T*1, as ⊢(((AV~A)&~A) iff ~A)
(4)

pr(AV~A) = pr(A)+pr(~A)

(1),(2),(3)

T*5

pr(~A) = 1 - pr(A)

Probability of denial is complementary probability

(1)

pr(AV~A)=pr(A)+pr(~A)

T*4

(2)

1=pr(A)+pr(~A)

AxA, since ⊢(AV~A)

(3)

pr(~A) = 1 - pr(A)

(2), Algebra

Next, we have this parallel to AxA:
T*6

⊢~A > pr(A)=0

Provable nontruths have zero probability

(1)

⊢~A

Assumption

(2)

pr(~A)=1

(1), AxA

(3)

1 - pr(A) = 1

(2), T*5

(4)

pr(A)=0

(3), Algebra

The main point of T*6 and AxA is that if one can prove that A (or ~A), then it thereby follows that pr(A)=1 (or pr(A)=0, if ~A is proved). This is normally important in comparing the supposed truths and nontruths one can logically infer from a theory with what the facts are: if one can prove that ⊢_{T}A, while in fact one finds ~A, one has thereby learned that the assumptions of theory T can't all be true, provided the proof of ⊢_{T}A was without mistakes in reasoning. (Incidentally, this shows one should not define "⊢_{T}A" as "Nec A", with "Nec" the modality of necessary truth: That amounts to the presumption that T is true.)
Next, we need a theorem that serves as a lemma to the next theorem, but that needs a remark itself. The theorem is:
T*7

pr(A&B)+pr(A&~B)+pr(~A&B)+pr(~A&~B)=1

Full disjunctive probabilistic sum of two factors

(1)

pr(A)+pr(~A)=1

T*5

(2)

pr(A&B)+pr(A&~B)+pr(~A&B)+pr(~A&~B)=1

(1), AxC

The promised remark is that T*7 differs essentially from the similar theorem in CPL minus the probabilities: In CPL, [A&B]+[A&~B]+[~A&B]+[~A&~B]=1 is true and implies that precisely one of the four alternatives is true. In PT, pr(A&B)+pr(A&~B)+pr(~A&B)+pr(~A&~B)=1 is true, but normally none of the four alternatives is provably true by itself; normally it is not known which of the alternatives is in fact true; and normally several or all of the alternatives will have a probability between 0 and 1 (conforming to T*3).
Indeed, a very interesting aspect of PT is that it assigns numerical measures to all alternatives the underlying logic can distinguish, regardless of whether these alternatives are true or have ever been true. And part of the interest is that there normally are far more logically possible alternatives than logically provable alternatives.
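This point can be made concrete with a small computation, again my own illustration: for n atoms the logic distinguishes 2^n full conjunction-alternatives, and T*7 generalizes so that their probabilities sum to 1 under any weighting, with no alternative needing probability 0 or 1.

```python
# Illustration (not in the text): enumerate the 2**n full
# conjunction-alternatives (for n=2 these are A&B, A&~B, ~A&B, ~A&~B)
# and check that any normalized nonnegative weighting sums to 1.
from itertools import product
import random

n = 4
alternatives = list(product([True, False], repeat=n))
assert len(alternatives) == 2 ** n   # 16 alternatives the logic distinguishes

random.seed(0)                       # arbitrary reproducible weights
raw = [random.random() for _ in alternatives]
total = sum(raw)
weights = [w / total for w in raw]

assert abs(sum(weights) - 1.0) < 1e-9      # the full disjunctive sum, as in T*7
assert all(0 < w < 1 for w in weights)     # every alternative strictly between 0 and 1
```

The gap between the 2^n logically possible alternatives and the (normally far fewer) provable ones is precisely where probability assigns its numerical measures.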
To finish the proof that CPT indeed implies all of Kolmogorov's axioms for PT, we need to derive his A3:
T*8

⊢~(A&B) > pr(AVB)=pr(A)+pr(B)

Conditional sums

(1)

⊢~(A&B)

Assumption

(2)

pr(A&B)=0

T*6

(3)

pr(A)=pr(A&~B)

(2), AxC

(4)

pr(B)=pr(~A&B)

(2), AxC, T*1

(5)

pr(AVB) = 1 - pr(~A&~B)

T*5, T*1 with ⊢((~(~A&~B)) iff (AVB))

(6)

=pr(A&B)+pr(A&~B)+pr(~A&B)

T*7

(7)

=pr(A&~B)+pr(~A&B)

(2),(6)

(8)

=pr(A)+pr(B)

(3),(4),(7)

I have now proved all of Kolmogorov's axioms for the finite case: A1 follows from T*3; A2 is AxA; and A3 is T*8.
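The three derived properties can be checked numerically in the same kind of world-model used above (my own illustration; the weights are an arbitrary assumption): A1 nonnegativity, A2 for valid propositions, and A3 additivity for incompatible propositions.

```python
# Spot-check of the derived Kolmogorov properties in a four-world model.
weights = {(True, True): 0.15, (True, False): 0.35,
           (False, True): 0.05, (False, False): 0.45}

def pr(prop):
    return sum(w for world, w in weights.items() if prop(*world))

A = lambda a, b: a

# A1 / T*3: every probability is a nonnegative real (and here <= 1).
assert 0 <= pr(A) <= 1
# A2 / AxA: a logically valid proposition gets probability 1.
assert abs(pr(lambda a, b: a or not a) - 1.0) < 1e-9
# A3 / T*8: A&B and A&~B are incompatible, so pr of their disjunction is the sum.
left = pr(lambda a, b: (a and b) or (a and not b))
right = pr(lambda a, b: a and b) + pr(lambda a, b: a and not b)
assert abs(left - right) < 1e-9
```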
3. Some fundamental theorems of CPT
Irrespective of the axiomatization or interpretation of probability, there are a number of important theorems which we shall need  just as we need laws like (a+b)=(b+a) for counting, irrespective of axioms used to prove them or of what we choose to count. The advantage and use of axioms is that one can use them to prove the theorems one needs  and having given a valid proof one knows that any objection against the theorem must be directed against the axioms, for the theorem was proved to follow from them. So what we shall do first is to derive some useful theorems.
A. Basic unconditional theorems
First, then, there is a group of theorems that the reader may derive from Kolmogorov's axioms (from which they do follow) and that I derived above from my axioms:
T1

pr(~P) = 1 - pr(P)

T*5

T2

0 <= pr(P) <= 1

T*3

T3

If P ⊢ Q, then pr(P) <= pr(Q)

AxB

T4

If P is logically equivalent to Q, then pr(P)=pr(Q)

T*1

T5

pr(P)=pr(P&Q) + pr(P&~Q)

AxC

T6

pr(PVQ) = pr(P)+pr(Q) - pr(P&Q)

T*7, T5

These were all proved in section (4.1). We only add
T7

pr(P&Q) <= pr(P) <= pr(PVQ)


that is: The probability of a conjunction is not larger than the probability of any of its conjuncts, and the probability of a disjunction is not smaller than the probability of any of its disjuncts. It follows from T5 and T6 or from AxB and logic.
In what follows I'll state and prove the most important theorems of elementary finite probability theory, firstly because I have never seen this done properly in one paper, secondly because it seems to me one of the cornerstones of human reasoning, and thirdly to be able to show how we can learn from experience using probability theory. (The last subject starts in section 4.6. It deserves to be better known than it is, for it could help to defuse, refute or ridicule much improbable nonsense that people believe in.)
In what follows proofs when referring to axioms refer to Kolmogorov's. Readers thoroughly familiar with elementary probability theory may choose to skip the rest of this chapter, but are advised to read the last sections, 4.11 and 4.12.
B. Basic conditional theorems
Most probabilities are not, as they were in this chapter so far, absolute, but are conditional: Rather than saying "the probability of Q = x" we usually introduce a condition and say, "the probability of Q, if P is true, = y". This idea, that of the probability of a proposition Q given that one or more propositions P1, P2 etc. are true is formalised by the following important definition:
Definition 1

pr(Q|P) = pr(P&Q) : pr(P)


That is: The conditional probability of Q, given or assumed that P is true, equals the probability that (P&Q) is true, divided by the probability that P is true. NB, this fact has important implications for the interpretation and application of probability theory: A conditional probability is defined in terms of absolute probabilities, so we need absolute probabilities to establish conditional ones.
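Def 1 is easy to state in code. The sketch below is my own illustration (the world-weights are an arbitrary assumption): a conditional probability is computed from two absolute probabilities, exactly as the definition requires, and T8 below falls out of it.

```python
# Def 1 in a four-world model: pr(Q|P) = pr(P&Q) : pr(P), for pr(P) > 0.
weights = {(True, True): 0.3, (True, False): 0.2,
           (False, True): 0.4, (False, False): 0.1}

def pr(prop):
    return sum(w for world, w in weights.items() if prop(*world))

def pr_given(q, p):
    """Conditional probability per Def 1; only defined when pr(P) > 0."""
    assert pr(p) > 0, "conditional probability needs pr(P) > 0"
    return pr(lambda a, b: p(a, b) and q(a, b)) / pr(p)

P = lambda a, b: a
Q = lambda a, b: b

# T8: pr(P&Q) = pr(P)pr(Q|P) = pr(Q)pr(P|Q).
pq = pr(lambda a, b: a and b)
assert abs(pq - pr(P) * pr_given(Q, P)) < 1e-9
assert abs(pq - pr(Q) * pr_given(P, Q)) < 1e-9
```

Note that the assertion guarding pr(P) > 0 mirrors the text's point: the conditional probability exists only because the absolute ones do.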
Definition 1 has many applications, and many of these turn on the fact that it also provides an implicit definition of pr(P&Q), namely as pr(P)pr(Q|P) (simply by multiplying both sides of Def 1 by pr(P)). Consequently, we have as a theorem (if pr(P)>0 and pr(Q)>0)
T8

pr(P&Q) = pr(P)pr(Q|P) = pr(Q)pr(P|Q)


The second equality is, of course, also an application of Def 1, and T8 accordingly says that the probability of a conjunction equals the probability of one conjunct times the probability of the other given that the one is true. Another consequence of Def 1 is
T9

pr(Q|P) + pr(~Q|P) = 1


which results from T5 and Def 1 upon division by pr(P), and says that the probability of Q if P plus the probability of ~Q if P equals 1. Of course, this admits of a statement like T1:
T10

pr(~Q|P) = 1 - pr(Q|P)


which shows that conditional probabilities are like unconditional ones. A theorem to the same effect, that parallels T3, is
T11

0 <= pr(Q|P) <= 1


That 0 <= pr(Q|P) follows from D1, because the components of a conditional probability are both >=0 by A1; and that pr(Q|P) <= 1 is equivalent to pr(P&Q) <= pr(P), which holds by T7. A theorem in the vein of T4 is
T12

If P ⊢ Q, then pr(P&~Q)=0


This is proved by noting that if P ⊢ Q holds, then so does ⊢~(P&~Q), which, by A3, entails that pr(PV~Q)=pr(P)+pr(~Q). As by T6 pr(PV~Q)=pr(P)+pr(~Q)-pr(P&~Q), it follows that pr(P&~Q)=0 if P ⊢ Q. From this it easily follows that
T13

If P ⊢ Q, then pr(Q|P)=1, provided pr(P)>0


which is to say that if Q is a logical consequence of P, the probability of Q if P is true is 1. The proviso is interesting, for it denies the possibility of inferring Q from a logical contradiction or known falsehood. This means that the definition P ⊢ Q =df pr(Q|P)=1 strengthens the logical "⊢" by adding that proviso. T13 immediately follows from T5, T12 and Def 1.
Def 1 may, of course, list any finite number of premises, as in pr(Q|P1&....&Pn) = pr(Q&P1&....&Pn) : pr(P1&....&Pn). Such long conjunctions admit of a theorem like T8:
T14

pr(P1&.....&Pn) = pr(P1)pr(P2|P1)pr(P3|P1&P2).....pr(Pn|P1&.....&Pn-1)


This says that the probability that n propositions are true equals the probability that the first (in any convenient order) is true, times the probability that the second is true if the first is true, times the probability that the third is true if the first and the second are true, and so on. The pattern of proof can be seen by noting that for n=3, pr(P1)pr(P2|P1)pr(P3|P1&P2) = pr(P1&P2)pr(P3|P1&P2) = pr(P3&P2&P1), because the denominators successively drop out by Def 1. That the premises can be taken in any order is a consequence of T4: Conjuncts taken in any order are equivalent to the same conjuncts in any other order.
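The n=3 pattern just described can be checked numerically. The sketch below is my own illustration over a randomly weighted eight-world model (the weights are an arbitrary assumption), and it also confirms the order-independence that T4 guarantees.

```python
# T14 for n=3: the probability of a conjunction unfolds into a product of
# conditional probabilities, in any order of the conjuncts.
from itertools import product
import random

random.seed(1)                       # arbitrary reproducible weights
worlds = list(product([True, False], repeat=3))
raw = [random.random() for _ in worlds]
total = sum(raw)
weights = dict(zip(worlds, (w / total for w in raw)))

def pr(prop):
    return sum(w for world, w in weights.items() if prop(*world))

def pr_given(q, p):
    return pr(lambda *v: p(*v) and q(*v)) / pr(p)

P1 = lambda a, b, c: a
P2 = lambda a, b, c: b
P3 = lambda a, b, c: c
P12 = lambda a, b, c: a and b
P123 = lambda a, b, c: a and b and c

chain = pr(P1) * pr_given(P2, P1) * pr_given(P3, P12)
assert abs(pr(P123) - chain) < 1e-9

# T4 guarantees order does not matter: condition in the reverse order instead.
chain_rev = pr(P3) * pr_given(P2, P3) * pr_given(P1, lambda a, b, c: b and c)
assert abs(pr(P123) - chain_rev) < 1e-9
```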
T11 and T13, together with T9 and T10, show that conditional probabilities are probabilities. We need just one further theorem:
T15

If R ⊢ ~(P&Q), then pr(PVQ|R) = pr(P|R)+pr(Q|R)


which parallels A3. It is easily proved by noting that pr(PVQ|R) = (pr(P&R)+pr(Q&R)-pr(P&Q&R)) : pr(R) by Def 1, T4 and T6, and that pr(P&Q&R)=0 by T12 and T4 on the hypothesis. The conclusion then follows by Def 1.
C. Basic theorems about irrelevance
A second important concept which now can be defined is that of irrelevance. Two propositions P and Q are said to be probabilistically irrelevant, abbreviated PirrQ, if the following is true:
Def 2

PirrQ iff pr(P&Q)=pr(P)pr(Q)


Evidently, irrelevance is symmetric:
T16

PirrQ iff QirrP

But there are more interesting results. Let's call a logically valid statement a tautology and a logically false statement a contradiction. Then we can say:
T17

Any proposition is irrelevant to any tautology and to any contradiction.


Note that this entails that tautologies are also mutually irrelevant. To prove T17, first suppose that P is a tautology. By A2, pr(P)=1. Since tautologies are logically entailed by any proposition, Q ⊢ P, and so pr(Q&~P)=0 by T12. Consequently, it follows that pr(Q)=pr(Q&P) by T5, and so pr(P)pr(Q) = 1·pr(Q&P) = pr(P&Q), and we have irrelevance. Next, suppose P is a contradiction. If so, ~P is a tautology, and so pr(P)=0 by T1. By T7, pr(P&Q) <= pr(P), and as by A1 all probabilities are >= 0, it follows that pr(P&Q)=0. But then pr(P)pr(Q) = 0·pr(Q) = 0 = pr(P&Q), and again we have irrelevance.
Def 2 is often stated in two other forms, which are both slightly less general, as they require respectively that pr(P)>0 or that pr(P)>0 and pr(~P)>0, in both cases to prevent division by 0. Both alternative definitions depend on Def 1, and the first is given by
T18

If pr(P)>0, then PirrQ iff pr(Q|P)=pr(Q)


This is an immediate consequence of Defs 1 and 2. It states clearly the important property that irrelevance signifies: If P is irrelevant to Q, the fact that P is true does not alter anything about the probability that Q is true, and conversely, by T16, supposing that Q is not also a contradiction. So irrelevance of one proposition to another is always mutual, and means that the truth of the one makes no difference to the probability of the truth of the other.
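Def 2 and T18 can be seen at work on a product distribution. The construction below is mine, not the text's: each world-weight factors as a P-part times a Q-part, which makes P and Q irrelevant by construction, and conditioning on P then leaves pr(Q) unchanged as T18 says.

```python
# A product distribution makes P and Q probabilistically irrelevant;
# p and q are arbitrary illustrative marginal probabilities.
p, q = 0.6, 0.25
weights = {(True, True): p * q, (True, False): p * (1 - q),
           (False, True): (1 - p) * q, (False, False): (1 - p) * (1 - q)}

def pr(prop):
    return sum(w for world, w in weights.items() if prop(*world))

P = lambda a, b: a
Q = lambda a, b: b
PandQ = lambda a, b: a and b

assert abs(pr(PandQ) - pr(P) * pr(Q)) < 1e-9   # Def 2: PirrQ
cond = pr(PandQ) / pr(P)                        # pr(Q|P) by Def 1
assert abs(cond - pr(Q)) < 1e-9                 # T18: pr(Q|P) = pr(Q)
```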
This can again be stated in yet another form, with once again a slightly strengthened premise, for now it is required that both pr(P) and pr(~P) are > 0:
T19

If 0 < pr(P) < 1, then PirrQ iff pr(Q|P)=pr(Q|~P)


Suppose the hypothesis, which may be taken as meaning that P is an empirical proposition, is true. T19 may now be proved by noting the following: pr(Q|P)=pr(Q|~P) iff pr(Q&P):pr(P) = pr(Q&~P):(1-pr(P)) iff pr(Q&P) - pr(P)pr(Q&P) = pr(P)pr(Q&~P) iff pr(Q&P) = pr(P)(pr(Q&P)+pr(Q&~P)) iff pr(Q&P) = pr(P)pr(Q).
Another important property of irrelevance is that if P and Q are irrelevant, then so are their denials:
T20

PirrQ iff (~P)irrQ iff Pirr(~Q) iff (~P)irr(~Q).


This too can be proved by noting a series of equivalences that yield irrelevance. First consider pr(P&~Q), assuming PirrQ. Then pr(P&~Q) = pr(P) - pr(P&Q) = pr(P) - pr(P)pr(Q) = pr(P)(1 - pr(Q)) = pr(P)pr(~Q). So Pirr(~Q) if PirrQ. The converse can be proved by running the argument in reverse order, and so PirrQ iff Pirr(~Q). The other equivalences are proved similarly.
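All four equivalences of T20 can be checked at once on the same kind of product distribution used above (my construction; the marginals 0.7 and 0.4 are arbitrary):

```python
# T20: if P and Q are irrelevant, so are their denials, in all four combinations.
p, q = 0.7, 0.4
weights = {(True, True): p * q, (True, False): p * (1 - q),
           (False, True): (1 - p) * q, (False, False): (1 - p) * (1 - q)}

def pr(prop):
    return sum(w for world, w in weights.items() if prop(*world))

pairs = [
    (lambda a, b: a,      lambda a, b: b),        # PirrQ
    (lambda a, b: not a,  lambda a, b: b),        # (~P)irrQ
    (lambda a, b: a,      lambda a, b: not b),    # Pirr(~Q)
    (lambda a, b: not a,  lambda a, b: not b),    # (~P)irr(~Q)
]
for X, Y in pairs:
    both = pr(lambda a, b: X(a, b) and Y(a, b))
    assert abs(both - pr(X) * pr(Y)) < 1e-9       # Def 2 holds in each case
```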
Finally, the concept of irrelevance, which so far has been used in an unconditional form, may be given a conditional form, when we want to say that P and Q are irrelevant if T is true:
Def 3

PirrQ|T iff pr(Q|T&P) = pr(Q|T)


This says that the probability that Q is true if T is true is just the same as when T and P are both true, i.e. P's truth makes no difference to Q's probability, if T is true. It should be noted that Def 3 requires that pr(T&P) > 0 (which makes pr(T) > 0), but that on this condition T19 shows that Def 3 is just a simple extension of Def 2. And as with Def 2 there is symmetry:
T21

PirrQ|T iff QirrP|T

For suppose PirrQ|T. By Def 3, pr(Q|T&P)=pr(Q|T) iff pr(Q&T&P):pr(T&P) = pr(Q&T):pr(T), by Def 1. This is so iff pr(Q&T&P):pr(Q&T) = pr(T&P):pr(T) iff pr(P|Q&T)=pr(P|T) iff QirrP|T by Def 3.
And this conditional irrelevance of Q to P if T does not only hold in case P is true, but also in case P is false. That is:
T22

PirrQ|T iff (~P)irrQ|T


For suppose PirrQ|T, i.e. pr(Q|T&P) = pr(Q|T). By Def 1 this is equivalent to pr(Q&T&P):pr(T&P) = pr(Q&T):pr(T) iff pr(Q&T&P) = pr(T&P)pr(Q&T):pr(T). Now pr(Q&T&P) = pr(Q&T) - pr(Q&T&~P), and so we obtain the equivalent pr(Q&T&~P) = pr(Q&T) - (pr(T&P)pr(Q&T):pr(T)) = pr(Q&T)(1 - (pr(T&P):pr(T))) = pr(Q&T)((pr(T) - pr(T&P)) : pr(T)) = pr(Q&T)(pr(T&~P):pr(T)), from which we finally obtain, as equivalent to PirrQ|T, pr(Q&T&~P):pr(T&~P) = pr(Q&T):pr(T), which is by Def 3 the same as (~P)irrQ|T. Qed.
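T22 can also be confirmed numerically. The construction below is my own: a three-atom distribution built so that, given T (or given ~T), P and Q are chosen independently, which makes Q irrelevant to P given T; the check then shows Q is also irrelevant to ~P given T.

```python
# T22: if PirrQ|T, then (~P)irrQ|T, checked on a constructed distribution.
from itertools import product

def f(flag, x):
    """Turn a probability x for 'True' into the probability of flag."""
    return x if flag else 1 - x

# pr(T)=0.5; given T or ~T, P and Q get independent (arbitrary) probabilities.
weights = {}
for t, p_, q_ in product([True, False], repeat=3):
    pt = f(t, 0.5)
    pp = f(p_, 0.7 if t else 0.2)   # pr(P|T) = 0.7, pr(P|~T) = 0.2
    pq = f(q_, 0.4 if t else 0.9)   # pr(Q|T) = 0.4, pr(Q|~T) = 0.9
    weights[(t, p_, q_)] = pt * pp * pq

def pr(prop):
    return sum(w for world, w in weights.items() if prop(*world))

def pr_given(x, y):
    return pr(lambda *v: x(*v) and y(*v)) / pr(y)

T = lambda t, p_, q_: t
Q = lambda t, p_, q_: q_
TandP = lambda t, p_, q_: t and p_
TandNotP = lambda t, p_, q_: t and not p_

# PirrQ|T: pr(Q|T&P) = pr(Q|T) ...
assert abs(pr_given(Q, TandP) - pr_given(Q, T)) < 1e-9
# ... and by T22 also pr(Q|T&~P) = pr(Q|T).
assert abs(pr_given(Q, TandNotP) - pr_given(Q, T)) < 1e-9
```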
And finally T21 and T22 yield the same result for conditional irrelevance as for irrelevance:
T23

PirrQ|T iff QirrP|T


(1)

iff (~P)irrQ|T

T21

(2)

iff Pirr(~Q)|T

T21, T22, (1)

(3)

iff (~P)irr(~Q)|T

(2)

The proof is: The first line is T21, the second T22. The third results thus: By both theorems, QirrP|T iff (~P)irrQ|T, whence PirrQ|T iff (~P)irrQ|T by T21. The fourth results from this by substituting (~Q) for Q. Qed.
