Maarten Maartensz:    Philosophical Dictionary | Filosofisch Woordenboek                      

 B - Bayesian Conditionalization


Bayesian Conditionalization: The application of the following theorem of elementary probability theory: p(T|P)=p(P|T)*p(T):p(P) to revise p(T) to p(T|P) if one learns that P is true.

There are quite a few conceptual problems in using this theorem in this way, and the present lemma articulates one approach to dissolving them. One key idea is the distinction between degrees of belief and probabilities, which are brought together on the common footing of proportions. Other assumptions that relate proportions, probabilities and degrees of belief are made in Section 1.


1. Probabilities, degrees of belief and proportions
2. Derivation of Bayesian Conditionalization for degrees of belief
3. A simple example
4. The reason for this lemma
5. Alternative and further axioms
6. Putting it all together again

1. Probabilities, degrees of belief and proportions

I start with articulating a number of assumptions about probabilities, degrees of belief and proportions. Some knowledge about probabilities and proportions is presupposed in what follows, and can be gotten by way of the links. Likewise, I presuppose some elementary knowledge of standard logic.

Ax.1: Probabilities are proportions.

Proportions are taken as ratios of cardinal numbers of subsets in sets, and include conditional proportions as in probability theory.
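
The reading of proportions just given can be illustrated with a minimal sketch, assuming finite sets. The helper names `proportion` and `conditional_proportion` and the domain of twelve cases are hypothetical and serve only as illustration; they are not part of the lemma.

```python
from fractions import Fraction

def proportion(subset, universe):
    """Ratio of the number of members of subset (within universe) to the size of universe."""
    if not universe:
        raise ValueError("universe must be non-empty")
    return Fraction(len(subset & universe), len(universe))

def conditional_proportion(target, given, universe):
    """Proportion of target among those members of universe that are in given."""
    return proportion(target & given, given & universe)

# Hypothetical domain of 12 cases, used only for illustration.
universe = set(range(1, 13))
even = {n for n in universe if n % 2 == 0}
multiple_of_4 = {n for n in universe if n % 4 == 0}

p_even = proportion(even, universe)                                      # 6 of 12 cases
p_even_given_4 = conditional_proportion(even, multiple_of_4, universe)   # 3 of 3 cases
```

Exact fractions are used rather than floating point numbers, so that proportions stay ratios of cardinal numbers throughout.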

Ax.2: Degrees of belief are proportions.

Therefore probabilities and degrees of belief share the formalities and properties of proportions. Since degrees of belief may alter with time they involve a reference to time, and Ax.2 accordingly may be formalized thus:

(Ax.2) (t)(a)(X) ( ps(a,X).t e PROPORTION )

Here t is a temporal index, a names a person, and X a belief of the person, and so ps(a,X).t is the degree of belief of a in X at time t.

Ax.3: Degrees of belief follow beliefs in probabilities.

In other words: A person's degree of belief in X - if the person is rational - conforms to his belief about the probability of X. The reason for this assumption is that one's beliefs about the probabilities are what one believes and have a degree. Writing 'aB(pr(X)=y).t' for 'a believes at t that the probability of X is y' Ax.3 may be formalized thus:

(Ax.3) (t)(a)(X)(y) ( ps(a,X).t = y IFF aB( pr(X)=y).t )

Next, we have an assumption similar to Ax.3 for conditional degrees of belief, but without assuming these are derived from conditional probabilities:

Ax.4: Conditional degrees of belief are beliefs in probabilities from hypotheses.

This is an explanation of what conditional degrees of belief are that can be formalized thus:

(Ax.4) (t)(a)(X)(Y)(z)( ps(a,Y|X).t = z IFF aB(X |- pr(Y)=z).t )

One particular point about aB(X |- pr(Y)=z).t is that what a believes and assumes about X allows a at t to deduce that pr(Y)=z if X is true. This may involve rather a lot of assumptions in X, but that is as may be, and is also the place where these assumptions should be.

Ax.5: Beliefs in probabilities from hypotheses once adopted remain adopted until revised.

This is a natural assumption about beliefs in probabilities from hypotheses: Assumptions do not depend on time but on oneself and such hypotheses as one has, and one retains them until one revises them, and revises them when revising the evidence. Ax.5 can be formalized thus:

(Ax.5) (t)(a)(X)(Y) ( aB(X |- Y).t-1 |- aB(X |- Y).t )

Note what I have not assumed: that degrees of belief are the same as probabilities. What I have assumed, and what goes quite a long way in that direction, is that one can infer degrees of belief given beliefs in probabilities. This follows from Ax.3, which can be seen as a version of David Lewis' so-called Principal Principle.

Apart from the notation, there are alternatives for some of the above assumptions. I mention two cases.

Thus, one possible weakening of Ax.3 is to assume that degrees of belief depend functionally on beliefs about probability, e.g. as a linear function, that allows for qualifications relating to the quality of the evidence one has. However, it seems generally most sensible to use Ax.3 as stated for one's calculations, and only after one has made them and has revised one's degrees of belief to qualify the result with reference to the quality of the evidence, if this is necessary. A general assumption that may enter here is

Ax.3A: Degrees of belief follow beliefs in probabilities and never exceed them.

The reason is that one's probabilities state such guesses and such evidence as one has, and therefore are in the nature of the best one can do for the moment, given what one believes.

A possible additional assumption in Ax.5 is that also ~(aB~(X |- Y)).t. The revised axiom accordingly may be written as

(Ax.5A) (t)(a)(X)(Y) ( aB(X |- Y).t-1 & ~(aB~(X |- Y)).t |- aB(X |- Y).t )

This may be taken as saying that if a at t-1 believes that X entails Y and at t a does not believe that Y does not follow from X, then at t a believes that X entails Y. Thus the added hypothesis affirms that at t one has not revised one's estimate of t-1, and the conclusion is that one believes at t what one believed at t-1.

2. Derivation of Bayesian Conditionalization for degrees of belief

Given these assumptions, the following theorem can be proved, where I avoid universal quantifiers and use free variables instead (that allow the inference of universal quantifiers):

T1.  ps(a,E).t=1 & ps(a,E|C).t-1 = x & ps(a,E).t-1 = y & ps(a,C).t-1=z
      |- ps(a,C).t=ps(a,C|E).t-1= x*z:y

This asserts that Bayesian Conditionalization for degrees of belief follows from the usual hypotheses involved in Bayesian conditionals. Note there is an explicit reference to time, and the new evidence at t is that ps(a,E).t=1. The conclusion is a new degree of belief in C at t that differs from the old one at t-1 unless x=y.

One reason for this introduction of temporal indexes is that it makes a lot of intuitive sense when speaking of degrees of belief. Another reason is given in the next section.
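
The content of T1 can be sketched numerically. The function name `conditionalize`, the dictionary of time-indexed beliefs and the particular fractions are assumptions made for this illustration only.

```python
from fractions import Fraction

def conditionalize(x, y, z):
    """Return the new ps(a,C).t = x*z : y of T1.

    x = ps(a,E|C).t-1, y = ps(a,E).t-1, z = ps(a,C).t-1.
    """
    x, y, z = Fraction(x), Fraction(y), Fraction(z)
    if y == 0:
        raise ValueError("ps(a,E).t-1 must be positive to conditionalize on E")
    return x * z / y

# Degrees of belief in C indexed by time; the values are hypothetical.
beliefs_in_C = {"t-1": Fraction(1, 4)}
beliefs_in_C["t"] = conditionalize(Fraction(4, 5), Fraction(1, 2), beliefs_in_C["t-1"])

# When x equals y the degree of belief in C does not change.
unchanged = conditionalize(Fraction(1, 2), Fraction(1, 2), Fraction(1, 4))
```

Keeping the old value under the key "t-1" next to the new value under "t" mirrors the temporal indexes of the theorem: nothing is overwritten, and the revision is a new entry.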

Here is the proof of T1 with some comments:

(1)   ps(a,E).t=1                                              by AI
(2)   ps(a,E|C).t-1 = x                                        by AI
(3)   ps(a,E).t-1 = y                                          by AI
(4)   ps(a,C).t-1 = z                                          by AI

This lists all assumptions of the theorem to be proved.

(5)   ps(a,C|E).t-1 = ps(a,E|C).t-1 * ps(a,C).t-1 : ps(a,E).t-1      by PT, A1-4
(6)   ps(a,C|E).t-1 = x*z : y                                        by 2,3,4,5

Note that (5) follows from A1-4: Its left-hand side equals its right-hand side in probability theory and therefore both sides are the same as degrees of belief, since ratios of the same quantities are the same, and degrees of belief follow probabilities. And then (6) follows from the assumptions.

(7)   ps(a,C|E).t-1 = ps(a,C|E).t                                           by 2-4, A5

This follows from A5 since all assumptions to derive ps(a, C|E).t-1 have been made. Note that this line is a crucial step for the proof.

(8)   ps(a,C|E).t = ps(a,C&E).t : ps(a,E).t                               by A2-3

This follows since one's degrees of beliefs are proportions that follow one's beliefs in probabilities: Both terms on the right hand side are equal to their corresponding probabilities, and therefore their quotient equals the conditional degree of belief on the left hand side.

(9)  ps(a,C|E).t = ps(a,C).t                                                  by 1,8

And this follows as ps(a,E).t=1, whence ps(a,C&E).t = ps(a,C).t , and so we have the desired conclusion

(10)  ps(a,C).t  = x*z : y                                                     by 6,7,9

QED. But I can say and prove more about conditionalizing:

(11)  aB(pr(E)=1).t                                                              by 1, A3
(12)  aB( C |- pr(E)=x ).t-1                                                   by 2, A4

Here degrees of belief and beliefs in probabilities are interchanged. Now, taking for granted a few things that make intuitive sense concerning inferences with propositions of the form 'aB(X).t' and 'aB(pr(X)=y).t', we have

(13) aB(E).t                                                                         by 11

This follows from aB(pr(E)=1).t . Since there also is ps(a,C|E).t = x*z : y we have

(14) aB(E |- pr(C)=x*z : y ).t                                                 by 9,10, A4

and now one can infer a revision of one's belief in the probability of C by ordinary Modus Ponens, which derives aB(Y).t from aB(X).t & aB(X |- Y).t :

(15) aB(pr(C)=x*z : y ).t                                                        by 13, 14

And thus Bayesian Conditionalization may be related to and explained in terms of ordinary conditionalization, given the above assumptions. For clearly what we can also prove is the counterpart of T1 in terms of beliefs:

T2.  aB(pr(E)=1).t & aB(C |- pr(E)= x).t-1 & aB(pr(E)=y).t-1 & aB(pr(C)=z).t-1
      |- aB(pr(C)=x*z:y).t

The proof of this can be gleaned from the foregoing proof, and is simply a matter of using the assumptions that have been made that relate degrees of beliefs and probabilities.

It may be objected here that the reasoning with propositional attitudes has not been clarified, but all that is required here are assumptions to the following effect:

Ax.6 aB(pr(X)=1).t |-  aB(X).t
Ax.7 aB(X |- Y).t   |-  (aB(X).t |- aB(Y).t)

These are sufficient for the above inferences (13)-(15) and are intuitively obvious and unobjectionable.

3. A simple example

It may be good to give a schematic example of the sort of reasoning outlined above. Suppose there is a disease C which is not very common, which usually but not always comes with symptoms E, that without the disease are rare. Suppose then that the probabilities are as follows - where I have written everything to make sense also as percentages, since that is intuitively helpful:

Table 1

          C     ~C
E         9      1     10
~E        1     89     90
         10     90    100

We need not concern ourselves here with how the probabilities were precisely established, but what does matter is that C and E and their complements refer to classes of cases, such as numbers of incidence of the disease and the symptoms.

It follows from the above assumptions that one's degrees of belief are numerically  the same as the probabilities one believes. Accordingly, one's degree of belief that someone has C is 1/10. Note that this does not concern a class of cases, but the application of a known probability to a particular case, with a degree of belief in return that is numerically the same as a probability, on the strength of the assumptions we made about the relations between proportions, probabilities and degrees of belief.

Now suppose that one finds that this particular person does show the symptoms E. Then one's degree of belief that the person has the disease C by the above reasoning changes from 1/10 to 9/10.
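
The arithmetic of this example can be checked with a short sketch that reads x, y and z of Section 2 off the counts of Table 1. The dictionary layout and variable names are assumptions of the sketch.

```python
from fractions import Fraction

# Table 1 as counts per 100 cases; keys are (has disease C, shows symptoms E).
counts = {(True, True): 9, (False, True): 1,
          (True, False): 1, (False, False): 89}
total = sum(counts.values())

with_C = counts[(True, True)] + counts[(True, False)]
with_E = counts[(True, True)] + counts[(False, True)]

pr_C = Fraction(with_C, total)                        # z = 1/10
pr_E = Fraction(with_E, total)                        # y = 1/10
pr_E_given_C = Fraction(counts[(True, True)], with_C) # x = 9/10

prior = pr_C                                          # ps(a,C).t-1
posterior = pr_E_given_C * pr_C / pr_E                # ps(a,C).t = x*z : y
```

The table of counts is read but never modified: only the degree of belief about the particular person moves from `prior` to `posterior`.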

Note that nothing changes in the probabilities: They remain just as they were, and indeed may be used again for other cases of possible C of other persons. There is in the present approach no revised probability: There only is a revised degree of belief given new evidence. If there are priors and posteriors they are not in probability but in degrees of belief, and indeed in section 2 the prior corresponds to ps(a,C).t-1 = z and the posterior to ps(a,C).t  = x*z : y. 

4. The reason for this lemma

The main reason for the lemma on Bayesian Conditionalization derives from two beliefs I have

  • (A) The principle of Bayesian Conditionalization is the best approach towards a logic of scientific inference: Only something like it explains how we can learn from the evidence and from experience and how we can revise our degrees of belief systematically and rationally in the light of such evidence as we have.
  • (B) The usual accounts of Bayesian Conditionalization are for various reasons mistaken: In particular, what conditional probabilities are is not clearly articulated in standard approaches; the lack of reference in the standard account to what evidence one has at what time is confusing; and indeed the standard account confuses degrees of belief and probabilities systematically.

There is a lot of literature on the topic. Useful texts for (A) are Howson & Urbach and Adams. Useful texts for (B) are the same plus Stegmüller and Lewis. The last concerns 'A Subjectivist's Guide to Objective Chance' in Jeffrey Ed.

What is new in the present proposal are the axiomatic assumptions in Section 1 and the proof in Section 2. Apart from what was said in Section 1 about each assumption, one basic conviction that motivates all assumptions is that degrees of belief are not probabilities, but must conform to them if one is rational and one's belief in the probabilities is rational. And the reason degrees of belief look like probabilities and behave like them is that both degrees of belief and probabilities are proportions.

But in the present approach, the proportion that the probabilities measure and express concerns the real facts of the matter, whereas the proportion that the degrees of belief measure and express concerns the beliefs of a person about the application of his beliefs about the probabilities to some specific case.

5. Alternative and further axioms

The axioms in section 1 seem intuitive, but here is an alternative set, that also incorporates the distinction between theoretical and empirical propositions, and an explicit reference to presumed background knowledge K. It will be assumed that ps(a,K).t = 1, but clearly one may have different background knowledge at different times. 

In order to write one of the axioms in a fairly clear way we also need a definition, namely the definition of proper consequence of K&X, which we write as 'K&X |< Y' and define as follows:

aB(K&X |< Y).t =def aB( (K&X |- Y) & ~(K&~X |- Y)).t

Thus, Y is a proper consequence of K&X at t if it follows at t from K&X but not from K&~X.

Now the axioms for degrees of belief are these

Ax1: ps(a, X|K).t =y  & X e EMP IFF aB( K |- p(X)=y).t
Ax2: ps(a, X|K).t =y  & X e THE IFF  aB( (EY)(Ez)( (K&X |< p(Y) = z) & y=z ) ).t &
                                      aB( ~(EZ)(Ez)( (K&X |< p(Z) = z) & z<y ) ).t
Ax3: ps(a, Y|X&K).t =z              IFF aB( K&X |- p(Y)=z ).t

Ax4: ps(a, X&Y|K).t     = ps(a, Y|X&K).t * ps(a, X|K).t
Ax5: ps(a, X|K).t         = ps(a, X&Y|K).t + ps(a, X&~Y|K).t

Ax6: ps(a, Y|X&K).t-1   = ps(a, Y|X&K).t    

Ax1 can be seen as defining when one's degree of belief in X at t given background knowledge K equals y in case X is an empirical proposition: Precisely if one believes that the probability of X at t is y given K.

Here one may rely on frequencies and sampling for one's beliefs in probabilities, for the proposition X is supposed to be empirical.

Ax2 can be seen as defining when one's degree of belief in X at t given background knowledge K equals y in case X is a theoretical proposition. It is formulated as it is to insist that the Y and Z that are used are proper consequences of K&X.

On this understanding one's degree of belief in theory X at t given background knowledge K equals y precisely if one believes that there is a proposition Y at t that is a proper consequence of K&X that has probability y given K&X and one believes there is not a proposition Z at t that is a proper consequence of K&X that has a probability z such that z is smaller than y.

In brief: One's degree of belief in a theoretical proposition X at t given K equals the probability of the least probable proper consequence of X given K that one believes at t.  
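
The brief statement of Ax2 amounts to taking a minimum, which may be sketched as follows. The function name, the labels Y1 to Y3 and their probabilities are hypothetical, and the sketch simply assumes that the listed propositions are the proper consequences of K&X that the person believes at t.

```python
from fractions import Fraction

def degree_of_belief_in_theory(consequence_probabilities):
    """Return the probability of the least probable believed proper consequence.

    consequence_probabilities maps each believed proper consequence of K&X
    to the probability the person believes it to have.
    """
    if not consequence_probabilities:
        raise ValueError("at least one proper consequence is needed")
    return min(consequence_probabilities.values())

# Hypothetical proper consequences of a theory X with believed probabilities.
believed = {"Y1": Fraction(9, 10), "Y2": Fraction(3, 5), "Y3": Fraction(4, 5)}
ps_theory = degree_of_belief_in_theory(believed)
```

On this reading a theory is believed no more strongly than its weakest believed proper consequence.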

Ax3 can be seen as defining when one's degree of belief in Y at t given K and X is z: Precisely if one believes that the probability of Y at t is z given K and X.

These three axioms accordingly generate degrees of belief from beliefs in probabilities. They all are relative to time, which is best taken as some sort of interval, like 'today' or 'this hour that I am thinking about this problem' (and not an infinitesimally small now): At a later time, one may know more or believe less or differently.

Ax4 can be seen as defining one's degree of belief in X&Y at t given K: This equals the product of one's degree of belief in Y at t given X&K and one's degree of belief in X at t given K.

The degrees of belief on the right side of Ax4 can be obtained by way of Ax1-Ax3.

Ax5 can be seen as defining one's degree of belief in X at t given K in general, whether X is theoretical or empirical: This equals the sum of one's degree of belief in X&Y at t given K and one's degree of belief in X&~Y at t given K for any Y.

The degrees of belief on the right side of Ax5 can be obtained by way of Ax1-Ax4.

Ax6 can be seen as imposing a consistency-condition on conditional degrees of belief in Y given K&X in time: These conditional degrees of belief must be the same at t as at t-1.

Note that by Ax3 what Ax6 says is this

aB( K&X |- p(Y)=z ).t
= aB( K&X |- p(Y)=z ).t-1

and thus one plausible ground for Ax6 is that deductions are valid or not irrespective of time: They do not depend on time but on logic and assumptions.

Now all one needs in the present terms and notations for recalculating one's degrees of belief given one's beliefs in probabilities are these

(*) aB( K |- p(T) = t ).t-1.
     aB( K&T |- p(F) = h ).t-1.
     aB( K&~T |- p(F) = g ).t-1.

together with either aB( K |- p(F) = 1 ).t or aB( K |- p(F) = 0 ).t, for with these the given axioms allow one to calculate respectively

ps(a, T | F&K ).t   = (h*t) : (h*t + g*(1-t)) if one believes F is true
ps(a, T | ~F&K ).t =  ((1-h)*t) : ((1-h)*t + (1-g)*(1-t)) if one believes ~F is true.

And of course both calculations are correct whatever one believes about F, but one can believe - logically speaking - at most one of two contradictory alternatives.
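
Both calculations can be put in one small function. This is only a sketch: the name `posterior_T` and the sample values are assumptions, while t, h and g are the numbers of (*).

```python
from fractions import Fraction

def posterior_T(t, h, g, F_is_true):
    """Degree of belief in T after learning F (or ~F), from the numbers in (*).

    t = believed p(T) given K, h = believed p(F) given K&T,
    g = believed p(F) given K&~T.
    """
    weight_T = h if F_is_true else 1 - h
    weight_not_T = g if F_is_true else 1 - g
    denominator = weight_T * t + weight_not_T * (1 - t)
    if denominator == 0:
        raise ValueError("the evidence had degree of belief 0 at t-1")
    return weight_T * t / denominator

# Hypothetical values for illustration.
t, h, g = Fraction(1, 5), Fraction(9, 10), Fraction(1, 10)
after_F = posterior_T(t, h, g, F_is_true=True)        # (h*t) : (h*t + g*(1-t))
after_not_F = posterior_T(t, h, g, F_is_true=False)   # ((1-h)*t) : ((1-h)*t + (1-g)*(1-t))
```

With these sample values a verified prediction raises the degree of belief in T above t and a falsified one lowers it, because h is larger than g.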

The algebra required for these calculations can be gleaned from the following table, in which ps(a,K).t = 1. The degrees of belief that are presupposed in (*), whence they follow by the axioms given in this section, are ps(a,T&K).t, ps(a, F|T&K).t and ps(a, F|~T&K).t, and all others can be derived from these:

Table 2

K     T                               ~T
F     ps(a, F|T&K).t*ps(a,T&K).t      ps(a, F|~T&K).t*ps(a,~T&K).t      ps(a,F&K).t
~F    ps(a, ~F|T&K).t*ps(a,T&K).t     ps(a, ~F|~T&K).t*ps(a,~T&K).t     ps(a,~F&K).t
      ps(a,T&K).t                     ps(a,~T&K).t                      1

And to conclude, here are the patterns of inference that the present note provides axioms for.

First, in case p(F) = 1 at t:

     aB( K |- p(T) = t ).t-1.
     aB( K&T |- p(F) = h ).t-1.
     aB( K&~T |- p(F) = g ).t-1.
     aB( K |- p(F) = 1 ).t.
     aB( K |- p(T) = (h*t) : (h*t + g*(1-t)) ).t.

And in case p(F) = 0 at t:

     aB( K |- p(T) = t ).t-1.
     aB( K&T |- p(F) = h ).t-1.
     aB( K&~T |- p(F) = g ).t-1.
     aB( K |- p(F) = 0 ).t.
     aB( K |- p(T) = ((1-h)*t) : ((1-h)*t + (1-g)*(1-t))).t.

6. Putting it all together again

The previous section involved background knowledge K because this is realistic and generally present, but algebraically it makes no difference and can be left out without any difference in calculated values, and that is the plan we follow in the present section where we put the bits together.

What generally happens when using Bayesian Conditionalization with theories and verified or falsified predictions can be summarized in tabular form as follows.

First, at time t-2 all we have is a real or possible fact F with some probability we believe. In order to put it all in tabular form we start here with this:

Table 3 - at time t-2

F     pr(a,F).t-2
~F     pr(a,~F).t-2

It makes sense to remark that the times we shall refer to are t-2, t-1 and t, and these are best conceived as intervals or periods in which we consider our facts and hypotheses, and try to arrive at some new conclusions.

Second, having only the possible fact F with a believed probability at t-2, we introduce a new theory T at t-1, and take ps(a, F|~T).t-1 to correspond to the original for F: ps(a, F|~T).t-1 = ps(a,F).t-2. In general ps(a,F).t-1 will then differ from ps(a,F).t-2. What we get at t-1, before finding out about F, is accordingly this, where our assumptions are ps(a, F|T).t-1, ps(a,T).t-1 and ps(a, F).t-2:

Table 4 - possibilities at t-1

K     T.t-1                            ~T.t-1
F     ps(a, F|T).t-1*ps(a,T).t-1       ps(a, F).t-2*ps(a,~T).t-1       ps(a,F).t-1
~F    ps(a, ~F|T).t-1*ps(a,T).t-1      ps(a, ~F).t-2*ps(a,~T).t-1      ps(a,~F).t-1
      ps(a,T).t-1                      ps(a,~T).t-1                    1

Here it makes sense to introduce some abbreviatory notation:

ps(a,T).t-1 = t
ps(a, F|T).t-1 = h
ps(a, F|~T).t-1  = ps(a,F).t-2 = f

We can calculate all degrees of belief in the last table from these, based on the assumption that indeed they are proportions, like probabilities, and thus the same algebra applies. Putting in these abbreviations, using '~x' for '1-x' we get at t-1 the following, with juxtaposition for multiplication:

Table 5 - fractions at t-1

K     T.t-1     ~T.t-1
F     (ht)      (f~t)      (ht+f~t)
~F    (~ht)     (~f~t)     (~ht+~f~t)
      (t)       (~t)       1
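
The fractions at t-1 can be generated from the three assumed numbers alone, as in the following sketch. The function name `table_at_t_minus_1`, the dictionary layout and the sample values are hypothetical.

```python
from fractions import Fraction

def table_at_t_minus_1(t, h, f):
    """Build the cells and marginals at t-1 from t, h and f.

    t = ps(a,T).t-1, h = ps(a,F|T).t-1, f = ps(a,F|~T).t-1 = ps(a,F).t-2.
    """
    not_t = 1 - t
    cells = {
        ("F", "T"): h * t,          ("F", "~T"): f * not_t,
        ("~F", "T"): (1 - h) * t,   ("~F", "~T"): (1 - f) * not_t,
    }
    rows = {r: cells[(r, "T")] + cells[(r, "~T")] for r in ("F", "~F")}
    columns = {c: cells[("F", c)] + cells[("~F", c)] for c in ("T", "~T")}
    return cells, rows, columns

# Hypothetical values for illustration.
cells, rows, columns = table_at_t_minus_1(Fraction(1, 4), Fraction(4, 5), Fraction(2, 5))
```

The column sums return t and ~t and the row sums give ht+f~t and ~ht+~f~t, so all marginals follow from the assumptions by the ordinary algebra of proportions.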

Third, at t we find that F, and we recalculate both ps(a, T).t and ps(a, F).t using Bayesian Conditionalization i.e. ps(a,T).t = ps(a, F|T).t-1*ps(a,T).t-1 : ps(a,F).t-1 for T and similarly in the other cases - and note the temporal indexes. The result in terms of our abbreviations looks as follows, if we work all possibilities out by algebra and Bayesian Conditionalization:

Table 6 - fractions at t after Bayesian Conditionalization

K T.t ~T.t  
F     h(ht):(ht+f~t)      (ht+f~t)(f~t):(ht+f~t)       h(ht)+(f~t)(ht+f~t):(ht+f~t)
~F    ~h(ht):(ht+f~t)     (~ht+~f~t)(f~t):(ht+f~t)     ~h(ht)+(f~t)(~ht+~f~t):(ht+f~t)
      (ht):(ht+f~t)       (f~t):(ht+f~t)               1

There is a new degree of belief for T at t, namely (ht):(ht+f~t), and also a new degree of belief for ~T at t and for F at t.

Note first, also with respect to the example in section 3, that there is a difference between the case (A) of applying statistics about a disease and symptoms to a patient and (B) of testing a theory with a prediction. For (A) you may have empirical probabilities for all cases, but not for (B). Also, in case of (A) you can plausibly say that you apply probabilities based on classes of cases of patients to a particular patient. The new T and F are then applicable to that patient. But in case of (B) this mode of proceeding is not so plausible.

It is the second case I am really concerned with, in principle. Here are five points about it, which summarize some of the above points and add some:

  • We started with pr(a,F).t-2 and no hypothesis at t-2.
  • At t-1 we introduced three hypotheses: ps(a, F|T).t-1 and ps(a,T).t-1, and we put ps(a, F).t-2 = ps(a, F|~T).t-1. The reason for this last hypothesis is that it is what we started with at t-2. We obtain a new calculated ps(a, F).t-1 = ps(a, F|T).t-1*ps(a,T).t-1 + ps(a, F).t-2*ps(a,~T).t-1 = ht+f~t in abbreviated notation. Since ht+f~t = ht+f(1-t) = t(h-f)+f, this differs from f by t(h-f), and is the same as f precisely when t=0 or h=f.
  • At t we can calculate ps(a,T).t  using Bayesian Conditionalization, on the hypothesis that F is true at t. This uses the ps(a, F) at t-1 i.e. ht+f~t. But we also can calculate ps(a,~T).t  and then ps(a, F).t and ps(a, ~F).t. This is more complicated but it also is basic algebra, and requires no new data, and uses what was given or assumed at t-1.
  • At t we can also calculate ps(a,T).t using Bayesian Conditionalization, on the hypothesis that F is false at t: ps(a, T|~F).t-1 = ps(a, ~F|T).t-1 * ps(a,T).t-1 : ps(a,~F).t-1 = ~ht : (~ht+~f~t). See Table 5.
  • And we can likewise calculate ps(a, ~T|F).t  and ps(a, ~T|~F).t, and the algebraical results can again be gleaned from Table 5. 
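
The relation between the new marginal ht+f~t and the original f can be checked exhaustively on a small grid of values. This is a sketch under the assumption that a coarse grid of quarters suffices for illustration; the names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def marginal_F(t, h, f):
    """ps(a,F).t-1 = ht + f~t in the abbreviated notation."""
    return h * t + f * (1 - t)

# All combinations of t, h and f taken from 0, 1/4, 1/2, 3/4, 1.
grid = [Fraction(k, 4) for k in range(5)]
triples = list(product(grid, repeat=3))

# ht + f~t can always be rewritten as t(h-f) + f.
identity_holds = all(marginal_F(t, h, f) == t * (h - f) + f for t, h, f in triples)

# The marginal equals the original f only if t = 0 or h = f.
unchanged = [(t, h, f) for t, h, f in triples if marginal_F(t, h, f) == f]
only_trivial_cases = all(t == 0 or h == f for t, h, f in unchanged)
```

So introducing T leaves one's degree of belief in F as it was only if one gives T no credence at all, or if T makes no difference to F.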

So at t one arrives at what amounts to the above table in general if one has explicitly calculated everything using Bayesian Conditionalization, but of course with specific fractions for specific cases.

One interesting fact about the last table is that it charts the alternatives of a hypothesis, and indeed the true real frequency of F, if any, which the alternative hypotheses attempt to catch, is not the marginal ht+f~t which sums both.

Next, a remark about the stability of the conditionals, which in the present reconstruction amounts to the stability of h i.e. ps(a, F|T).t-1 = ps(a, F|T).t and of f i.e. ps(a, F|~T).t-1 = ps(a, F|~T).t = ps(a, F).t-2 = pr(a, F).t-2. In fact, these conditionals should be stable intuitively, for they are hypotheses about reality: We may get rid of one of them - T or ~T - by the evidence, once we have found it.

It remains to consider Bayesian Conditionalization and degrees of belief.

First about degrees of belief.

Seen as degrees of belief, the new fractions calculated with Bayesian Conditionalization differ from real probabilities based on frequencies, which is what one may have started from at t-2: pr(F).t-2=f.

They are proportions like probabilities, and indeed degrees of belief may be fairly called personal or subjective probabilities. But they are not like ordinary frequency based probabilities, because they are hypothetical: One or the other of T and ~T is false, and the fractions in the cells in the column used for it are purely hypothetical. Indeed, at most one of T and ~T is true and so at least one of the columns for T and ~T is purely speculative and merely corresponds to one's degrees of belief, and not to any frequency one can establish directly.

And indeed this is unlike the case of a patient with a disease and symptoms mentioned in section 3. There the four inner cells of the table can in principle all be established, in that one may be able to find people with the disease with and without the symptoms and people without the disease with and without the symptoms. Here all one has are the symptoms, figuratively speaking, and two conflicting hypotheses to account for these facts.

In fact, the degrees of belief derive from the hypothesis T that was started at t-1 to account for F (or for another fact X that is relevant for F).

Second about Bayesian Conditionalization.

In the above last table, Bayesian Conditionalization corresponds to the move from  ps(a, T).t-1=(t) to ps(a, T).t=(ht):(ht+f~t). It is a recalculation of one's degree of belief in T upon finding that F.

The big question now is: Is this move probabilistic? Well - it surely is proportionalistic: ps(a,T).t = ps(a,T|F).t = ps(a, F|T).t-1*ps(a,T).t-1 : ps(a, F).t-1. But note that in fact at t ps(a,T|F).t = ps(a, F|T).t*ps(a,T).t : ps(a, F).t = ps(a, F|T).t*ps(a,T).t since ps(a, F).t = 1. Therefore also, ps(a,T|F).t = ps(a,T).t - but then that is useless.

Note also that for Bayesian Conditionalization given that conditional degrees of belief remain the same in time we also have ps(a, F|T).t = ps(a, F|T).t-1 = (ht):t = h, as is correct.

So this differs from probability theory: The Bayesian Conditionalization step corresponds to recalculating at t using the numbers for t-1 and not for t as would happen in ordinary proportional algebra including ordinary probability theory.

The Bayesian Conditionalization move does follow given that conditional degrees of belief remain the same in time, and given that degrees of belief are proportions, and indeed then the previously useless step seems to become useful and seems to correspond to conditionalization.

Now this move is quite plausible for degrees of belief - for what else can one rationally use but one's last best hypothetical estimates? - but not for probabilities as real frequencies, since conditional frequencies may well change in time.

How plausible it is for degrees of belief can be illustrated by doing our earlier theorem for explicit beliefs, using six assumptions, A0-A5, and logic:

The assumptions are:

A0. ps is proportional.
A1. aB( pr(X)=z ).t         IFF ps(a,X).t = z
A2. aB( Y |- pr(X)=z ).t   IFF ps(a,X|Y).t = z
A3. aB( Y |- pr(X)=z ).t-1 --> aB( Y |- pr(X)=z ).t
A4. aB( X |- Y ).t --> aB( X).t |- aB( Y ).t
A5. aB( pr(X)=1 ).t IFF aB( X ).t  

A0 guarantees that personal probabilities a.k.a. degrees of belief are proportions, like ordinary probabilities.

A1 and A2 convert between beliefs in probabilities and personal probabilities, and guarantee that one's personal probability is numerically the same as one's belief in the probability. Note that A2 imposes a particular, natural and simple interpretation on conditional degree of belief: it is belief in a probability based on an assumption.

A3 insists that conditional beliefs remain constant in time once established or assumed, e.g. on the ground that what is involved is a relation of deducibility.

A4 says that if one believes that X implies Y at t then if one believes X at t one believes Y at t.

A5 converts between belief that X has probability 1 and the belief that X is true.

For beliefs in probabilities the theorem and argument now follow. The logic assumed apart from the assumptions made is standard First-Order Predicate Logic with identity.

T. aB( T |- pr(F)=h ).t-1 & aB( ~T |- pr(F)=f ).t-1 & aB( pr(T)=e ).t-1 &
    aB( pr(F)=1 ).t |- aB( pr(T)=(he):(he+f~e) ).t

This is the theorem to be proved, all in terms of beliefs about probabilities, all relativized to times.

AI           1.  aB( T |- pr(F)=h ).t-1                 
AI           2.  aB( ~T |- pr(F)=f ).t-1
AI           3.  aB( pr(T)=e ).t-1
AI           4.  aB( pr(F)=1 ).t

The assumptions of the proof. Notice that at (4) a new fact is recorded, that was not so at t-1.

1,A2        5.  ps(a,F|T).t-1=h
2, A2       6.  ps(a,F|~T).t-1=f
3, A1       7.  ps(a,T).t-1=e

The inference of the personal probabilities for a Bayesian Conditionalization.

A0          8.  ps(a,F).t-1 = ps(a,F&T).t-1 + ps(a,F&~T).t-1
8,A0        9.                  = ps(a,F|T).t-1*ps(a,T).t-1 + ps(a,F|~T).t-1*ps(a,~T).t-1
5,6,7,9    10.                 = (he + f~e)

The calculation of ps(a,F).t-1. It is A0 i.e. the assumption that personal probabilities are proportions, like ordinary probabilities, that allows these arguments, together with the assumptions of the proof at (10).

A0          11. ps(a,T|F).t-1 = ps(a,F|T).t-1 * ps(a,T).t-1 : ps(a,F).t-1
5,7,10     12.                    = he : (he + f~e)

The calculation of ps(a,T|F).t-1.

4,A1       13. ps(a,F).t = 1
A0          14. ps(a,T|F).t = ps(a,T&F).t : ps(a,F).t
13,14,A0   15.                 = ps(a,T).t

The new information at t simplifies ps(a,T|F).t to ps(a,T).t (a fact often slurred over in expositions about Bayesian Conditionalization).

A3,A2       16. ps(a,T|F).t-1   =  ps(a,T|F).t
15,16       17. ps(a,T).t = ps(a,T|F).t-1

Here the new personal probability at t for T has been derived, which was already calculated at (12) from the assumptions of the theorem. And in fact ps(a,T).t has been calculated from the latest relevant information that a had available, namely at t-1, as listed by the assumptions (1)-(3).

12,A2      18. aB( F |- pr(T)=he : (he + f~e) ).t-1
18,A3      19. aB( F |- pr(T)=he : (he + f~e) ).t

Converting personal probabilities into a person's beliefs about probabilities.

4,A5       20. aB( F ).t

Converting a belief in a probability of 1 into belief of truth.

19,A4      21. aB( F ).t   |-   aB( pr(T)=he : (he + f~e) ).t

Converting belief in a conditional into a conditional between beliefs.

20,21,MP 22. aB( pr(T)=he : (he + f~e) ).t

Deriving the conclusion of the theorem, that was to be proved, by ordinary modus ponens: A new probability for T derived in conformity with Bayesian reasoning. QED.

So this seems a plausible new interpretation of and explanation for Bayesian Conditionalization: It concerns degrees of belief, that are proportions like probabilities, with conditional degrees of belief supposed constant in time, and corresponding to believed conditions for probabilities. And the reason for that constancy is that it concerns hypotheses, in which the degree of belief does not vary directly with time but varies with evidence.


See also: Bayes' Theorem, Probabilistic Rules of Reasoning

Literature: Ayer, Adams, Hume, Goodman, Howson & Urbach, Maartensz, Rescher, Russell, Stegmüller

 Original: Aug 28, 2005                                                Last edited: 15 September 2007.