Bayesian Conditionalization:
The application of the following theorem of elementary probability
theory: p(T|P) = p(P|T)*p(T) : p(P), to revise p(T) to p(T|P) if one learns
that P is true.
There are quite a few conceptual
problems in using this theorem in this way, and the present lemma
articulates one approach to dissolving these problems. One key idea is
the distinction between degrees of belief and probabilities,
which are both brought together on the common footing of proportions. Other assumptions
that relate proportions, probabilities and degrees of belief are made in Section 1.
Sections:
1. Probabilities, degrees of belief and proportions
2. Derivation of Bayesian Conditionalization for degrees of belief
3. A simple example
4. The reason for this lemma
5. Alternative and further axioms
6. Putting it all together
1. Probabilities, degrees of belief and proportions
I start by articulating a number of
assumptions about probabilities, degrees of belief and proportions.
Some knowledge about probabilities
and proportions is presupposed in
what follows. Likewise, I
presuppose some elementary knowledge of standard logic.
Ax.1: Probabilities are proportions.
Proportions are taken as ratios of
cardinal numbers of subsets in sets, and include conditional proportions as in probability
theory.
Ax.2: Degrees of belief are
proportions.
Therefore probabilities and degrees of
belief share the formal properties of proportions. Since
degrees of belief may alter with time, they involve a reference to time,
and Ax.2 accordingly may be formalized thus:
(Ax.2) (t)(a)(X) ( ps(a,X)._{t} ∈ PROPORTION )
Here t is a temporal index, a names a
person, and X a belief of the person, and so ps(a,X)._{t} is the degree of
belief of a in X at time t.
Ax.3: Degrees of belief follow beliefs
in probabilities.
In other words: A person's degree of
belief in X, if the person is rational, conforms to his belief about
the probability of X. The reason for this assumption is that one's
beliefs about probabilities are what one believes, and they have a
degree. Writing 'aB(pr(X)=y)._{t}' for 'a believes at t that
the probability of X is y', Ax.3 may be formalized thus:
(Ax.3) (t)(a)(X)(y) ( ps(a,X)._{t} = y IFF aB( pr(X)=y )._{t} )
Next, we have an assumption similar to
Ax.3 for conditional degrees of belief, but without assuming these are
derived from conditional probabilities:
Ax.4: Conditional degrees of belief
are beliefs in probabilities from hypotheses.
This is an explanation of what
conditional degrees of belief are, and it can be formalized thus:
(Ax.4) (t)(a)(X)(Y)(z) ( ps(a,Y|X)._{t} = z IFF aB(X |- pr(Y)=z)._{t} )
Here 'X |- Y' is read as 'Y can be deduced from X'. One particular point about
aB(X |- pr(Y)=z)._{t} is that what a believes and assumes about X
allows a at t to deduce that pr(Y)=z if X is true. This may involve
rather a lot of assumptions in X, but that is as may be, and X is also
the place where these assumptions belong.
Ax.5: Beliefs in probabilities from
hypotheses once adopted remain adopted until revised.
This is a natural assumption about
beliefs in probabilities from hypotheses: Assumptions do not depend on
time but on oneself and such hypotheses as one has, and one retains
them until one revises them, which one does when one revises the
evidence. Ax.5 can be formalized thus:
(Ax.5) (t)(a)(X)(Y) ( aB(X |- Y)._{t-1} → aB(X |- Y)._{t} )
Note what I have not assumed:
that degrees of belief are the same as probabilities. What I have
assumed, which goes quite a long way in that direction, is that one can infer
degrees of belief from beliefs in probabilities. This follows from
Ax.3, which can be seen as a version of David Lewis's so-called
Principal Principle.
Apart from the notation, there are
alternatives for some of the above assumptions. I mention two cases.
First, one possible weakening of Ax.3 is
to assume that degrees of belief depend functionally on beliefs about
probability, e.g. as a linear function, in a way that allows for qualifications
relating to the quality of the evidence one has. However, it generally
seems most sensible to use Ax.3 as stated for one's calculations,
and only after one has made them and has revised one's degrees of
belief, to qualify the result with reference to the quality of the
evidence, if this is necessary. A general assumption that may enter
here is
Ax.3A: Degrees of belief follow beliefs
in probabilities and never exceed them.
The reason is that one's probabilities
state such guesses and such evidence as one has, and therefore are in
the nature of the best one can do for the moment, given what one
believes.
Second, a possible additional assumption in Ax.5
is that also ~(aB~(X |- Y))._{t}. The revised axiom accordingly
may be written as
(Ax.5A) (t)(a)(X)(Y) ( aB(X |- Y)._{t-1} & ~(aB~(X |- Y))._{t} → aB(X |- Y)._{t} )
This may be taken as saying that if a at
t-1 believes that X entails Y, and at t a does not believe that Y does not
follow from X, then at t a believes that X entails Y. Thus the
second conjunct affirms that one has not revised at t the belief one held at t-1.
In conclusion, one believes at t what one believed at t-1.
2. Derivation of Bayesian Conditionalization for degrees of belief
Given these assumptions, the following
theorem can be proved, where I avoid universal quantifiers and use free
variables instead (that allow the inference of universal quantifiers):
T1. ps(a,E)._{t}=1 & ps(a,E|C)._{t-1} = x & ps(a,E)._{t-1} = y & ps(a,C)._{t-1} = z
→ ps(a,C)._{t} = ps(a,C|E)._{t-1} = x*z : y
This asserts that Bayesian Conditionalization for degrees of belief follows
from the usual hypotheses involved in Bayesian conditionals. Note that there
is an explicit reference to time, and the new evidence at t is that
ps(a,E)._{t}=1. The conclusion is a new
degree of belief in C at t that differs from the old one at t-1 unless x=y.
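The arithmetic of T1 can be sketched in Python with exact fractions. This is only an illustration, assuming x, y and z are given as rational numbers; the function name `conditionalize` is a hypothetical choice and not part of the text.

```python
from fractions import Fraction

def conditionalize(x: Fraction, y: Fraction, z: Fraction) -> Fraction:
    """New degree of belief ps(a,C) at t according to T1.

    x = ps(a,E|C) at t-1, y = ps(a,E) at t-1, z = ps(a,C) at t-1.
    The result x*z : y is ps(a,C|E) at t-1.
    """
    if y == 0:
        raise ValueError("ps(a,E) at t-1 must be positive to conditionalize on E")
    return x * z / y

# The new value differs from the old one (z) unless x == y.
old = Fraction(1, 10)
new = conditionalize(Fraction(9, 10), Fraction(1, 10), old)
print(old, new)  # 1/10 9/10
```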
One reason for this introduction of
temporal indexes is that it makes a lot of intuitive sense when
speaking of degrees of belief. Another reason is given in the next
section.
Here is the proof of T1 with some
comments:
(1) ps(a,E)._{t} = 1   by AI
(2) ps(a,E|C)._{t-1} = x   by AI
(3) ps(a,E)._{t-1} = y   by AI
(4) ps(a,C)._{t-1} = z   by AI
This lists all assumptions of the theorem to be proved.
(5) ps(a,C|E)._{t-1} = ps(a,E|C)._{t-1} * ps(a,C)._{t-1} : ps(a,E)._{t-1}   by PT, A1-4
(6) ps(a,C|E)._{t-1} = x*z : y   by 2,3,4,5
Note that (5) follows from A1-4: Its
left-hand side equals its right-hand side in probability theory, and
therefore both sides are the same as degrees of belief, since ratios of
the same quantities are the same, and degrees of belief follow
probabilities. And then (6) follows from the assumptions.
(7) ps(a,C|E)._{t-1} = ps(a,C|E)._{t}   by 2-4, A5
This follows from A5, since all
assumptions needed to derive ps(a,C|E)._{t-1} have been made. Note
that this line is a crucial step in the proof.
(8) ps(a,C|E)._{t} = ps(a,C&E)._{t} : ps(a,E)._{t}   by A2-3
This follows since one's degrees of
belief are proportions that follow one's beliefs in probabilities:
Both terms on the right-hand side are equal to their corresponding
probabilities, and therefore their quotient equals the conditional
degree of belief on the left-hand side.
(9) ps(a,C|E)._{t} = ps(a,C)._{t}   by 1,8
And this follows as ps(a,E)._{t}=1,
whence ps(a,C&E)._{t} = ps(a,C)._{t},
and so we have the desired conclusion
(10) ps(a,C)._{t} = x*z : y   by 6,7,9
QED. But I can say and prove more about conditionalizing:
(11) aB(pr(E)=1)._{t}   by 1, A3
(12) aB( C |- pr(E)=x )._{t-1}   by 2, A4
Here degrees of belief and beliefs in
probabilities are interchanged. Now, taking for granted a few things
that make intuitive sense anyway concerning inferences with
propositions of the form 'aB(X)._{t}' and 'aB(pr(X)=y)._{t}', we have:
(13) aB(E)._{t}   by 11
This follows from aB(pr(E)=1)._{t}. Since there also is ps(a,C|E)._{t} =
x*z : y, we have
(14) aB(E |- pr(C)=x*z : y )._{t}   by 9,10, A4
and now one can infer a revision of one's
belief about the probability of C by ordinary Modus Ponens,
which derives aB(Y)._{t} from aB(X)._{t} & aB(X |- Y)._{t}:
(15) aB(pr(C)=x*z : y )._{t}   by 13, 14
And thus Bayesian Conditionalization may
be related to, and explained in terms of, ordinary conditionalization,
given the above assumptions. For what we can clearly also prove is the
counterpart of T1 in terms of beliefs:
T2. aB(pr(E)=1)._{t} & aB(C |- pr(E)=x)._{t-1} & aB(pr(E)=y)._{t-1} & aB(pr(C)=z)._{t-1}
→ aB(pr(C)=x*z : y)._{t}
The proof of this can be gleaned from
the foregoing proof, and is simply a matter of using the assumptions
that have been made that relate degrees of belief and probabilities.
It may be objected here that the
reasoning with propositional attitudes has not been clarified, but all
that is required here are assumptions to the following effect:
Ax.6 aB(pr(X)=1)._{t} → aB(X)._{t}
Ax.7 aB(X |- Y)._{t} → (aB(X)._{t} → aB(Y)._{t})
These are sufficient for the above
inferences (13)-(15) and are intuitively obvious and unobjectionable.
3. A simple example
It may be good to give a schematic
example of the sort of reasoning outlined above. Suppose there is a
disease C which is not very common, which usually but not always comes
with symptoms E, and these symptoms are rare without the disease. Suppose then that
the probabilities are as follows, where I have written everything so that it
also makes sense as percentages, since that is intuitively helpful:
Table 1
          C     ~C
E         9      1     10
~E        1     89     90
         10     90    100
We need not concern ourselves here with
how the probabilities were precisely established, but what does matter
is that C and E and their complements refer to classes of cases, such
as numbers of incidence of the disease and the symptoms.
It follows from the above assumptions
that one's degrees of belief are numerically the same as the
probabilities one believes. Accordingly, one's degree of belief that
someone has C is 1/10. Note that this does not concern a class
of cases, but the application of a known probability to a particular
case, with a degree of belief in return that is numerically the same as
a probability, on the strength of the assumptions we made about the
relations between proportions, probabilities and degrees of belief.
Now suppose that one finds that this
particular person does show the symptoms E. Then one's degree of belief
that the person has the disease C by the above reasoning changes from
1/10 to 9/10.
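Both numbers can be checked with a short Python sketch, assuming the cells of Table 1 are read as counts out of 100 cases; the dictionary layout and the helper `share` are illustrative choices. The prior is the proportion of C, and the posterior is x*z : y from T1 with x = ps(a,E|C), y = ps(a,E) and z = ps(a,C).

```python
from fractions import Fraction

# Table 1 as counts out of 100 cases; keys are (disease, symptoms).
counts = {
    ("C", "E"): 9, ("~C", "E"): 1,
    ("C", "~E"): 1, ("~C", "~E"): 89,
}
total = sum(counts.values())  # 100

def share(pred):
    """Proportion of all cases whose (disease, symptoms) satisfy pred."""
    return Fraction(sum(n for key, n in counts.items() if pred(*key)), total)

p_C = share(lambda d, s: d == "C")                      # z = 1/10
p_E = share(lambda d, s: s == "E")                      # y = 1/10
p_C_and_E = share(lambda d, s: d == "C" and s == "E")   # 9/100
p_E_given_C = p_C_and_E / p_C                           # x = 9/10

prior = p_C                                  # degree of belief before finding E
posterior = p_E_given_C * p_C / p_E          # x*z : y, as in T1
print(prior, posterior)  # 1/10 9/10
```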
Note that nothing changes in the
probabilities: They remain just as they were, and indeed may be used
again for other possible cases of C in other persons. There is in the
present approach no revised probability: There only is a revised degree
of belief given new evidence. If there are priors and posteriors, they
are not in probabilities but in degrees of belief, and indeed in Section 2 the prior corresponds to ps(a,C)._{t-1} =
z and the posterior to ps(a,C)._{t} = x*z : y.
4. The reason for this lemma
The main reason for the lemma on
Bayesian Conditionalization derives from two beliefs I have:
 (A) The principle of
Bayesian Conditionalization is the best approach towards a logic of
scientific inference: Only something like it explains how we can learn
from the evidence and from experience and how we can revise our degrees
of belief systematically and rationally in the light of such evidence
as we have.
 (B) The usual accounts
of Bayesian Conditionalization are for various reasons mistaken: In
particular, what conditional probabilities are is not clearly
articulated in standard approaches; the lack of reference in the
standard account to what evidence one has at what time is confusing;
and indeed the standard account confuses degrees of belief and
probabilities systematically.
There is a lot of literature on the
topic. Useful texts for (A) are
Howson & Urbach and Adams.
Useful texts for (B) are the same plus
Stegmüller and Lewis. The last refers to 'A Subjectivist's Guide to
Objective Chance' in Jeffrey (ed.).
What is new in the present proposal are
the axiomatic assumptions in Section 1 and the proof
in Section 2. Apart from what was said in Section 1
about each assumption, one basic conviction that motivates all
assumptions is that degrees of belief are not probabilities,
but must conform to them if one is rational and one's belief in
the probabilities is rational. And the reason degrees of belief look
like probabilities and behave like them is that both degrees of belief
and probabilities are proportions.
But in the present approach, the
proportions that probabilities measure and express concern the
real facts of the matter, whereas the proportions that
degrees of belief measure and express concern the beliefs of a
person about the application of his beliefs about the probabilities
to some specific case.
5. Alternative and further axioms
The axioms in Section 1
seem intuitive, but here is an alternative set, which also incorporates
the distinction between theoretical
and empirical propositions, and an explicit
reference to presumed background knowledge K. It
will be assumed that ps(a,K)._{t} =
1, but clearly one may have different background knowledge at different
times.
In order to write one of the axioms in
a fairly clear way we also need a definition, namely that of a proper
consequence of K&X, which we write as 'K&X < Y' and
define as follows:
aB(K&X < Y)._{t} =def aB( (K&X |- Y) & ~(K&~X |- Y) )._{t}
Thus, Y is a proper consequence of
K&X at t if it follows at t from K&X but not from K&~X.
Now the axioms for degrees of belief
are these:
Ax1: ps(a, X|K)._{t} = y & X ∈ EMP IFF aB( K |- p(X)=y )._{t}
Ax2: ps(a, X|K)._{t} = y & X ∈ THE IFF aB( (EY)(Ez)( (K&X < p(Y)=z) & y=z ) )._{t} & aB( ~(EZ)(Ez)( (K&X < p(Z)=z) & z<y ) )._{t}
Ax3: ps(a, Y|X&K)._{t} = z IFF aB( K&X |- p(Y)=z )._{t}
Ax4: ps(a, X&Y|K)._{t} = ps(a, Y|X&K)._{t} * ps(a, X|K)._{t}
Ax5: ps(a, X|K)._{t} = ps(a, X&Y|K)._{t} + ps(a, X&~Y|K)._{t}
Ax6: ps(a, Y|X&K)._{t-1} = ps(a, Y|X&K)._{t}
Ax1 can be seen as defining when one's degree of belief in X at t given
background knowledge K equals y in case X is an
empirical proposition: Precisely if one believes that the
probability of X at t is y given K.
Here one may rely on frequencies and
sampling for one's beliefs in probabilities, for the proposition X is
supposed to be empirical.
Ax2 can be seen as
defining when one's degree of belief in X at t given background
knowledge K equals y in case X is a theoretical proposition. It is formulated as it is to insist that
the Y and Z that are used are proper consequences of K&X.
On this understanding, one's degree of belief in theory X at t
given background knowledge K equals y precisely if
one believes that there is a proposition Y at t that is a proper
consequence of K&X that has probability y given K&X, and one
believes there is not a proposition Z at t that is a proper consequence
of K&X that has a probability z such that z is smaller than y.
In brief: One's degree of belief in a theoretical proposition X
at t given K equals the probability of the least probable proper
consequence of X given K that one believes at t.
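Read this way, Ax2 amounts to taking a minimum over the believed proper consequences. The sketch below assumes a finite collection of such consequences is already given, together with the probabilities one believes they have given K&X; the function name `theory_degree`, the labels Y1, Y2, Y3 and their values are hypothetical.

```python
from fractions import Fraction

def theory_degree(consequence_probs: dict[str, Fraction]) -> Fraction:
    """Degree of belief in a theoretical proposition X given K, read off Ax2:
    the probability of the least probable believed proper consequence of K&X.

    consequence_probs maps (hypothetical) labels of believed proper
    consequences Y of K&X to the probability one believes Y has given K&X.
    """
    if not consequence_probs:
        raise ValueError("Ax2 needs at least one believed proper consequence")
    return min(consequence_probs.values())

# Hypothetical example: three predictions believed to follow properly from K&X.
beliefs = {"Y1": Fraction(9, 10), "Y2": Fraction(3, 4), "Y3": Fraction(19, 20)}
print(theory_degree(beliefs))  # 3/4
```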
Ax3 can be seen as
defining when one's degree of belief in Y at t given K and X is z:
Precisely if one believes that the probability of Y at t is z given K
and X.
These three axioms
accordingly generate degrees of belief from beliefs in
probabilities. They all are relative to time, which is best
taken as some sort of interval, like 'today' or 'this hour that I am
thinking about this problem' (and not an infinitesimally small now): At
a later time, one may know more or believe less or differently.
Ax4 can be seen as
defining one's degree of belief in X&Y at t given K: This equals
the product of one's degree of belief in Y at t given X&K and one's
degree of belief in X at t given K.
The degrees of belief on the right side of Ax4 can be obtained by way of Ax1-Ax3.
Ax5 can be seen as defining one's degree
of belief in X at t given K in general, whether X is theoretical or empirical:
This equals the sum of one's degree of belief in X&Y at t given K and one's degree of belief in
X&~Y at t given K, for any Y.
The degrees of belief on the right side of Ax5 can be obtained by way of Ax1-Ax4.
Ax6 can be seen as imposing a
consistency condition on conditional degrees of belief in Y given
K&X in time: These conditional degrees of belief must be the same at t
as at t-1.
Note that by Ax3 what Ax6 says is this:
aB( K&X |- p(Y)=z )._{t} IFF aB( K&X |- p(Y)=z )._{t-1}
and thus one plausible ground for Ax6 is
that deductions are valid or not irrespective of time: They do not
depend on time but on logic and assumptions.
Now all one needs in the present terms
and notation for recalculating one's degrees of belief given one's
beliefs in probabilities is this:
(*) aB( K |- p(T) = t )._{t-1}.
aB( K&T |- p(F) = h )._{t-1}.
aB( K&~T |- p(F) = g )._{t-1}.
Together with either aB( K |- p(F) = 1 )._{t} or aB( K |- p(F) = 0 )._{t}, these allow one, by the given axioms, to calculate respectively
ps(a, T|F&K)._{t} = (h*t) : (h*t + g*(1-t)) if one believes F is true,
ps(a, T|~F&K)._{t} = ((1-h)*t) : ((1-h)*t + (1-g)*(1-t)) if one believes ~F is true.
Here the letter t, where it is not a temporal index, stands for the prior degree of belief in T.
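A minimal Python sketch of the two formulas, with exact fractions. The function names are hypothetical, the parameter t denotes the prior degree of belief in T (not the temporal index), and the input values are made up for illustration.

```python
from fractions import Fraction

def belief_T_given_F(h, g, t):
    """ps(a, T|F&K) at t = (h*t) : (h*t + g*(1-t))."""
    return h * t / (h * t + g * (1 - t))

def belief_T_given_not_F(h, g, t):
    """ps(a, T|~F&K) at t = ((1-h)*t) : ((1-h)*t + (1-g)*(1-t))."""
    return (1 - h) * t / ((1 - h) * t + (1 - g) * (1 - t))

# Hypothetical t-1 values: t = ps(a,T|K), h = ps(a,F|T&K), g = ps(a,F|~T&K).
h, g, t = Fraction(9, 10), Fraction(1, 5), Fraction(1, 4)
print(belief_T_given_F(h, g, t))      # 3/5
print(belief_T_given_not_F(h, g, t))  # 1/25
```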
And of course both calculations are
correct whatever one believes about F, but one can believe, logically
speaking, at most one of two contradictory alternatives.
The algebra
required for these calculations can be gleaned from the following
table, in which ps(a,K)._{t} = 1. The degrees of
belief that are presupposed in (*), from which they follow by the axioms
given in this section, are ps(a,F|T&K)._{t}, ps(a,F|~T&K)._{t} and ps(a,T&K)._{t};
all others can be derived from these:
Table 2
K      T                                     ~T
F      ps(a,F|T&K)._{t}*ps(a,T&K)._{t}       ps(a,F|~T&K)._{t}*ps(a,~T&K)._{t}      ps(a,F&K)._{t}
~F     ps(a,~F|T&K)._{t}*ps(a,T&K)._{t}      ps(a,~F|~T&K)._{t}*ps(a,~T&K)._{t}     ps(a,~F&K)._{t}
       ps(a,T&K)._{t}                        ps(a,~T&K)._{t}                        1
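The following sketch fills in Table 2 from h = ps(a,F|T&K)._{t}, g = ps(a,F|~T&K)._{t} and t = ps(a,T&K)._{t}, assuming ps(a,K)._{t} = 1, using the product rule of Ax4 for the inner cells and the sum rule of Ax5 for the margins. Dividing the T&F cell by the F margin then reproduces the formula for ps(a,T|F&K)._{t} given earlier in this section. The function name and input values are illustrative assumptions.

```python
from fractions import Fraction

def table2(h, g, t):
    """Inner cells and margins of Table 2 (with ps(a,K) = 1 assumed)."""
    cells = {
        ("F", "T"): h * t,          ("F", "~T"): g * (1 - t),         # Ax4
        ("~F", "T"): (1 - h) * t,   ("~F", "~T"): (1 - g) * (1 - t),  # Ax4
    }
    row = {r: cells[(r, "T")] + cells[(r, "~T")] for r in ("F", "~F")}  # Ax5
    col = {c: cells[("F", c)] + cells[("~F", c)] for c in ("T", "~T")}  # Ax5
    return cells, row, col

cells, row, col = table2(Fraction(9, 10), Fraction(1, 5), Fraction(1, 4))
# Conditionalizing on F: the T&F cell divided by the F margin.
print(cells[("F", "T")] / row["F"])  # 3/5
```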
And to conclude, here are the patterns
of inference that the present note provides axioms for.
First, in case p(F) = 1 at t:
aB( K |- p(T) = t )._{t-1}.
aB( K&T |- p(F) = h )._{t-1}.
aB( K&~T |- p(F) = g )._{t-1}.
aB( K |- p(F) = 1 )._{t}.
----------------------------------------
aB( K |- p(T) = (h*t) : (h*t + g*(1-t)) )._{t}.
And in case p(F) = 0 at t:
aB( K |- p(T) = t )._{t-1}.
aB( K&T |- p(F) = h )._{t-1}.
aB( K&~T |- p(F) = g )._{t-1}.
aB( K |- p(F) = 0 )._{t}.
----------------------------------------
aB( K |- p(T) = ((1-h)*t) : ((1-h)*t + (1-g)*(1-t)) )._{t}.
6. Putting it all together again
The previous section involved background
knowledge K because this is realistic and generally present, but
algebraically it makes no difference: it can be left out without changing
any calculated values, and that is what we do in the
present section, where we put the bits together.
What generally happens when using
Bayesian Conditionalization with theories and verified or falsified
predictions can be summarized in tabular form as follows.
First, at time t-2 all we
have is a real or possible fact F with some probability we believe. In
order to put it all in tabular form, we start with this:
Table 3, at time t-2
aB
F      pr(a,F)._{t-2}
~F     pr(a,~F)._{t-2}
       1
It makes sense to remark that the times
we shall refer to are t-2, t-1 and t, and these are best conceived as
intervals or periods in which we consider our facts and hypotheses, and
try to arrive at some new conclusions.
Second, having the merely possible fact F
with a believed probability at t-2, we introduce a
new theory T at t-1, and so we have ps(a,F|~T)._{t-1} as
corresponding to the original for F: ps(a,F|~T)._{t-1} =
ps(a,F)._{t-2}. Hence in general ps(a,F)._{t-1} will
differ from ps(a,F)._{t-2}. What we get at t-1 before finding
out about F is accordingly the following, where our assumptions are
ps(a,F|T)._{t-1}, ps(a,T)._{t-1} and ps(a,F)._{t-2}:
Table 4, possibilities at t-1
K      T._{t-1}                            ~T._{t-1}
F      ps(a,F|T)._{t-1}*ps(a,T)._{t-1}     ps(a,F)._{t-2}*ps(a,~T)._{t-1}      ps(a,F)._{t-1}
~F     ps(a,~F|T)._{t-1}*ps(a,T)._{t-1}    ps(a,~F)._{t-2}*ps(a,~T)._{t-1}     ps(a,~F)._{t-1}
       ps(a,T)._{t-1}                      ps(a,~T)._{t-1}                     1
Here it makes sense to introduce some
abbreviatory notation:
ps(a,T)._{t-1} = t
ps(a,F|T)._{t-1} = h
ps(a,F|~T)._{t-1} = ps(a,F)._{t-2} = f
We can calculate all degrees of belief
in the last table from these, based on the assumption that they indeed
are proportions, like probabilities, and thus the same algebra applies.
Putting in these abbreviations, and using '~x' for '1-x', we get the
following at t-1, with juxtaposition for multiplication:
Table 5, fractions at t-1
K      T._{t-1}     ~T._{t-1}
F      (ht)         (f~t)        (ht+f~t)
~F     (~ht)        (~f~t)       (~ht+~f~t)
       (t)          (~t)         1
Third, at t we find that
F, and we recalculate both ps(a,T)._{t} and ps(a,F)._{t}
using Bayesian Conditionalization, i.e. ps(a,T)._{t} =
ps(a,F|T)._{t-1}*ps(a,T)._{t-1} : ps(a,F)._{t-1} for
T, and similarly in the other cases (note the temporal indexes). The
result in terms of our abbreviations looks as follows, if we work out all
possibilities by algebra and Bayesian Conditionalization:
Table 6, fractions at t after Bayesian Conditionalization
K      T._{t}              ~T._{t}                       
F      h(ht):(ht+f~t)      (ht+f~t)(f~t):(ht+f~t)        (h(ht)+(f~t)(ht+f~t)):(ht+f~t)
~F     ~h(ht):(ht+f~t)     (~ht+~f~t)(f~t):(ht+f~t)      (~h(ht)+(f~t)(~ht+~f~t)):(ht+f~t)
       (ht):(ht+f~t)       (f~t):(ht+f~t)                1
There is a new degree of belief for T at
t, namely (ht):(ht+f~t), and also a new degree of
belief for ~T at t and for F at t.
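The new degrees of belief for T and ~T at t can be computed from the three abbreviations alone. This is a sketch under the assumption that h, f and t are given as exact fractions; the function name `conditionalize_on_F` and the sample values are hypothetical.

```python
from fractions import Fraction

def conditionalize_on_F(h, f, t):
    """New degrees of belief at t for T and ~T after finding F, computed from
    the t-1 values t = ps(a,T), h = ps(a,F|T), f = ps(a,F|~T) (Table 5)."""
    not_t = 1 - t                  # '~t' in the text
    p_F = h * t + f * not_t        # marginal ps(a,F) at t-1: ht+f~t
    return h * t / p_F, f * not_t / p_F

# Hypothetical values: h = 9/10, f = 1/9, t = 1/10.
new_T, new_not_T = conditionalize_on_F(Fraction(9, 10), Fraction(1, 9), Fraction(1, 10))
print(new_T, new_not_T)  # 9/19 10/19
```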
Note first, also with
respect to the example in
Section 3, that there is a difference
between the case (A) of applying statistics about a disease and
symptoms to a patient and (B) of testing a theory with a prediction.
For (A) you may have empirical probabilities for all cases, but not for
(B). Also, in case (A) you can plausibly say that you apply
probabilities based on classes of cases of patients to a particular
patient. The new values for T and F then apply to that patient. But in
case (B) this mode of proceeding is not so plausible.
It is the second case I
am really concerned with, in principle. Here are some points about it,
which summarize some of the above points and add some:
- We started with pr(a,F)._{t-2} and no hypothesis at t-2.
- At t-1 we introduced
three hypotheses: ps(a,F|T)._{t-1}, ps(a,T)._{t-1},
and ps(a,F)._{t-2} = ps(a,F|~T)._{t-1}. The
reason for this last hypothesis is that it is what we started with at
t-2. We obtain a newly calculated ps(a,F)._{t-1} = ps(a,F|T)._{t-1}*ps(a,T)._{t-1}
+ ps(a,F)._{t-2}*ps(a,~T)._{t-1}
= ht+f~t in abbreviated notation. Since in abbreviated notation
ht+f~t = ht+f(1-t) = t(h-f)+f, this differs from f by t(h-f), and so it
equals f exactly when t=0 or h=f.
- At t we can calculate ps(a,T)._{t} using Bayesian
Conditionalization, on the hypothesis that F is true at t. This
uses ps(a,F) at t-1, i.e. ht+f~t. But we can also calculate
ps(a,~T)._{t} and then ps(a,F)._{t} and ps(a,~F)._{t}. This is more complicated,
but it also is basic algebra, requires no new data, and uses what
was given or assumed at t-1.
- At t we can also calculate ps(a,T)._{t} using Bayesian
Conditionalization, on the hypothesis that F is false at t:
ps(a,T|~F)._{t-1} = ps(a,~F|T)._{t-1}*ps(a,T)._{t-1} : ps(a,~F)._{t-1} = ~ht :
(~ht+~f~t). See Table 5.
- And we can likewise calculate ps(a,~T|F)._{t} and ps(a,~T|~F)._{t},
and the algebraic results can again be gleaned from Table 5.
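A small sketch of two calculations from the list above: the marginal ht+f~t, checked against its rewriting as t(h-f)+f, and conditionalization on ~F. The function names and sample values are hypothetical.

```python
from fractions import Fraction

def marginal_F(h, f, t):
    """ps(a,F) at t-1 = ht + f~t in abbreviated notation."""
    return h * t + f * (1 - t)

def conditionalize_on_not_F(h, f, t):
    """ps(a,T|~F) at t-1 = ~ht : (~ht + ~f~t), the new ps(a,T) if ~F is found."""
    return (1 - h) * t / ((1 - h) * t + (1 - f) * (1 - t))

h, f, t = Fraction(9, 10), Fraction(1, 9), Fraction(1, 10)
# ht + f~t = t(h-f) + f, so the new marginal equals f exactly when t = 0 or h = f.
assert marginal_F(h, f, t) == t * (h - f) + f
print(marginal_F(h, f, t), conditionalize_on_not_F(h, f, t))  # 19/100 1/81
```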
So at t one arrives at
what amounts to the above table in general if one has explicitly
calculated everything using Bayesian Conditionalization, but of course
with specific fractions for specific cases.
One interesting fact
about the last table is that it charts the alternatives of a
hypothesis; indeed, the real frequency of F, if any, which
the alternative hypotheses attempt to catch, is
not the marginal ht+f~t, which sums both.
Next, a remark about the
stability of the conditionals, which in the present reconstruction
amounts to the stability of h, i.e. ps(a,F|T)._{t-1} = ps(a,F|T)._{t}, and of f,
i.e. ps(a,F|~T)._{t-1} = ps(a,F|~T)._{t} = ps(a,F)._{t-2} = pr(a,F)._{t-2}.
In fact, these conditionals should intuitively be stable, for they are hypotheses
about reality: We may get rid of one of them (T or ~T)
by the evidence, once we have found it.
It remains to consider
Bayesian Conditionalization and degrees of belief.
First about degrees of
belief.
Seen as degrees of
belief, the new fractions calculated with Bayesian
Conditionalization differ from real probabilities based on
frequencies, which is what one may have started from at t-2: pr(F)._{t-2}=f.
They are proportions like
probabilities, and indeed degrees of belief may be fairly
called personal or subjective probabilities. But they are not
like ordinary frequency-based probabilities, because they are hypothetical:
One or the other of T and ~T is false, and the fractions in the cells
in the column used for it are purely hypothetical. Indeed, at most
one of T and ~T is true and so at least one of the columns for T and ~T
is purely speculative and merely corresponds to one's degrees of
belief, and not to any frequency one can establish directly.
And indeed this is unlike
the case of a patient with a disease and symptoms mentioned in Section 3, for which the four inner cells in the table
can in principle all be established directly, in that one may be able
to find people with the disease with and without the symptoms and
people without the disease with and without the symptoms. Here, by contrast, all one
has are the symptoms, figuratively speaking, and two conflicting
hypotheses to account for these facts.
In fact, the
degrees of belief derive from the hypothesis T that was introduced at t-1
to account for F (or for another fact X that is relevant for F).
Second about Bayesian
Conditionalization.
In the last table above,
Bayesian Conditionalization corresponds to the move from ps(a,T)._{t-1}=(t)
to ps(a,T)._{t}=(ht):(ht+f~t).
It is a recalculation of one's degree of belief in T upon finding that
F.
The big question now is:
Is this move probabilistic? Well, it surely is
proportionalistic: ps(a,T)._{t} = ps(a,T|F)._{t}
= ps(a,F|T)._{t-1}*ps(a,T)._{t-1} : ps(a,F)._{t-1}.
But note that in fact at t, ps(a,T|F)._{t} = ps(a,F|T)._{t}*ps(a,T)._{t} : ps(a,F)._{t}
= ps(a,F|T)._{t-1}*ps(a,T)._{t} since
ps(a,F)._{t} = 1. Therefore also ps(a,T|F)._{t} =
ps(a,T)._{t}, but then that is useless.
Note also that for
Bayesian Conditionalization, given that conditional degrees of belief
remain the same in time, we also have ps(a,F|T)._{t} =
ps(a,F|T)._{t-1} = (ht):t = h, as is correct.
So this differs
from probability theory: The Bayesian Conditionalization step
corresponds to recalculating at t using the numbers for t-1,
and not those for t as would happen in ordinary proportional algebra,
including ordinary probability theory.
The Bayesian
Conditionalization move does follow given that conditional degrees of
belief remain the same in time, and given that degrees of belief are
proportions, and indeed then the previously useless step seems to
become useful and seems to correspond to conditionalization.
Now this move is quite
plausible for degrees of belief (for what else can one
rationally use but one's last best hypothetical estimates?), but not
for probabilities as real frequencies, since conditional frequencies
may well change in time.
How plausible it is for degrees of
belief can be illustrated by redoing our earlier theorem for explicit
beliefs, using six assumptions (A0-A5) and logic:
The assumptions are:
A0. ps is proportional.
A1. aB( pr(X)=z )._{t} IFF ps(a,X)._{t} = z
A2. aB( Y |- pr(X)=z )._{t} IFF ps(a,X|Y)._{t} = z
A3. aB( Y |- pr(X)=z )._{t-1} → aB( Y |- pr(X)=z )._{t}
A4. aB( X |- Y )._{t} → ( aB( X )._{t} → aB( Y )._{t} )
A5. aB( pr(X)=1 )._{t} IFF aB( X )._{t}
A0 guarantees that personal
probabilities a.k.a. degrees of belief are proportions, like ordinary
probabilities.
A1 and A2 convert between beliefs in
probabilities and personal probabilities, and guarantee that one's
personal probability is numerically the same as one's belief in the
probability. Note that A2 imposes a particular, natural and simple
interpretation on conditional degree of belief: it is belief in a
probability based on an assumption.
A3 insists that conditional beliefs
remain constant in time once established or assumed, e.g. on the ground
that what is involved is a relation of deducibility.
A4 says that if one believes that X
implies Y at t then if one believes X at t one believes Y at t.
A5 converts between belief that X has
probability 1 and the belief that X is true.
For beliefs in probabilities the theorem
and argument now follow. The logic assumed apart from the assumptions
made is standard First-Order Predicate Logic with identity.
T. aB( T |- pr(F)=h )._{t-1} & aB( ~T |- pr(F)=f )._{t-1} & aB( pr(T)=e )._{t-1} & aB( pr(F)=1 )._{t}
→ aB( pr(T)=(he):(he+f~e) )._{t}
This is the theorem to be
proved, all in terms of beliefs about probabilities, all relativized to
times.
AI 1. aB( T |- pr(F)=h )._{t-1}
AI 2. aB( ~T |- pr(F)=f )._{t-1}
AI 3. aB( pr(T)=e )._{t-1}
AI 4. aB( pr(F)=1 )._{t}
The assumptions of the proof. Notice
that at (4) a new fact is recorded, which was not so at t-1.
1,A2 5. ps(a,F|T)._{t-1} = h
2,A2 6. ps(a,F|~T)._{t-1} = f
3,A1 7. ps(a,T)._{t-1} = e
The inference of
the personal probabilities for a Bayesian Conditionalization.
A0 8. ps(a,F)._{t-1} = ps(a,F&T)._{t-1} + ps(a,F&~T)._{t-1}
8,A0 9. = ps(a,F|T)._{t-1}*ps(a,T)._{t-1} + ps(a,F|~T)._{t-1}*ps(a,~T)._{t-1}
5,6,7,9 10. = (he + f~e)
The calculation of ps(a,F)._{t-1}.
It is A0, i.e. the assumption that personal probabilities are
proportions like ordinary probabilities, that allows these steps, together with the
assumptions of the proof used at (10).
A0 11. ps(a,T|F)._{t-1} = ps(a,F|T)._{t-1} * ps(a,T)._{t-1} : ps(a,F)._{t-1}
5,7,10 12. = he : (he + f~e)
The calculation of ps(a,T|F)._{t-1}.
4,A1 13. ps(a,F)._{t} = 1
A0 14. ps(a,T|F)._{t} = ps(a,T&F)._{t} : ps(a,F)._{t}
13,14,A0 15. = ps(a,T)._{t}
The new information at t simplifies ps(a,T|F)._{t} to ps(a,T)._{t} (a fact
often slurred over in expositions of Bayesian Conditionalization).
A3,A2 16. ps(a,T|F)._{t-1} = ps(a,T|F)._{t}
15,16 17. ps(a,T)._{t} = ps(a,T|F)._{t-1}
Here the new personal probability at t for T has been derived; it was
already calculated at (12) from the assumptions of the theorem. And in
fact ps(a,T)._{t} has been calculated from the latest relevant
information that a had available, namely that at t-1 listed by the
assumptions (1)-(3).
12,A2 18. aB( F |- pr(T)=he : (he + f~e) )._{t-1}
18,A3 19. aB( F |- pr(T)=he : (he + f~e) )._{t}
Converting personal
probabilities into a person's beliefs about probabilities.
4,A5 20. aB( F )._{t}
Converting a belief
in a probability of 1 into belief of truth.
19,A4 21. aB( F )._{t} → aB( pr(T)=he : (he + f~e) )._{t}
Converting belief in a conditional into
a conditional between beliefs.
20,21,MP 22. aB( pr(T)=he : (he + f~e) )._{t}
Deriving the conclusion of the theorem that was to
be proved, by ordinary modus ponens: A new probability for T derived in
conformity with Bayesian reasoning. QED.
So this seems a plausible
new interpretation of and explanation for Bayesian
Conditionalization: It concerns degrees of belief, that are proportions
like probabilities, with conditional degrees of belief supposed
constant in time, and corresponding to believed conditions for
probabilities. And the reason for that constancy is that it concerns hypotheses,
in which the degree of belief does not vary directly with time but
varies with evidence.
