User:Benja/Average probability

If $$\Omega$$ is a finite set, then the maximum entropy probability distribution (informally, the "least informative" distribution) over this set is the one that assigns every element $$\omega \in \Omega$$ the probability $$p(\omega)=1/|\Omega|$$. The average probability (more precisely, the arithmetic mean of the probabilities) of any probability distribution over $$\Omega$$ is (obviously) also $$1/|\Omega|$$. We can, therefore, say that a probability distribution over $$\Omega$$ "favors" an outcome $$\omega$$ if $$p(\omega)>1/|\Omega|$$, and "disfavors" it if $$p(\omega)<1/|\Omega|$$; the maximum entropy distribution is the unique distribution that assigns the "average" probability to every $$\omega \in \Omega$$ and, therefore, does not favor or disfavor any outcome.
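As a quick sanity check, here is a minimal Python sketch (the four-element set and the random weights are made up for illustration) verifying that the arithmetic mean of the probabilities is always $$1/|\Omega|$$:

```python
import random

# An arbitrary probability distribution over a hypothetical 4-element set:
# normalize random positive weights so they sum to 1.
omega = [1, 2, 3, 4]
weights = [random.random() + 1e-9 for _ in omega]
total = sum(weights)
p = [w / total for w in weights]

# The arithmetic mean of the probabilities is sum(p) / |Omega| = 1 / |Omega|,
# since the probabilities sum to 1 by construction.
mean_prob = sum(p) / len(omega)
assert abs(mean_prob - 1 / len(omega)) < 1e-12
```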

What is interesting about this is that we can think of a probability distribution as "redistributing" probability mass away from the outcomes it disfavors, and towards the outcomes it favors. Clearly, the amount of probability mass distributed towards the latter equals the mass distributed away from the former.
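The conservation claim above can be checked numerically. A small Python sketch (the concrete distribution over $$\{1,2,3\}$$ is chosen for illustration):

```python
# Example distribution over a 3-element set; avg is the "average"
# probability 1/|Omega| that the maximum entropy distribution assigns.
p = {1: 0.5, 2: 0.3, 3: 0.2}
avg = 1 / len(p)

# Mass redistributed toward favored outcomes (p(w) > 1/|Omega|)
toward_favored = sum(p[w] - avg for w in p if p[w] > avg)
# Mass redistributed away from disfavored outcomes (p(w) < 1/|Omega|)
away_from_disfavored = sum(avg - p[w] for w in p if p[w] < avg)

# The two amounts are equal, since the deviations from 1/|Omega| sum to zero.
assert abs(toward_favored - away_from_disfavored) < 1e-12
```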


 * This allows us to define the inverse $$p^C$$ of a probability distribution $$p$$: the distribution that moves exactly as much probability mass away from the outcomes favored by $$p$$ as $$p$$ moves towards them, in the same proportions, and moves this mass towards the outcomes disfavored by $$p$$, again in the same proportions. (The maximum entropy distribution is its own inverse.)


 * ...or is that wrong?


 * Consider $$\Omega=\{1,2,3\}$$, $$p(1)=1$$, $$p(2)=p(3)=0$$. What would $$p^C$$ be? The idea is that $$p^C(1)=0$$, $$p^C(2)=p^C(3)=1/2$$. But the probability mass $$p$$ redistributes away from $$\{2,3\}$$ towards $$\{1\}$$ is $$2/3$$; the probability mass $$p^C$$ distributes away from $$\{1\}$$ and towards $$\{2,3\}$$ is only $$1/3$$.
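The arithmetic behind this counterexample is easy to check directly. A Python sketch of the two redistributed amounts, using the distributions given above:

```python
# Omega = {1, 2, 3}; p puts all mass on 1, and the proposed inverse
# p^C splits all mass evenly between 2 and 3.
p = {1: 1.0, 2: 0.0, 3: 0.0}
pC = {1: 0.0, 2: 0.5, 3: 0.5}
avg = 1 / 3  # 1/|Omega|

# Mass p moves toward its favored outcome {1}: 1 - 1/3 = 2/3.
mass_p = sum(p[w] - avg for w in p if p[w] > avg)
# Mass p^C moves toward its favored outcomes {2,3}: 2 * (1/2 - 1/3) = 1/3.
mass_pC = sum(pC[w] - avg for w in pC if pC[w] > avg)

assert abs(mass_p - 2 / 3) < 1e-12
assert abs(mass_pC - 1 / 3) < 1e-12
```

So the two distributions do not move equal amounts of mass, which is the tension the bullet points out.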