mathstodon.xyz is one of the many independent Mastodon servers you can use to participate in the fediverse.
A Mastodon instance for maths people. We have LaTeX rendering in the web interface!

Dan Piponi

Just a minor fact that doesn't seem to be stated in many places - maybe it's too obvious to state:

When you don't fully know a probability distribution, people sometimes like to pick the maximum entropy distribution consistent with the knowledge they do have.

More generally, you may have some prior idea of what the distribution is. In that case you might pick the distribution with the maximum entropy *relative* to your prior guess, consistent with any new knowledge.
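As a concrete sketch of that idea (all numbers here are made up for illustration, and "maximum entropy relative to the prior" is read as minimising the KL divergence D(Q‖P)): when the new knowledge is a mean constraint, a standard result says the minimiser is an exponential tilt of the prior, Q(x) ∝ P(x)·exp(λx), with λ tuned to hit the target mean.

```python
import math

# Hypothetical prior guess P on {0, 1, 2}, and new knowledge: E_Q[X] = 1.5.
P = {0: 0.5, 1: 0.3, 2: 0.2}
target_mean = 1.5

def tilt(lam):
    """Exponential tilt Q(x) proportional to P(x) * exp(lam * x), renormalised."""
    w = {x: p * math.exp(lam * x) for x, p in P.items()}
    z = sum(w.values())
    return {x: v / z for x, v in w.items()}

def mean(Q):
    return sum(x * q for x, q in Q.items())

# mean(tilt(lam)) increases with lam, so bisection finds the right tilt.
lo, hi = -10.0, 10.0
for _ in range(100):
    mid = (lo + hi) / 2
    if mean(tilt(mid)) < target_mean:
        lo = mid
    else:
        hi = mid
Q = tilt((lo + hi) / 2)
print(Q, mean(Q))  # mean(Q) is (numerically) 1.5
```

The tilted distribution is the minimum-KL (maximum relative entropy) update of the prior among all distributions with that mean; when the prior is uniform this reduces to the ordinary maximum entropy distribution.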

A special case is when you draw from a joint distribution on X and Y and observe X. The posterior distribution of Y is given by the conditional distribution P(Y|X). This is precisely the maximum entropy distribution relative to the original distribution with the constraint that Y is known.

So conditional probabilities are in fact maximum (relative) entropy distributions.

@dpiponi I'm trying to understand what you said about conditional distributions having maximal entropy. First up, do you have a typo? It should be X that is known rather than Y.

Is the idea that you look at the joint distributions of X and Y where X is fixed with its known value? Then among all of them, the entropy relative to the original distribution is maximised by the posterior distribution?

@dpiponi @OscarCunningham Hmm but isn't this explanation predicated on X though? I'm still unclear on "with the constraint that Y is known."

Did you mean something like "given we have some (event related to) Y for which we are interested in the probability of", or something else?

@metarecursive @OscarCunningham Trying to get a moment at a proper keyboard but won't for a bit. Basically go by the handwritten math. B is a subset of A. P(X|X in B) is the max entropy distribution relative to P(X) subject to the constraint that X lies in B. You can say this in the language of joint distributions but I regret that now.
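A quick numerical check of that last formulation (a sketch with a made-up distribution P and subset B, reading "maximum entropy relative to P" as minimising the KL divergence D(Q‖P) over distributions Q supported on B):

```python
import math

# Made-up prior P on A = {0, 1, 2, 3}, and the observed event B, a subset of A.
P = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
B = {1, 3}

# Conditioning P(X | X in B) just renormalises P on B.
pB = sum(P[x] for x in B)
cond = {x: P[x] / pB for x in B}

def kl(Q, P):
    """D(Q || P) = sum Q log(Q / P), with the 0 log 0 = 0 convention."""
    return sum(q * math.log(q / P[x]) for x, q in Q.items() if q > 0)

# Coarse grid over all distributions supported on B = {1, 3}; the
# KL divergence to P should be minimised at the conditional.
best = min(
    ({1: t, 3: 1 - t} for t in [i / 1000 for i in range(1001)]),
    key=lambda Q: kl(Q, P),
)
print(cond)  # roughly {1: 0.333, 3: 0.667}
print(best)  # the grid point nearest the conditional
```

The minimum value is log(1/P(B)), attained exactly at the conditional distribution, which is the content of the claim in this thread.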