*Bounty: 50*

I’ve been reading over this multivariate Gaussian conditional proof, trying to make sense of *how* the mean and variance of a Gaussian conditional were derived. I’ve come to accept that unless I allocate a dozen or so hours to refreshing my linear algebra knowledge, it’s out of my reach for the time being.

That being said, I’m looking for a conceptual explanation of what these equations represent:

$$\mu_{1|2} = \mu_1 + \Sigma_{1,2} \Sigma^{-1}_{2,2} (x_2 - \mu_2)$$

I read the first as "Take $\mu_1$ and augment it by some factor, which is the covariance scaled by the precision (a measure of how closely $X_2$ is clustered about $\mu_2$, maybe?) and projected onto the distance of the specific $x_2$ from $\mu_2$."

$$\Sigma_{1|2} = \Sigma_{1,1} - \Sigma_{1,2} \Sigma^{-1}_{2,2} \Sigma_{1,2}^{\top}$$

I read the second as, "Take the variance about $\mu_1$ and subtract some factor, which is the covariance squared scaled by the precision about $x_2$."
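To check my reading numerically, here is a minimal NumPy sketch that evaluates both equations directly. The function name `condition_gaussian`, the index arguments, and all the numbers are hypothetical and chosen only for illustration; it assumes $\Sigma_{2,2}$ is invertible.

```python
import numpy as np

# Made-up 2-D example: X_1 is component 0, X_2 is component 1.
mu = np.array([1.0, 2.0])
Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])


def condition_gaussian(mu, Sigma, idx1, idx2, x2):
    """Mean and covariance of X_1 given X_2 = x2 (hypothetical helper)."""
    mu1, mu2 = mu[idx1], mu[idx2]
    S11 = Sigma[np.ix_(idx1, idx1)]
    S12 = Sigma[np.ix_(idx1, idx2)]
    S22 = Sigma[np.ix_(idx2, idx2)]
    # K = S12 @ inv(S22), computed with a linear solve rather than an
    # explicit inverse (valid because S22 is symmetric).
    K = np.linalg.solve(S22, S12.T).T
    mu_cond = mu1 + K @ (x2 - mu2)    # mu_1 + S12 S22^{-1} (x2 - mu_2)
    Sigma_cond = S11 - K @ S12.T      # S11 - S12 S22^{-1} S12^T
    return mu_cond, Sigma_cond


mu_c, Sigma_c = condition_gaussian(mu, Sigma, [0], [1], np.array([3.0]))
print(mu_c, Sigma_c)
```

With these numbers the conditional mean moves from 1.0 to 2.2 and the variance drops from 4.0 to 2.56, which at least matches my "augment the mean, subtract from the variance" reading.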

In either case, the precision $\Sigma^{-1}_{2,2}$ seems to be playing a really important role.
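For concreteness, here is what I believe the two equations reduce to in the bivariate case, where every block is a single number. The symbols $\sigma_1^2$, $\sigma_2^2$, $\sigma_{12}$ and $\rho = \sigma_{12} / (\sigma_1 \sigma_2)$ are my own notation for the two variances, the covariance and the correlation; they are not taken from the proof.

$$\mu_{1|2} = \mu_1 + \frac{\sigma_{12}}{\sigma_2^2} (x_2 - \mu_2) = \mu_1 + \rho \frac{\sigma_1}{\sigma_2} (x_2 - \mu_2)$$

$$\sigma^2_{1|2} = \sigma_1^2 - \frac{\sigma_{12}^2}{\sigma_2^2} = \sigma_1^2 (1 - \rho^2)$$

Written this way, the precision shows up as division by $\sigma_2^2$, and the "covariance squared" term is literally $\sigma_{12}^2$.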

A few questions:

- Am I right to treat precision as a measure of how closely observations are clustered about the expectation?
- Why is the covariance squared in the latter equation? (Is there a geometric interpretation?) So far, I’ve been treating $\Sigma_{1,2} \Sigma^{-1}_{2,2}$ as a ratio, $a/b$, so that this ratio acts to scale the (second) $\Sigma_{1,2}$, essentially accounting for or damping the effect of the covariance; I don’t know if this is valid.
- Anything else you’d like to add/clarify?