I recently read The Miracle of the Boltzmann Machine, and it's so compelling that I've been thinking about it ever since. I intend to write much more on Boltzmann Machines in the future, but here I'm just going to show my work differentiating the objective function.

Given

  1. Objective function $$L(W) := \mathbb{E}_{D(V)} [log P(V)]$$
  2. and probability of a given BM state $X=(V,H)$ $$P(X) := P(V,H) := {e^{X^TWX/2}\over {\sum_{X'} e^{X'^TWX'/2}}}$$ $$P(V) := \sum_H P(V,H) = \frac{\sum_H e^{X^TWX/2}}{\sum_{X'} e^{X'^TWX'/2}}$$ where $W$ is the BM transition matrix, assuming $w_{ij}=w_{ji}$

Want to show

$$\frac{\partial L}{\partial w_{ij}} = \mathbb{E}_{D(V)P(H|V)}[x_ix_j]-\mathbb{E}_{P(V,H)}[x_ix_j]$$

Proof

  1. Definition of expected value $$L(W)=\mathbb{E}_{D(V)} [\log P(V)] = \sum_V D(V)\log P(V)$$
  2. Let $f = logP(V)$ $$\frac{\partial L}{\partial f} = \sum_V D(V)\frac{\partial f}{\partial w_{ij}}$$
  3. Chain rule $$\frac{\partial f}{\partial w_{ij}} = {\frac{\partial P(V)}{\partial w_{ij}} \over P(V)}$$
  4. Expand $P(V)$ $$\frac{\partial P(V)}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}}\left[\sum_H P(V,H)\right] = \frac{\partial}{\partial w_{ij}}\left[\sum_H {e^{X^TWX/2}\over {\sum_{X'} e^{X'^TWX'/2}}}\right] = \sum_H \frac{\partial}{\partial w_{ij}}\left[{e^{X^TWX/2}\over {\sum_{X'} e^{X'^TWX'/2}}}\right]$$
  5. Quotient rule $$\frac{\partial P(V)}{\partial w_{ij}} =\sum_H \frac{\frac{\partial}{\partial w_{ij}}\left[e^{X^TWX/2}\right]{\sum_{X'} e^{X'^TWX'/2}}-e^{X^TWX/2} \frac{\partial}{\partial w_{ij}}\left[{\sum_{X'} e^{X'^TWX'/2}}\right]}{\left({\sum_{X'} e^{X'^TWX'/2}}\right)^2}$$
  6. Chain rule, and notice $\frac{\partial}{\partial w_{ij}}\left[W\right]$ is $0$ everywhere except $w_{ij}$, so $$\frac{\partial}{\partial w_{ij}}\left[e^{X^TWX/2}\right] = \frac{\partial}{\partial w_{ij}}\left[X^TWX/2\right] e^{X^TWX/2} = x_ix_je^{X^TWX/2}$$
  7. So #5 becomes $$\frac{\partial P(V)}{\partial w_{ij}} = \sum_H \frac{x_ix_je^{X^TWX/2}{\sum_{X'} e^{X'^TWX'/2}}-e^{X^TWX/2} \sum_{X'}x'_ix'_je^{X'^TWX'/2}}{\left({\sum_{X'} e^{X'^TWX'/2}}\right)^2}$$
  8. Separating terms $$\frac{\partial P(V)}{\partial w_{ij}} = \sum_H\left[\frac{x_ix_je^{X^TWX/2}{\sum_{X'} e^{X'^TWX'/2}}}{\left({\sum_{X'} e^{X'^TWX'/2}}\right)^2}\right]-\sum_H\left[\frac{e^{X^TWX/2} \sum_{X'}x'_ix'_je^{X'^TWX'/2}}{\left({\sum_{X'} e^{X'^TWX'/2}}\right)^2}\right]$$
  9. Cancelling and moving factors outside sums $$\frac{\partial P(V)}{\partial w_{ij}} = \sum_H\left[\frac{x_ix_je^{X^TWX/2}}{{\sum_{X'} e^{X'^TWX'/2}}}\right]-\frac{\sum_H\left[e^{X^TWX/2}\right] \sum_{X'}x'_ix'_je^{X'^TWX'/2}}{\left({\sum_{X'} e^{X'^TWX'/2}}\right)^2}$$
  10. Definition of $P(V,H)$ and $P(V)$ $$\frac{\partial P(V)}{\partial w_{ij}} = \sum_H\left[x_ix_jP(V,H)\right]-P(V) \sum_{X'}\left[x'_ix'_jP(V',H')\right]$$
  11. Substituting #10 into #3 and #3 into #2 we have $$\frac{\partial L}{\partial w_{ij}} = \sum_VD(V)\left[\frac{\sum_H\left[x_ix_jP(V,H)\right]-P(V) \sum_{X'}\left[x'_ix'_jP(V',H')\right]}{P(V)}\right]$$
  12. Separating into two terms $$\frac{\partial L}{\partial w_{ij}} = \sum_V\left[D(V)\sum_H\left[\frac{x_ix_jP(V,H)}{P(V)}\right]\right]-\sum_V\left[D(V)P(V)\sum_{X'}\left[x'_ix'_jP(V',H')\right]\right]$$
  13. Definition of conditional probability $$\frac{\partial L}{\partial w_{ij}} = \sum_V\sum_H\left[x_ix_jD(V)P(H|V)\right]-\sum_VD(V)\sum_{X'}\left[x'_ix'_jP(V',H')\right]$$
  14. $\sum_VD(V)=1$, combining sums, and $X=(V,H)$ $$\frac{\partial L}{\partial w_{ij}} =\sum_{(V,H)}\left[x_ix_jD(V)P(H|V)\right]-\sum_{(V',H')}\left[x'_ix'_jP(V',H')\right]$$
  15. Definition of expected value $$\frac{\partial L}{\partial w_{ij}} = \mathbb{E}_{D(V)P(H|V)}[x_ix_j]-\mathbb{E}_{P(V,H)}[x_ix_j]$$ $\square$