I recently read The Miracle of the Boltzmann Machine, and it's so compelling that I've been thinking about it ever since. I intend to write much more on Boltzmann Machines in the future, but here I'm just going to show my work differentiating the objective function.
Given¶
- Objective function $$L(W) := \mathbb{E}_{D(V)} [log P(V)]$$
- and probability of a given BM state $X=(V,H)$ $$P(X) := P(V,H) := {e^{X^TWX/2}\over {\sum_{X'} e^{X'^TWX'/2}}}$$ $$P(V) := \sum_H P(V,H) = \frac{\sum_H e^{X^TWX/2}}{\sum_{X'} e^{X'^TWX'/2}}$$ where $W$ is the BM transition matrix, assuming $w_{ij}=w_{ji}$
Want to show¶
$$\frac{\partial L}{\partial w_{ij}} = \mathbb{E}_{D(V)P(H|V)}[x_ix_j]-\mathbb{E}_{P(V,H)}[x_ix_j]$$Proof¶
- Definition of expected value $$L(W)=\mathbb{E}_{D(V)} [\log P(V)] = \sum_V D(V)\log P(V)$$
- Let $f = logP(V)$ $$\frac{\partial L}{\partial f} = \sum_V D(V)\frac{\partial f}{\partial w_{ij}}$$
- Chain rule $$\frac{\partial f}{\partial w_{ij}} = {\frac{\partial P(V)}{\partial w_{ij}} \over P(V)}$$
- Expand $P(V)$ $$\frac{\partial P(V)}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}}\left[\sum_H P(V,H)\right] = \frac{\partial}{\partial w_{ij}}\left[\sum_H {e^{X^TWX/2}\over {\sum_{X'} e^{X'^TWX'/2}}}\right] = \sum_H \frac{\partial}{\partial w_{ij}}\left[{e^{X^TWX/2}\over {\sum_{X'} e^{X'^TWX'/2}}}\right]$$
- Quotient rule $$\frac{\partial P(V)}{\partial w_{ij}} =\sum_H \frac{\frac{\partial}{\partial w_{ij}}\left[e^{X^TWX/2}\right]{\sum_{X'} e^{X'^TWX'/2}}-e^{X^TWX/2} \frac{\partial}{\partial w_{ij}}\left[{\sum_{X'} e^{X'^TWX'/2}}\right]}{\left({\sum_{X'} e^{X'^TWX'/2}}\right)^2}$$
- Chain rule, and notice $\frac{\partial}{\partial w_{ij}}\left[W\right]$ is $0$ everywhere except $w_{ij}$, so $$\frac{\partial}{\partial w_{ij}}\left[e^{X^TWX/2}\right] = \frac{\partial}{\partial w_{ij}}\left[X^TWX/2\right] e^{X^TWX/2} = x_ix_je^{X^TWX/2}$$
- So #5 becomes $$\frac{\partial P(V)}{\partial w_{ij}} = \sum_H \frac{x_ix_je^{X^TWX/2}{\sum_{X'} e^{X'^TWX'/2}}-e^{X^TWX/2} \sum_{X'}x'_ix'_je^{X'^TWX'/2}}{\left({\sum_{X'} e^{X'^TWX'/2}}\right)^2}$$
- Separating terms $$\frac{\partial P(V)}{\partial w_{ij}} = \sum_H\left[\frac{x_ix_je^{X^TWX/2}{\sum_{X'} e^{X'^TWX'/2}}}{\left({\sum_{X'} e^{X'^TWX'/2}}\right)^2}\right]-\sum_H\left[\frac{e^{X^TWX/2} \sum_{X'}x'_ix'_je^{X'^TWX'/2}}{\left({\sum_{X'} e^{X'^TWX'/2}}\right)^2}\right]$$
- Cancelling and moving factors outside sums $$\frac{\partial P(V)}{\partial w_{ij}} = \sum_H\left[\frac{x_ix_je^{X^TWX/2}}{{\sum_{X'} e^{X'^TWX'/2}}}\right]-\frac{\sum_H\left[e^{X^TWX/2}\right] \sum_{X'}x'_ix'_je^{X'^TWX'/2}}{\left({\sum_{X'} e^{X'^TWX'/2}}\right)^2}$$
- Definition of $P(V,H)$ and $P(V)$ $$\frac{\partial P(V)}{\partial w_{ij}} = \sum_H\left[x_ix_jP(V,H)\right]-P(V) \sum_{X'}\left[x'_ix'_jP(V',H')\right]$$
- Substituting #10 into #3 and #3 into #2 we have $$\frac{\partial L}{\partial w_{ij}} = \sum_VD(V)\left[\frac{\sum_H\left[x_ix_jP(V,H)\right]-P(V) \sum_{X'}\left[x'_ix'_jP(V',H')\right]}{P(V)}\right]$$
- Separating into two terms $$\frac{\partial L}{\partial w_{ij}} = \sum_V\left[D(V)\sum_H\left[\frac{x_ix_jP(V,H)}{P(V)}\right]\right]-\sum_V\left[D(V)P(V)\sum_{X'}\left[x'_ix'_jP(V',H')\right]\right]$$
- Definition of conditional probability $$\frac{\partial L}{\partial w_{ij}} = \sum_V\sum_H\left[x_ix_jD(V)P(H|V)\right]-\sum_VD(V)\sum_{X'}\left[x'_ix'_jP(V',H')\right]$$
- $\sum_VD(V)=1$, combining sums, and $X=(V,H)$ $$\frac{\partial L}{\partial w_{ij}} =\sum_{(V,H)}\left[x_ix_jD(V)P(H|V)\right]-\sum_{(V',H')}\left[x'_ix'_jP(V',H')\right]$$
- Definition of expected value $$\frac{\partial L}{\partial w_{ij}} = \mathbb{E}_{D(V)P(H|V)}[x_ix_j]-\mathbb{E}_{P(V,H)}[x_ix_j]$$ $\square$