4/11 Deep Sets Review
Last time, we started discussing learning on sequences. More precisely, we denote by \(\mathscr X_0\) the elementwise feature space, which can be either:
- a discrete set: \(\mathscr X_0\cong[n]\), sometimes called a vocabulary, or
- a subset of a Euclidean space: \(\mathscr X_0\subseteq\mathbf R^d\).
The full feature space is then the set \(L\mathscr X_0\) of finite sequences of entries in \(\mathscr X_0\). We denote by \(L_m\mathscr X_0\subseteq L\mathscr X_0\) the subset of sequences of length \(m\).
The symmetric group \(S_m\) acts on the subset \(L_m\mathscr X_0\) by permuting the positions of the entries of a sequence.
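To make the action concrete, here is one way to write it down. The choice of a left action (with \(\sigma^{-1}\) in the indices) is an assumption on our part; the opposite convention leads to the same notion of invariance. For \(\sigma\in S_m\) and \(X=(\mathbf x_1,\dotsc,\mathbf x_m)\in L_m\mathscr X_0\),

$$
\sigma\cdot(\mathbf x_1,\dotsc,\mathbf x_m)=\left(\mathbf x_{\sigma^{-1}(1)},\dotsc,\mathbf x_{\sigma^{-1}(m)}\right),
$$

so the entry in position \(i\) moves to position \(\sigma(i)\). For example, with \(m=3\) and the cycle \(\sigma\) sending \(1\mapsto2\mapsto3\mapsto1\),

$$
\sigma\cdot(\mathbf x_1,\mathbf x_2,\mathbf x_3)=(\mathbf x_3,\mathbf x_1,\mathbf x_2).
$$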
A Deep Set is a model architecture that is invariant under this permutation action. Given a sequence \(X=(\mathbf x_1,\dotsc,\mathbf x_m)\in L_m\mathscr X_0\), the model output is defined by
$$ f_\theta(X)=h_\xi\left(p\left(g_\phi(\mathbf x_1),\dotsc,g_\phi(\mathbf x_m)\right)\right), $$ where:
- The elementwise map \(g_\phi\colon\mathscr X_0\to\mathbf R^\ell\) is:
  - a lookup into a collection of word vectors \(\{\mathbf v_i\in\mathbf R^\ell:i\in\mathscr X_0\}\), that is \(g_\phi(i)=\mathbf v_i\), if the elementwise feature space \(\mathscr X_0\) is discrete, or
  - an MLP \(\mathscr X_0\to\mathbf R^\ell\) if \(\mathscr X_0\) is continuous,
- The pooling function \(p\colon\mathbf R^{\ell\times m}\to\mathbf R^\ell\) is invariant under permutations of its \(m\) arguments, usually the coordinatewise mean, max, min or sum, and
- The invariant map \(h_\xi\) is an MLP \(\mathbf R^\ell\to\mathbf R^k\).
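
The three components above can be sketched in NumPy as follows. This is only an illustrative sketch with untrained random weights, not a reference implementation: the helper names (`init_mlp`, `mlp`, `deep_set`, `POOLS`), the layer widths, and the ReLU activation are all assumptions made for the example. Note also that the code stores a sequence as an array of shape \((m,\ell)\) with one row per element, rather than \(\ell\times m\), and indexes the vocabulary from 0.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes, rng):
    """Random (W, b) pairs for an MLP with the given layer sizes (hypothetical helper)."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Apply the MLP along the last axis, with ReLU between layers (assumed activation)."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    return x @ W + b

# Permutation-invariant pooling functions p, applied over the element axis.
POOLS = {"sum": np.sum, "mean": np.mean, "max": np.max, "min": np.min}

def deep_set(g, h, X, pool="sum"):
    """Compute h(p(g(x_1), ..., g(x_m))) for a sequence X with m entries."""
    H = g(X)                      # shape (m, ell): g applied to each element
    z = POOLS[pool](H, axis=0)    # shape (ell,): pool over the m elements
    return h(z)                   # shape (k,)

d, ell, k, m, n = 3, 8, 2, 5, 10  # illustrative sizes

# Continuous case: g_phi is an MLP R^d -> R^ell.
phi = init_mlp([d, 16, ell], rng)
g_cont = lambda X: mlp(phi, X)

# Discrete case: g_phi looks up word vectors v_i, stored as rows of V.
V = rng.normal(size=(n, ell))
g_disc = lambda tokens: V[tokens]

# The map h_xi is an MLP R^ell -> R^k, shared by both cases here.
xi = init_mlp([ell, 16, k], rng)
h = lambda z: mlp(xi, z)

# Check invariance on a random sequence and a random permutation of it.
X = rng.normal(size=(m, d))
perm = rng.permutation(m)
out = deep_set(g_cont, h, X)
out_perm = deep_set(g_cont, h, X[perm])
print(np.allclose(out, out_perm))
```

The comparison uses `np.allclose` rather than exact equality because floating-point summation can differ in the last bits when the order of the terms changes. Since the pooling step reduces over the element axis, the same function accepts sequences of any length \(m\).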