4/11 Deep Sets Review

Last time, we started discussing learning on sequences. More precisely, we denote by $\mathscr X_0$ the elementwise feature space, which can be either:

- a discrete set: $\mathscr X_0\cong[n]$, sometimes called a vocabulary, or
- a subset of a Euclidean space: $\mathscr X_0\subseteq\mathbf R^d$.

The feature space is then the set $L\mathscr X_0$ of finite sequences of entries in $\mathscr X_0$. We denote by $L_m\mathscr X_0\subseteq L\mathscr X_0$ the subset of sequences of length exactly $m$, so that $L\mathscr X_0=\bigcup_{m\ge0}L_m\mathscr X_0$.

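The permutation group $S_m$ acts on $L_m\mathscr X_0$ by reordering entries: $\sigma\cdot(\mathbf x_1,\dotsc,\mathbf x_m)=(\mathbf x_{\sigma^{-1}(1)},\dotsc,\mathbf x_{\sigma^{-1}(m)})$.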
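A minimal Python sketch of this action, assuming sequences are stored as tuples and a permutation $\sigma$ is encoded 0-based as the tuple $(\sigma(0),\dotsc,\sigma(m-1))$; this encoding and the helper names `act` and `compose` are hypothetical choices for illustration. The entry at position $i$ is moved to position $\sigma(i)$, and the loop checks the action axiom $(\sigma\tau)\cdot X=\sigma\cdot(\tau\cdot X)$ on all of $S_3$.

```python
from itertools import permutations

def act(sigma, seq):
    """Apply sigma in S_m to a length-m sequence.

    sigma is encoded 0-based as the tuple (sigma(0), ..., sigma(m-1)).
    The entry at position i is moved to position sigma(i), so that
    (sigma . X)[j] == X[sigma^{-1}(j)].
    """
    m = len(seq)
    if sorted(sigma) != list(range(m)):
        raise ValueError("sigma must be a permutation of range(len(seq))")
    out = [None] * m
    for i, x in enumerate(seq):
        out[sigma[i]] = x
    return tuple(out)

def compose(sigma, tau):
    """Composition in S_m: (sigma o tau)(i) = sigma(tau(i))."""
    return tuple(sigma[tau[i]] for i in range(len(tau)))

if __name__ == "__main__":
    X = ("a", "b", "c")  # an element of L_3 X_0
    print(act((1, 2, 0), X))  # ('c', 'a', 'b')
    # Check the left-action axiom (sigma o tau) . X == sigma . (tau . X) on all of S_3.
    for sigma in permutations(range(3)):
        for tau in permutations(range(3)):
            assert act(compose(sigma, tau), X) == act(sigma, act(tau, X))
```

Moving entry $i$ to slot $\sigma(i)$ (rather than reading entry $\sigma(i)$ into slot $i$) is what makes this a left action; the other convention gives a right action.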

A Deep Set is a model architecture that is invariant under this permutation action: reordering the input sequence does not change the output. Given a sequence $X=(\mathbf x_1,\dotsc,\mathbf x_m)\in L_m\mathscr X_0$, the output of the model is defined by

$$ f_\theta(X)=h_\xi\left(p\left(g_\phi(\mathbf x_1),\dotsc,g_\phi(\mathbf x_m)\right)\right), $$

where:

- The elementwise map $g_\phi\colon\mathscr X_0\to\mathbf R^\ell$ is:
    1. a collection of word vectors $\{\mathbf v_i\in\mathbf R^\ell:i\in\mathscr X_0\}$, i.e. $g_\phi(i)=\mathbf v_i$, if the elementwise feature space $\mathscr X_0$ is discrete, or
    2. an MLP $\mathscr X_0\to\mathbf R^\ell$ if $\mathscr X_0$ is continuous.
- The function $\mathbf R^{\ell\times m}\xrightarrow p\mathbf R^\ell$ is a permutation-invariant pooling function (invariant under permutations of its $m$ columns), usually the coordinatewise mean, max, min, or sum.
- The invariant map $h_\xi$ is an MLP $\mathbf R^\ell\to\mathbf R^k$.
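Since $p$ is symmetric in its $m$ arguments, permuting $X$ only permutes the vectors $g_\phi(\mathbf x_i)$ fed to $p$, so $f_\theta(\sigma\cdot X)=f_\theta(X)$. Below is a minimal pure-Python sketch of the forward pass, assuming a 0-based vocabulary, randomly initialized weights, and ReLU MLPs with a linear last layer; all function names (`mlp`, `init_mlp`, `pool`, `deep_set`, `is_invariant`) and sizes are hypothetical, and there is no training. It covers both cases for $g_\phi$ (a lookup table of word vectors and an MLP) and checks invariance under every permutation for each pooling choice, up to floating-point round-off.

```python
import math
import random
from itertools import permutations

def matvec(W, v):
    """Multiply a matrix W (list of rows) by a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def mlp(layers, v):
    """Apply an MLP given as (W, b) pairs; ReLU after every layer but the last."""
    for k, (W, b) in enumerate(layers):
        v = [z + bi for z, bi in zip(matvec(W, v), b)]
        if k < len(layers) - 1:
            v = [max(0.0, z) for z in v]
    return v

def init_mlp(sizes, rng):
    """Random Gaussian weights for widths sizes[0] -> ... -> sizes[-1]."""
    return [
        ([[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)],
         [rng.gauss(0.0, 1.0) for _ in range(d_out)])
        for d_in, d_out in zip(sizes[:-1], sizes[1:])
    ]

# Pooling p: R^{ell x m} -> R^ell, applied to each of the ell coordinates.
POOLS = {
    "sum": lambda cols: [sum(c) for c in cols],
    "mean": lambda cols: [sum(c) / len(c) for c in cols],
    "max": lambda cols: [max(c) for c in cols],
    "min": lambda cols: [min(c) for c in cols],
}

def pool(vectors, kind="mean"):
    """Pool m vectors in R^ell into one vector in R^ell."""
    cols = list(zip(*vectors))  # ell columns, each holding m numbers
    return POOLS[kind](cols)

def deep_set(X, g, h_layers, kind="mean"):
    """f(X) = h(p(g(x_1), ..., g(x_m)))."""
    return mlp(h_layers, pool([g(x) for x in X], kind))

def is_invariant(X, g, h_layers, kind, tol=1e-9):
    """Check f(sigma . X) == f(X) for every sigma in S_m, up to round-off."""
    base = deep_set(X, g, h_layers, kind)
    return all(
        all(math.isclose(a, b, abs_tol=tol)
            for a, b in zip(base, deep_set(Y, g, h_layers, kind)))
        for Y in permutations(X)
    )

if __name__ == "__main__":
    rng = random.Random(0)
    n, d, ell, k = 5, 3, 4, 2  # vocab size, input dim, embedding dim, output dim

    # Discrete X_0 = {0, ..., n-1}: g is a lookup table of word vectors.
    word_vectors = {i: [rng.gauss(0.0, 1.0) for _ in range(ell)] for i in range(n)}
    g_discrete = word_vectors.__getitem__

    # Continuous X_0 inside R^d: g is an MLP R^d -> R^ell.
    g_layers = init_mlp([d, 8, ell], rng)

    def g_continuous(x):
        return mlp(g_layers, x)

    h_layers = init_mlp([ell, 8, k], rng)
    tokens = (0, 3, 3, 1)  # an element of L_4 X_0; repeats are allowed
    points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(4)]

    for kind in POOLS:
        assert is_invariant(tokens, g_discrete, h_layers, kind)
        assert is_invariant(points, g_continuous, h_layers, kind)
    print("f(X) =", deep_set(tokens, g_discrete, h_layers))
```

The same `h_layers` works for any length $m$, because pooling always returns a vector in $\mathbf R^\ell$. Max and min are exactly invariant in floating point, whereas sum and mean can differ in the last bits when the summation order changes, which is why the check uses a tolerance.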