4/9 Deep Sets

Last time, we recalled the general notions of group actions and of invariant and equivariant maps. We also introduced the objective of Geometric Deep Learning: designing model architectures that are invariant or equivariant to the symmetries of the input feature space.

Today, we focus on the particular case of the permutation action.

Deep Sets

Let $\mathscr X_0$ denote either:

1. a finite set $[n]=\{0,\dotsc,n-1\}$, or
2. a set of $n$-dimensional vectors.

Via one-hot encoding, which sends $j\in[n]$ to the standard basis vector $\mathbf e_j\in\mathbf R^n$, case 1 is a special case of case 2.

Let the feature space $\mathscr X$ be the set $L\mathscr X_0$ of finite sequences from $\mathscr X_0$. For each nonnegative integer $m$, we let $L_m\mathscr X_0\subseteq L\mathscr X_0$ denote the subset of sequences of length $m$.

Then the permutation group $S_m$ acts on $L_m\mathscr X_0$ by reordering the entries of a sequence.
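A minimal sketch of this action in Python, assuming the convention that $\sigma\in S_m$ moves the entry in position $i$ to position $\sigma(i)$ (using $\sigma^{-1}$ instead works equally well). The helper names `act` and `compose` are illustrative only.

```python
from itertools import permutations

def act(sigma, seq):
    """Let sigma in S_m act on a length-m sequence.

    Assumed convention: the entry at position i moves to position sigma[i].
    """
    m = len(seq)
    if sorted(sigma) != list(range(m)):
        raise ValueError("sigma must be a permutation of 0, ..., m-1")
    out = [None] * m
    for i, x in enumerate(seq):
        out[sigma[i]] = x
    return tuple(out)

def compose(tau, sigma):
    # (tau o sigma)[i] = tau[sigma[i]]: first apply sigma, then tau
    return tuple(tau[s] for s in sigma)

A = ("a", "b", "c")                      # an element of L_3 X_0
print(act((1, 2, 0), A))                 # ('c', 'a', 'b')
orbit = {act(s, A) for s in permutations(range(3))}
print(len(orbit))                        # 6, since the entries of A are distinct
```

Checking `act(t, act(s, A)) == act(compose(t, s), A)` for all pairs `s`, `t` is a quick way to confirm that this is indeed a (left) action. With repeated entries, e.g. `("a", "a", "b")`, the orbit is smaller.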

Theorem 1 [1, Theorem 2, modified]. A map $L_m\mathscr X_0\xrightarrow f\mathbf R^k$ is $S_m$-invariant if and only if it can be written in the form $$ f(A)=h\left(\sum_{i=0}^{m-1}g(\mathbf a_i)\right),\qquad A=(\mathbf a_0,\dotsc,\mathbf a_{m-1})\in L_m\mathscr X_0, $$ for some functions $\mathscr X_0\xrightarrow g\mathbf R^\ell$ and $\mathbf R^\ell\xrightarrow h\mathbf R^k$.
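To illustrate the easy direction of Theorem 1 (every map of this form is invariant), here is one hand-picked instance with $\mathscr X_0=\mathbf R$ and $\ell=3$. The embedding $g(a)=(1,a,a^2)$ accumulates the count and the first two power sums, and $h$ turns them into the variance. These particular $g$ and $h$ are our own choice for illustration and do not come from [1].

```python
import numpy as np

def g(a):
    # Per-element map X_0 = R -> R^3: (1, a, a^2)
    return np.array([1.0, a, a * a])

def h(s):
    # R^3 -> R: population variance from (count, sum, sum of squares)
    count, total, total_sq = s
    mean = total / count
    return total_sq / count - mean * mean

def f(A):
    # f(A) = h(sum_i g(a_i)), the form in Theorem 1
    return h(sum(g(a) for a in A))

rng = np.random.default_rng(0)
A = rng.normal(size=5)
perm = rng.permutation(5)
print(np.isclose(f(A), f(A[perm])))   # True: reordering does not change f
print(np.isclose(f(A), np.var(A)))    # True: this f is the variance
```

The converse direction (every invariant $f$ decomposes this way) is the nontrivial part of the theorem and is not something a numerical check can show.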

Remark 2. In case $\mathscr X_0\cong[n]$, the map $g$ amounts to a table of $n$ vectors $\mathbf v_j=g(j)\in\mathbf R^\ell$, $j=0,\dotsc,n-1$, that is, an $n\times\ell$ embedding matrix whose $j$-th row is $\mathbf v_j$.
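For $\mathscr X_0=[n]$, the remark can be checked directly. If the $\mathbf v_j$ are stored as the rows of a matrix `V`, the lookup $g(j)$ coincides with multiplying the one-hot vector of $j$ by `V`. Consequently, the pooled sum depends only on how many times each $j$ occurs. `V` is filled with random numbers purely for illustration.

```python
import numpy as np

n, ell = 4, 3
rng = np.random.default_rng(1)
V = rng.normal(size=(n, ell))   # row j is v_j = g(j); values are arbitrary here

def g(j):
    # For X_0 = [n], g is a lookup into the n x ell table V
    return V[j]

def one_hot(j):
    e = np.zeros(n)
    e[j] = 1.0
    return e

A = [2, 0, 2, 3]                      # a sequence in L_4 [n]
pooled = sum(g(j) for j in A)         # sum_i g(a_i)
counts = sum(one_hot(j) for j in A)   # multiplicity of each j in A
print(np.allclose(pooled, counts @ V))  # True: the sum only sees the counts
```

This is also why an embedding layer and a bias-free linear layer applied to one-hot inputs are interchangeable.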

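Remark 3. In Theorem 1, the sum can be replaced by the mean: since $m$ is fixed, the mean is just the sum divided by $m$. Max pooling, the choice made in [2], is also common, but it is not a drop-in replacement here, because the maximum ignores how many times an element occurs. In implementations, we will use the mean, as it is a normalized version of the sum.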
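The sketch below compares the three pooling maps on one-hot embeddings of $\mathscr X_0=[2]$. For fixed $m$, the mean turns out to be a rescaled sum. The max, however, sends the sequences $(0,0,1)$ and $(0,1,1)$ to the same vector, because it records only which elements occur and not how often. The function name `pool` is illustrative.

```python
import numpy as np

def pool(E, how):
    # E has shape (m, ell): row i is the embedding g(a_i)
    if how == "sum":
        return E.sum(axis=0)
    if how == "mean":
        return E.mean(axis=0)
    if how == "max":
        return E.max(axis=0)
    raise ValueError(f"unknown pooling: {how}")

eye = np.eye(2)        # one-hot embeddings of X_0 = [2]
E1 = eye[[0, 0, 1]]    # sequence (0, 0, 1)
E2 = eye[[0, 1, 1]]    # sequence (0, 1, 1)
print(pool(E1, "sum"), pool(E2, "sum"))   # [2. 1.] [1. 2.]: distinguished
print(pool(E1, "max"), pool(E2, "max"))   # [1. 1.] [1. 1.]: collide
print(np.allclose(pool(E1, "mean"), pool(E1, "sum") / 3))  # True
```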

Definition 4. Motivated by the Universal Approximation Theorem, we take $g$ and $h$ to be DNNs; the resulting architecture is called a Deep Set. We call $g$ the equivariant map or embedding (since it is applied to each element separately, it commutes with reordering of the sequence), the summation, max or mean the pooling map, and $h$ the invariant map.
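Here is a minimal, untrained sketch of a Deep Set in NumPy. It assumes two-layer ReLU MLPs for $g$ and $h$ and uses mean pooling. The helper `mlp`, the layer widths and the random initialization are illustrative choices, not the configuration used in [1]. In practice one would build and train the same structure in a deep learning framework.

```python
import numpy as np

def mlp(sizes, rng):
    # Hypothetical helper: random-weight MLP with ReLU between layers (no training)
    params = [(rng.normal(size=(a, b)) / np.sqrt(a), np.zeros(b))
              for a, b in zip(sizes[:-1], sizes[1:])]

    def forward(x):
        for k, (W, b) in enumerate(params):
            x = x @ W + b
            if k < len(params) - 1:
                x = np.maximum(x, 0.0)
        return x

    return forward

class DeepSet:
    def __init__(self, d_in, ell, k, seed=0):
        rng = np.random.default_rng(seed)
        self.g = mlp([d_in, 32, ell], rng)   # embedding, applied to each element
        self.h = mlp([ell, 32, k], rng)      # invariant map on the pooled vector

    def __call__(self, A):
        # A: array of shape (m, d_in), one row per element a_i
        E = self.g(A)             # (m, ell): row-wise, hence S_m-equivariant
        pooled = E.mean(axis=0)   # (ell,): mean pooling, S_m-invariant
        return self.h(pooled)     # (k,)

model = DeepSet(d_in=3, ell=8, k=2)
rng = np.random.default_rng(42)
A = rng.normal(size=(5, 3))
perm = rng.permutation(5)
print(np.allclose(model(A), model(A[perm])))             # True: invariant
print(np.allclose(model.g(A)[perm], model.g(A[perm])))   # True: g is equivariant
```

Because `g` acts row by row and the mean is defined for every $m\ge1$, the same model accepts sequences of any positive length. This matches the feature space $L\mathscr X_0$ rather than a single $L_m\mathscr X_0$.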

Remark 5. Somewhat earlier than [1], a similar architecture was studied in [2]. Their focus is on 3D point clouds.

References

[1] Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabás Póczos, Russ R. Salakhutdinov and Alexander J. Smola. Deep Sets. Advances in Neural Information Processing Systems 30 (NeurIPS 2017).

[2] Charles R. Qi, Hao Su, Kaichun Mo and Leonidas J. Guibas. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): 77-85.