4/9 Deep Sets

Last time, we recalled the general notions of group actions and of invariant and equivariant maps. We also introduced the objective of Geometric Deep Learning: to create model architectures that are invariant or equivariant to symmetries of the input feature space.

Today, we focus on the particular case of permutation actions.

Deep Sets

Let \(\mathscr X_0\) denote either:

  1. a finite set \([n]\) or
  2. a set of \(n\)-dimensional vectors.

Note that via one-hot vectors, 1. is a special case of 2.
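As a small illustration of this remark, the following sketch embeds an element of \([n]\) as a one-hot vector. It assumes that \([n]\) is indexed from \(0\) to \(n-1\); the helper name `one_hot` is chosen here for illustration only.

```python
def one_hot(i, n):
    """Return the i-th standard basis vector of R^n as a list of floats."""
    if not 0 <= i < n:
        raise ValueError("index must lie in range(n)")
    return [1.0 if j == i else 0.0 for j in range(n)]


print(one_hot(2, 4))  # [0.0, 0.0, 1.0, 0.0]
```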

Let the feature space \(\mathscr X\) be the set \(L\mathscr X_0\) of finite sequences from \(\mathscr X_0\). For each nonnegative integer \(m\), we let \(L_m\mathscr X_0\subseteq L\mathscr X_0\) denote the subset of sequences of length \(m\).

Then the permutation group \(S_m\) acts on \(L_m\mathscr X_0\) by permuting the order of the elements.
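The action can be made concrete with a short sketch. Here a permutation is represented as a tuple of indices, and position `i` of the result holds the element at position `sigma[i]` of the input; this is one possible convention, and the name `act` is hypothetical.

```python
from itertools import permutations


def act(sigma, seq):
    """Reorder seq so that position i of the result holds seq[sigma[i]]."""
    return tuple(seq[j] for j in sigma)


A = ("a", "b", "c")
print(act((2, 0, 1), A))  # ('c', 'a', 'b')

# The orbit of A under S_3: all reorderings of the same elements.
orbit = {act(sigma, A) for sigma in permutations(range(len(A)))}
print(len(orbit))  # 6
```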

Theorem 1 [1, Theorem 2, modified]. A map \(L_m\mathscr X_0\xrightarrow f\mathbf R^k\) is \(S_m\)-invariant if and only if it can be written in the form $$ f(A)=h\left(\sum_{i=0}^{m-1}g(\mathbf a_i)\right) $$ for some functions \(\mathscr X_0\xrightarrow g\mathbf R^\ell\) and \(\mathbf R^\ell\xrightarrow h\mathbf R^k\), where \(A=(\mathbf a_0,\dotsc,\mathbf a_{m-1})\).
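To see the "if" direction in action, here is a minimal sketch with a hand-picked example that is not taken from [1]: the population variance of a nonempty sequence of numbers, written as \(h\) applied to a sum of \(g\)-values with \(\ell=3\) and \(k=1\). The choices of `g` and `h` are illustrative assumptions.

```python
from itertools import permutations


def g(x):
    """Embed a number as (1, x, x^2), so the sum collects count and moments."""
    return (1, x, x * x)


def h(s):
    """Recover the population variance from (count, sum, sum of squares)."""
    count, total, total_sq = s
    return total_sq / count - (total / count) ** 2


def f(A):
    """f(A) = h(sum_i g(a_i)); A is assumed to be nonempty."""
    pooled = tuple(sum(column) for column in zip(*(g(a) for a in A)))
    return h(pooled)


A = [1, 2, 3, 4]
print(f(A))  # 1.25

# The pooled sum, and hence f, does not depend on the order of the elements.
print(all(f(list(p)) == f(A) for p in permutations(A)))  # True
```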

Remark 2. In case \(\mathscr X_0\cong[n]\), the map \(g\) is a collection \(\{\mathbf v_0,\dotsc,\mathbf v_{n-1}\}\) of \(\ell\)-dimensional vectors.
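Put differently, in this case \(g\) is a lookup table, and looking up row \(i\) agrees with multiplying the one-hot vector of \(i\) by the matrix whose rows are the \(\mathbf v_i\). The sketch below checks this for a made-up table `V` with \(n=3\) and \(\ell=2\); the values and function names are placeholders.

```python
# Rows are the vectors v_0, v_1, v_2 (illustrative values only).
V = [
    [0.5, -1.0],
    [2.0, 0.0],
    [1.5, 3.0],
]


def g_lookup(i):
    """g as a table lookup: return the vector v_i."""
    return V[i]


def g_matmul(i):
    """g as a linear map applied to the one-hot encoding of i."""
    e = [1.0 if j == i else 0.0 for j in range(len(V))]
    return [sum(e[j] * V[j][c] for j in range(len(V))) for c in range(len(V[0]))]


print(all(g_lookup(i) == g_matmul(i) for i in range(len(V))))  # True
```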

Remark 3. In Theorem 1, we can use mean or max instead of summation. In implementation, we will use mean as it is a normalized version of sum.
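The three pooling choices can be compared on a toy example. The sketch below pools a list of embedded vectors coordinate-wise; the function name `pool` and its `kind` parameter are assumptions made for illustration. It also shows in what sense mean is normalized: repeating every element doubles the sum but leaves the mean unchanged.

```python
def pool(vectors, kind="mean"):
    """Coordinate-wise pooling of a nonempty list of equal-length vectors."""
    columns = list(zip(*vectors))
    if kind == "sum":
        return [sum(c) for c in columns]
    if kind == "mean":
        return [sum(c) / len(c) for c in columns]
    if kind == "max":
        return [max(c) for c in columns]
    raise ValueError(f"unknown pooling kind: {kind}")


E = [[1.0, 4.0], [3.0, 0.0]]
print(pool(E, "sum"))       # [4.0, 4.0]
print(pool(E, "mean"))      # [2.0, 2.0]
print(pool(E, "max"))       # [3.0, 4.0]
print(pool(E + E, "sum"))   # [8.0, 8.0]
print(pool(E + E, "mean"))  # [2.0, 2.0]
```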

Definition 4. By the Universal Approximation Theorem, we can let \(g\) and \(h\) be DNNs. Such an architecture is called a Deep Set. We call \(g\) the equivariant map or embedding; summation, max, or mean the pooling map; and \(h\) the invariant map.
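The definition can be sketched in plain Python as follows. This is a toy forward pass only, under several assumptions that are not from [1]: two dense layers each for \(g\) and \(h\), ReLU activations, random untrained weights, and mean pooling. The class name `DeepSet` and the helpers `linear` and `dense` are hypothetical; a practical implementation would use a deep learning framework and train the weights.

```python
import math
import random


def linear(rng, n_in, n_out):
    """Random weights and zero bias for one dense layer (illustrative init)."""
    W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b


def dense(layer, x, relu=True):
    """Apply y = Wx + b, optionally followed by ReLU."""
    W, b = layer
    y = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [max(0.0, v) for v in y] if relu else y


class DeepSet:
    def __init__(self, d_in, ell, k, seed=0):
        rng = random.Random(seed)
        self.g_layers = [linear(rng, d_in, ell), linear(rng, ell, ell)]
        self.h_layers = [linear(rng, ell, ell), linear(rng, ell, k)]

    def g(self, a):
        """Embedding, applied to each element independently."""
        for layer in self.g_layers:
            a = dense(layer, a)
        return a

    def h(self, s):
        """Invariant map, applied to the pooled vector."""
        s = dense(self.h_layers[0], s)
        return dense(self.h_layers[1], s, relu=False)

    def __call__(self, A):
        embedded = [self.g(a) for a in A]
        # Mean pooling over the elements of the (nonempty) sequence.
        pooled = [math.fsum(column) / len(A) for column in zip(*embedded)]
        return self.h(pooled)


model = DeepSet(d_in=3, ell=4, k=2)
A = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0], [3.0, 1.0, 1.0]]
out = model(A)
shuffled = model([A[2], A[0], A[1]])

# Reordering the input leaves the output unchanged (up to rounding).
print(all(math.isclose(u, v, abs_tol=1e-12) for u, v in zip(out, shuffled)))
```

Because the pooling step collapses the sequence dimension, the same model accepts sequences of any positive length \(m\).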

Remark 5. Somewhat earlier than [1], a similar architecture was studied in [2]. Their focus is on 3D point clouds.

References

[1] Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabás Póczos, Russ R. Salakhutdinov and Alexander J. Smola. Deep Sets. Advances in Neural Information Processing Systems 30 (NeurIPS 2017).

[2] Charles R. Qi, Hao Su, Kaichun Mo and Leonidas J. Guibas. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): 77-85.