Homework 11
Let's implement a permutation-invariant MLP, built from permutation-equivariant layers followed by a mean pool, as described in Lecture 0416.
First, write the following mean pool module. We will use it both inside the permutation-equivariant layers and as the final layer of our model. Note that it expects the sequential dimension of the feature tensor to be second to last. Note also that the mean pool operation it should perform was already implemented in Notebooks 0411 and 0416.
```python
import torch


class MeanPool(torch.nn.Module):
    """
    This module performs mean pooling of sequential data along the sequential
    dimension.

    Arguments
    ---------
    config : `dict`
        Configuration dictionary. Required key-value pair:
        `"ensemble_shape"` : `tuple[int]`
            The shape of the ensemble of models that process the data.

    Calling
    -------
    The expected input is a dictionary of the following key-tensor pairs:
    `"features"`
        The feature tensor, of shape `(..., sequence_dim, feature_dim)`.
    `"mask"`
        This mask signals which entries are not padding elements. It should
        have shape `(..., sequence_dim)`.
    """
```
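A minimal sketch of the masked mean pool itself, assuming `mask` is a boolean tensor that is `True` at non-padding entries. The function `masked_mean_pool` is a hypothetical helper working on plain tensors; it leaves out the dictionary interface and the `ensemble_shape` handling that `MeanPool` needs, although any leading ensemble dimensions are covered by the `...` in the shapes.

```python
import torch


def masked_mean_pool(features: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Average `features` of shape (..., sequence_dim, feature_dim) over the
    sequence dimension, ignoring entries where `mask` (shape (..., sequence_dim))
    is False. Returns a tensor of shape (..., feature_dim)."""
    # Add a trailing feature axis so the mask broadcasts against the features.
    weights = mask.unsqueeze(-1).to(features.dtype)
    # Sum only the non-padding entries along the sequence dimension (second to last).
    total = (features * weights).sum(dim=-2)
    # Count non-padding entries; the clamp avoids division by zero for empty sequences.
    count = weights.sum(dim=-2).clamp(min=1.0)
    return total / count


# Usage: one sequence of length 3 whose last entry is padding.
features = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = torch.tensor([[True, True, False]])
pooled = masked_mean_pool(features, mask)
print(pooled)  # tensor([[2., 3.]])
```

The padding value `100.0` has no effect on the result, since it is multiplied by a zero weight before summation.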
Now you can implement the following permutation-equivariant affine layer. We derived it in Lecture 0416: an affine transformation \(\mathbf R^{m\times d}\xrightarrow{A}\mathbf R^{m\times d'}\) that is equivariant with respect to permutations along the \(m\) dimension is of the form \(A(X)=L(X)+\mathbf b\), where

- the permutation-equivariant linear map \(L\) is of the form \(L(X)=XW_1 + \frac{1}{m}(\boldsymbol1\boldsymbol1^T)XW_2\) with parameter matrices \(W_1,W_2\in\mathbf R^{d\times d'}\), and \(\boldsymbol1\in\mathbf R^m\) is the all-ones vector;
- \(\mathbf b\in\mathbf R^{d'}\) is a bias vector, added to every row of \(L(X)\).
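As a quick check of why this form is equivariant, here is the derivation with the bias written explicitly as \(\boldsymbol1\mathbf b^T\) (the row vector \(\mathbf b^T\) added to each of the \(m\) rows) and with \(P\) an arbitrary \(m\times m\) permutation matrix. It only uses the identities \(P\boldsymbol1=\boldsymbol1\) and \(\boldsymbol1^TP=\boldsymbol1^T\), which hold because permuting the entries of an all-ones vector leaves it unchanged.

```latex
\begin{aligned}
A(PX) &= PXW_1 + \tfrac{1}{m}\boldsymbol1\boldsymbol1^T PXW_2 + \boldsymbol1\mathbf b^T \\
      &= PXW_1 + \tfrac{1}{m}\boldsymbol1\boldsymbol1^T XW_2 + \boldsymbol1\mathbf b^T
         && (\boldsymbol1^T P = \boldsymbol1^T) \\
      &= PXW_1 + P\,\tfrac{1}{m}\boldsymbol1\boldsymbol1^T XW_2 + P\boldsymbol1\mathbf b^T
         && (P\boldsymbol1 = \boldsymbol1) \\
      &= P\,A(X).
\end{aligned}
```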
As mentioned when introducing `MeanPool`, to implement multiplication by \(\frac{1}{m}(\boldsymbol1\boldsymbol1^T)\), you can use `MeanPool` as defined above together with broadcasting: \(\frac{1}{m}(\boldsymbol1\boldsymbol1^T)X\) is the \(m\times d\) matrix each of whose rows equals the mean of the rows of \(X\), so it suffices to mean-pool \(X\) along the sequence dimension and broadcast the result back over that dimension.

```python
import torch


class PermutationEquivariantLayer(torch.nn.Module):
    """
    Permutation-equivariant affine transformation that maps inputs of shape
    `(sequence_dim, in_features)` to outputs of shape
    `(sequence_dim, out_features)` in a way that is equivariant with respect
    to permutations along the sequence dimension.

    Arguments
    ---------
    config : `dict`
        Configuration dictionary. Required key-value pairs:
        `"device"` : `str`
            The device to store parameters on.
        `"ensemble_shape"` : `tuple[int]`
            The shape of the ensemble of affine transformations the model
            represents.
    in_features : `int`
        The number of input features.
    out_features : `int`
        The number of output features.
    bias : `bool`, optional
        Whether the model should include bias. Default: `True`.
    init_multiplier : `float`, optional
        The weight parameter values are initialized following a normal
        distribution with center 0 and std `in_features ** -.5` times this
        value. Default: `1.`

    Calling
    -------
    The expected input is a dictionary of the following key-tensor pairs:
    `"features"`
        The feature tensor, of shape `(..., sequence_dim, in_features)`.
    `"mask"`
        This mask signals which entries are not padding elements. It should
        have shape `(..., sequence_dim)`.
    """
```
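A minimal single-model sketch of the layer, assuming plain tensors instead of the dictionary interface, no `ensemble_shape`, and a zero-initialized bias (the docstring above does not specify how the bias is initialized). The class name `SimplePermutationEquivariantLayer` is hypothetical. The key step is the masked mean computed with `keepdim=True`: it leaves a length-1 sequence axis that broadcasts back over all rows, which realizes \(\frac{1}{m}(\boldsymbol1\boldsymbol1^T)X\) with \(m\) the number of non-padding entries.

```python
import torch


class SimplePermutationEquivariantLayer(torch.nn.Module):
    """Hypothetical sketch of A(X) = X W_1 + (1/m) 1 1^T X W_2 + b acting on
    features of shape (..., sequence_dim, in_features), without ensembles."""

    def __init__(self, in_features: int, out_features: int, bias: bool = True):
        super().__init__()
        std = in_features ** -0.5  # std from the docstring, init_multiplier = 1
        self.w1 = torch.nn.Parameter(torch.randn(in_features, out_features) * std)
        self.w2 = torch.nn.Parameter(torch.randn(in_features, out_features) * std)
        # Assumption: zero-initialized bias.
        self.b = torch.nn.Parameter(torch.zeros(out_features)) if bias else None

    def forward(self, features: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        weights = mask.unsqueeze(-1).to(features.dtype)
        # Masked mean over the sequence dimension, shape (..., 1, in_features).
        mean = (features * weights).sum(-2, keepdim=True) / weights.sum(
            -2, keepdim=True
        ).clamp(min=1.0)
        # X W_1 has shape (..., m, out); mean W_2 has shape (..., 1, out) and broadcasts.
        out = features @ self.w1 + mean @ self.w2
        if self.b is not None:
            out = out + self.b  # the bias is added to every row
        return out


# Usage: permuting the input rows permutes the output rows in the same way.
torch.manual_seed(0)
layer = SimplePermutationEquivariantLayer(3, 4)
x = torch.randn(2, 5, 3)
mask = torch.ones(2, 5, dtype=torch.bool)
perm = torch.randperm(5)
out = layer(x, mask)
out_perm = layer(x[:, perm], mask[:, perm])
print(torch.allclose(out[:, perm], out_perm, atol=1e-6))  # True
```

Padding rows still receive output values here; they are simply ignored downstream through the mask.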
Now, just like the `get_mlp` factory function we wrote in Notebook 0319, you can write the following factory function to create permutation-invariant MLPs:

```python
from typing import Iterable, Optional

import torch


def get_permutation_invariant_mlp(
    config: dict,
    in_features: int,
    out_features: int,
    hidden_layer_num: Optional[int] = None,
    hidden_layer_size: Optional[int] = None,
    hidden_layer_sizes: Optional[Iterable[int]] = None,
) -> torch.nn.Sequential:
    """
    Creates a permutation-invariant MLP: a composite of permutation-equivariant
    affine layers and ReLU activation functions, with a mean pool layer at the
    end. Can create a model ensemble.

    Arguments
    ---------
    config : `dict`
        Configuration dictionary. Required key-value pairs:
        `"device"` : `str`
            The device to store parameters on.
        `"ensemble_shape"` : `tuple[int]`
            The shape of the ensemble of affine transformations the model
            represents.
    in_features : `int`
        The number of input features.
    out_features : `int`
        The number of output features.
    hidden_layer_num : `int`, optional
        If `hidden_layer_sizes` is not given, we create an MLP with
        `hidden_layer_num` hidden layers of `hidden_layer_size` dimensions.
    hidden_layer_size : `int`, optional
        If `hidden_layer_sizes` is not given, we create an MLP with
        `hidden_layer_num` hidden layers of `hidden_layer_size` dimensions.
    hidden_layer_sizes : `Iterable[int]`, optional
        If given, each entry gives a hidden layer with the given size.
    """
```
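A sketch of how the pieces can be composed, under several assumptions: `DictEquivariantLayer`, `DictReLU`, `DictMeanPool` and `sketch_permutation_invariant_mlp` are hypothetical simplified stand-ins without ensembles or `config`, and a ReLU is placed after every affine layer except the last, as is usual for an MLP. Since `torch.nn.Sequential` feeds the output of each module directly into the next one, every module here takes and returns a batch dictionary; a bare `torch.nn.ReLU` would fail on a dictionary, hence the small `DictReLU` wrapper.

```python
from typing import Iterable, Optional

import torch


class DictEquivariantLayer(torch.nn.Module):
    """Hypothetical simplified stand-in for PermutationEquivariantLayer."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        std = in_features ** -0.5
        self.w1 = torch.nn.Parameter(torch.randn(in_features, out_features) * std)
        self.w2 = torch.nn.Parameter(torch.randn(in_features, out_features) * std)
        self.b = torch.nn.Parameter(torch.zeros(out_features))

    def forward(self, batch: dict) -> dict:
        x = batch["features"]
        w = batch["mask"].unsqueeze(-1).to(x.dtype)
        mean = (x * w).sum(-2, keepdim=True) / w.sum(-2, keepdim=True).clamp(min=1.0)
        return {**batch, "features": x @ self.w1 + mean @ self.w2 + self.b}


class DictReLU(torch.nn.Module):
    """Applies ReLU to the "features" entry only, keeping the rest of the batch."""

    def forward(self, batch: dict) -> dict:
        return {**batch, "features": torch.relu(batch["features"])}


class DictMeanPool(torch.nn.Module):
    """Masked mean over the sequence dimension, removing that dimension."""

    def forward(self, batch: dict) -> dict:
        x = batch["features"]
        w = batch["mask"].unsqueeze(-1).to(x.dtype)
        return {**batch, "features": (x * w).sum(-2) / w.sum(-2).clamp(min=1.0)}


def sketch_permutation_invariant_mlp(
    in_features: int,
    out_features: int,
    hidden_layer_num: Optional[int] = None,
    hidden_layer_size: Optional[int] = None,
    hidden_layer_sizes: Optional[Iterable[int]] = None,
) -> torch.nn.Sequential:
    if hidden_layer_sizes is None:
        hidden_layer_sizes = [hidden_layer_size] * hidden_layer_num
    sizes = [in_features, *hidden_layer_sizes, out_features]
    layers = []
    for i, (d_in, d_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        layers.append(DictEquivariantLayer(d_in, d_out))
        if i < len(sizes) - 2:  # no activation after the last affine layer
            layers.append(DictReLU())
    layers.append(DictMeanPool())
    return torch.nn.Sequential(*layers)


# Usage: shuffling the input set leaves the output unchanged.
torch.manual_seed(0)
model = sketch_permutation_invariant_mlp(2, 1, hidden_layer_num=2, hidden_layer_size=16)
batch = {"features": torch.randn(4, 7, 2), "mask": torch.ones(4, 7, dtype=torch.bool)}
perm = torch.randperm(7)
out = model(batch)["features"]
out_perm = model(
    {"features": batch["features"][:, perm], "mask": batch["mask"][:, perm]}
)["features"]
print(out.shape)  # torch.Size([4, 1])
```

Every layer before the pool is permutation-equivariant and the final masked mean is permutation-invariant, so `out` and `out_perm` agree up to floating-point rounding.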
Finally, you can train your model on the convex hull area dataset that we used in Notebook 0416. Since the model, as written here, outputs batch dictionaries, you can train it with the function `train_supervised`, as in Notebook 0423. That setup requires the feature tensors to be located at the `"features"` key of the batch dictionaries, so you need to edit the function `generate_convex_hull_dataset`, written in Notebook 0416, accordingly.

Train a model ensemble on the convex hull area dataset, and then test the best ensemble entry on the test split, plotting the true and predicted convex hull areas as we did in Notebook 0416.
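A hypothetical sketch of a data generator in the dictionary format, assuming point sets drawn uniformly from the unit square, with at least 3 points each and padded to `max_points`. The names `convex_hull_area` and `make_convex_hull_batch`, the point distribution, and the `"target"` key are all assumptions; the actual `generate_convex_hull_dataset` from Notebook 0416 and the keys that `train_supervised` expects may differ. The hull area is computed with Andrew's monotone chain algorithm followed by the shoelace formula.

```python
import torch


def convex_hull_area(points: list[tuple[float, float]]) -> float:
    """Area of the convex hull of 2D points (monotone chain + shoelace formula)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]  # hull vertices in counter-clockwise order
    n = len(hull)
    twice_area = sum(
        hull[i][0] * hull[(i + 1) % n][1] - hull[(i + 1) % n][0] * hull[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def make_convex_hull_batch(batch_size: int, max_points: int, generator: torch.Generator) -> dict:
    """Hypothetical generator of padded point sets with their convex hull areas."""
    lengths = torch.randint(3, max_points + 1, (batch_size,), generator=generator)
    features = torch.zeros(batch_size, max_points, 2)
    mask = torch.arange(max_points) < lengths[:, None]  # True at real points
    targets = torch.zeros(batch_size, 1)
    for i, n in enumerate(lengths.tolist()):
        points = torch.rand(n, 2, generator=generator)
        features[i, :n] = points
        targets[i, 0] = convex_hull_area([tuple(p) for p in points.tolist()])
    return {"features": features, "mask": mask, "target": targets}


# Usage
print(convex_hull_area([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]))  # 1.0
batch = make_convex_hull_batch(4, 10, torch.Generator().manual_seed(0))
print(batch["features"].shape)  # torch.Size([4, 10, 2])
```

Padding entries are left at zero and flagged `False` in the mask, so a masked mean pool ignores them.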