Homework 11

Let's implement a permutation-invariant MLP built from permutation-equivariant layers, as described in Lecture 0416.

  1. First, write the following mean pool module. We will use it both inside the permutation-equivariant layers and as the final layer of our model. Note that it expects the sequential dimension of the feature tensor to be second to last, and that the mean pool operation it should perform was already implemented in Notebooks 0411 and 0416.

    class MeanPool(torch.nn.Module):
        """
        This module performs mean pooling of sequential data
        along the sequential dimension.
    
        Arguments
        ---------
        config : `dict`
            Configuration dictionary. Required key-value pair:
            `"ensemble_shape"` : `tuple[int]`
                The shape of the ensemble of models
                that process the data.
    
        Calling
        -------
        The expected input is a dictionary of the following
        key-tensor pairs:
        `"features"`
            The feature tensor, of shape
            `(..., sequence_dim, feature_dim)`
        `"mask"`
            This mask signals which entry is not a padding element.
            It should have size `(..., sequence_dim)`.
        """
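
As a starting point, the masked mean along the second-to-last dimension can be sketched as a plain function. This is only a minimal sketch, not the module itself: the name `masked_mean_pool` is hypothetical, it ignores the `config` and ensemble handling, and it assumes the mask is true (or nonzero) at real entries.

```python
import torch


def masked_mean_pool(features: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Average `features` over the second-to-last dimension, skipping padding.

    features : shape (..., sequence_dim, feature_dim)
    mask     : shape (..., sequence_dim), true / nonzero at real entries
    """
    # Add a trailing axis so the mask broadcasts over the feature dimension.
    weights = mask.to(features.dtype).unsqueeze(-1)
    # Sum only the unmasked entries along the sequence dimension.
    total = (features * weights).sum(dim=-2)
    # Number of real entries per sequence; the clamp avoids division by zero
    # for sequences that consist of padding only.
    count = weights.sum(dim=-2).clamp(min=1)
    return total / count


# One sequence of three entries, the last of which is padding.
features = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = torch.tensor([[True, True, False]])
pooled = masked_mean_pool(features, mask)  # shape (1, 2)
```

Only the two unmasked rows contribute to the average here, so the padding values have no effect on the result.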
    
  2. Now you can implement the following permutation-equivariant affine layer. We derived it in Lecture 0416: an affine transformation \(\mathbf R^{m\times d}\xrightarrow A\mathbf R^{m\times d'}\) that is equivariant with respect to permutations along the \(m\) dimension is of the form \(A(X)=L(X)+\mathbf b\) where

    1. The permutation-equivariant linear map \(L\) is of the form \(L(X)=XW_1 + \frac{1}{m}(\boldsymbol1\boldsymbol1^T)XW_2\) with parameter matrices \(W_1,W_2\in\mathbf R^{d\times d'}\).

    2. The bias is a vector \(\mathbf b\in\mathbf R^{d'}\), which is added to every row of \(L(X)\).

    Note that, as I mentioned in the first task, you can implement multiplication by \(\frac{1}{m}(\boldsymbol1\boldsymbol1^T)\) using MeanPool as defined above, together with broadcasting.
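
The formula above can be checked numerically on a single unpadded input. The following is a small sanity check rather than the layer itself: it has no mask and no ensemble dimensions, and the sizes `m`, `d`, `d_out` and the function name `affine` are made up for illustration.

```python
import torch

torch.manual_seed(0)
m, d, d_out = 5, 3, 4
X = torch.randn(m, d, dtype=torch.float64)
W1 = torch.randn(d, d_out, dtype=torch.float64)
W2 = torch.randn(d, d_out, dtype=torch.float64)
b = torch.randn(d_out, dtype=torch.float64)


def affine(X: torch.Tensor) -> torch.Tensor:
    # Mean over the m dimension, kept as a (1, d) row so that
    # the second term broadcasts back to all m rows.
    mean_row = X.mean(dim=-2, keepdim=True)
    return X @ W1 + mean_row @ W2 + b


# The same map written with the explicit all-ones matrix (1/m) 1 1^T.
ones = torch.ones(m, m, dtype=torch.float64)
explicit = X @ W1 + (ones @ X / m) @ W2 + b
same_formula = torch.allclose(affine(X), explicit)

# Equivariance: permuting the rows of the input permutes the rows of the output.
perm = torch.randperm(m)
equivariant = torch.allclose(affine(X[perm]), affine(X)[perm])
```

Both `same_formula` and `equivariant` should come out true, which confirms that the mean-and-broadcast form agrees with \(\frac{1}{m}(\boldsymbol1\boldsymbol1^T)X\) and that the map commutes with row permutations.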

    class PermutationEquivariantLayer(torch.nn.Module):
        """
        Permutation-equivariant affine transformation
        that maps inputs of shape `(sequence_dim, in_features)`
        to outputs of shape `(sequence_dim, out_features)`
        in a way that is equivariant with respect to permutations
        along the sequence dimension.
    
        Arguments
        ---------
        config : `dict`
            Configuration dictionary. Required key-value pairs:
            `"device"` : `str`
                The device to store parameters on.
            `"ensemble_shape"` : `tuple[int]`
                The shape of the ensemble of affine transformations
                the model represents.
        in_features : `int`
            The number of input features.
        out_features : `int`
            The number of output features.
        bias : `bool`, optional
            Whether the model should include bias. Default: `True`.
        init_multiplier : `float`, optional
            The weight parameter values are initialized following
            a normal distribution with mean 0 and standard deviation
            `in_features ** -.5` times this value. Default: `1.`
    
        Calling
        -------
        The expected input is a dictionary of the following
        key-tensor pairs:
        `"features"`
            The feature tensor, of shape
            `(..., sequence_dim, feature_dim)`
        `"mask"`
            This mask signals which entry is not a padding element.
            It should have size `(..., sequence_dim)`.
        """
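
The initialization described in the docstring could look roughly as follows for one weight tensor. This is a sketch under an assumption: that ensemble parameters are stored with the ensemble dimensions leading, that is with shape `ensemble_shape + (in_features, out_features)`. If the earlier notebooks use a different layout, follow those instead.

```python
import torch

config = {"device": "cpu", "ensemble_shape": (3,)}
in_features, out_features, init_multiplier = 8, 4, 1.0

# Normal entries with mean 0 and std `in_features ** -.5 * init_multiplier`.
# Assumed layout: ensemble dimensions first, then (in_features, out_features).
weight = torch.nn.Parameter(
    torch.randn(
        config["ensemble_shape"] + (in_features, out_features),
        device=config["device"],
    )
    * (in_features ** -0.5 * init_multiplier)
)

# One batch per ensemble member: (ensemble, sequence_dim, in_features).
features = torch.randn(3, 5, in_features)
# Batched matrix multiplication over the leading ensemble axis.
out = features @ weight  # shape (3, 5, out_features)
```

With this layout, each ensemble member applies its own weight matrix to its own slice of the feature tensor in a single batched product.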
    
  3. Now, just like the get_mlp factory function we wrote in Notebook 0319, you can write the following factory function to create permutation-invariant MLPs:

    def get_permutation_invariant_mlp(
        config: dict,
        in_features: int,
        out_features: int,
        hidden_layer_num: Optional[int] = None,
        hidden_layer_size: Optional[int] = None,
        hidden_layer_sizes: Optional[Iterable[int]] = None,
    ) -> torch.nn.Sequential:
        """
        Creates a permutation-invariant MLP
        that is a composite of permutation-equivariant affine layers
        and ReLU activation functions
        with a mean pool layer at the end.
    
        Can create a model ensemble.
    
        Arguments
        ---------
        config : `dict`
            Configuration dictionary. Required key-value pairs:
            `"device"` : `str`
                The device to store parameters on.
            `"ensemble_shape"` : `tuple[int]`
                The shape of the ensemble of affine transformations
                the model represents.
        in_features : `int`
            The number of input features.
        out_features : `int`
            The number of output features.
        hidden_layer_num : `int`, optional
            If `hidden_layer_sizes` is not given, we create an MLP with
            `hidden_layer_num` hidden layers of
            `hidden_layer_size` dimensions.
        hidden_layer_size : `int`, optional
            If `hidden_layer_sizes` is not given, we create an MLP with
            `hidden_layer_num` hidden layers of
            `hidden_layer_size` dimensions.
        hidden_layer_sizes: `Iterable[int]`, optional
            If given, each entry gives a hidden layer with the given size.
        """
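
One way to handle the three hidden-layer arguments is to first turn them into a list of `(in, out)` size pairs, one per affine layer. The helper below is hypothetical (`resolve_layer_sizes` is not from the notebooks) and only sketches the argument handling described in the docstring. Building the actual layers is left to the factory function.

```python
from typing import Iterable, Optional


def resolve_layer_sizes(
    in_features: int,
    out_features: int,
    hidden_layer_num: Optional[int] = None,
    hidden_layer_size: Optional[int] = None,
    hidden_layer_sizes: Optional[Iterable[int]] = None,
) -> list[tuple[int, int]]:
    """Return the (in, out) feature sizes of the consecutive affine layers."""
    if hidden_layer_sizes is None:
        # Fall back to `hidden_layer_num` layers of equal size.
        if hidden_layer_num is None or hidden_layer_size is None:
            raise ValueError(
                "Give either hidden_layer_sizes, "
                "or both hidden_layer_num and hidden_layer_size."
            )
        hidden_layer_sizes = [hidden_layer_size] * hidden_layer_num
    sizes = [in_features, *hidden_layer_sizes, out_features]
    # Pair each size with the next one: one pair per affine layer.
    return list(zip(sizes[:-1], sizes[1:]))


uniform = resolve_layer_sizes(2, 1, hidden_layer_num=2, hidden_layer_size=16)
# uniform == [(2, 16), (16, 16), (16, 1)]
explicit = resolve_layer_sizes(2, 1, hidden_layer_sizes=[8, 4])
# explicit == [(2, 8), (8, 4), (4, 1)]
```

Each pair would then become one permutation-equivariant affine layer, with a ReLU after every layer except the last and the mean pool module at the very end.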
    
  4. Finally, you can train your model on the convex hull area dataset that we used in Notebook 0416. Because the model as we write it here outputs batch dictionaries, you can train it with the function train_supervised, as in Notebook 0423. That setup requires the feature tensors to be located at the "features" key of the batch dictionaries, so you need to edit the function generate_convex_hull_dataset, written in Notebook 0416, accordingly.

    Train a model ensemble on the convex hull area dataset, and then test the best ensemble entry on the test split, plotting the true and predicted convex hull areas as we did in Notebook 0416.
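
To illustrate the batch dictionary format that the modules above expect, here is a sketch that pads point sets into a `"features"` tensor and a matching `"mask"`. It rests on assumptions: the helper `collate_point_sets` is hypothetical, the point sets are assumed to vary in length, and the target values and any further keys that train_supervised needs are left out. The output of generate_convex_hull_dataset in Notebook 0416 may be organized differently.

```python
import torch


def collate_point_sets(point_sets: list[torch.Tensor]) -> dict[str, torch.Tensor]:
    """Pad point sets of shape (n_i, feature_dim) into one batch dictionary."""
    max_len = max(len(points) for points in point_sets)
    feature_dim = point_sets[0].shape[-1]
    features = torch.zeros(len(point_sets), max_len, feature_dim)
    mask = torch.zeros(len(point_sets), max_len, dtype=torch.bool)
    for i, points in enumerate(point_sets):
        # Copy the real points and mark them as non-padding.
        features[i, : len(points)] = points
        mask[i, : len(points)] = True
    return {"features": features, "mask": mask}


# Two planar point sets with 3 and 5 points.
batch = collate_point_sets([torch.rand(3, 2), torch.rand(5, 2)])
# batch["features"] has shape (2, 5, 2), batch["mask"] has shape (2, 5)
```

The sequence dimension is second to last in `"features"`, which matches what the mean pool module expects.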