Homework 8
In this homework, we'll compare L2 regularization and weight decay when used with Adam. We'll also try out our MLP + PBT + Adam approach on the emotions-sadness-joy text classification dataset.
- Add optional L2 regularization to the `Optimizer` base class that you wrote in Notebook 0326. Probably the easiest way to do this is to update the gradient of each parameter in place at the beginning of `step` or `_update_parameter`.
- Recreate the `AdamW` optimizer class. You can just reuse the code you wrote in Notebook 0326; you need to recreate it because you changed its parent class.
- Load the dataset and use features derived from word vectors, just like in Notebook 0305. I recommend saving the feature matrices you get from averaging word vectors so that you don't have to rerun the preprocessing in every run.
- Perform two training runs of `pbt` with population size 64 and validation interval 1000:
  - In one, L2 regularization should be turned on and weight decay turned off.
  - In the other, L2 regularization should be turned off and weight decay turned on.
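A minimal sketch of the optimizer changes, assuming a NumPy-based setup. The interface here (a list of parameter objects with `.value` and `.grad` arrays, and a `step` method that calls `_update_parameter` once per parameter) and the `Param` class are hypothetical stand-ins for the classes from Notebook 0326, so your names and signatures will likely differ. The L2 term adds `l2 * w`, which is the gradient of `(l2 / 2) * ||w||^2`, to each gradient in place before the update. `AdamW` applies decoupled weight decay scaled by the learning rate, the same convention PyTorch's `AdamW` uses.

```python
import numpy as np


class Param:
    """Hypothetical parameter container: a value array and its gradient."""

    def __init__(self, value):
        self.value = np.asarray(value, dtype=float)
        self.grad = np.zeros_like(self.value)


class Optimizer:
    """Hypothetical base class with optional L2 regularization."""

    def __init__(self, parameters, lr=1e-3, l2=0.0):
        self.parameters = list(parameters)
        self.lr = lr
        self.l2 = l2  # L2 coefficient; 0.0 turns L2 regularization off

    def step(self):
        for i, p in enumerate(self.parameters):
            if self.l2:
                # d/dw of (l2 / 2) * ||w||^2 is l2 * w; add it to the gradient in place.
                # The training loop must reset p.grad before the next backward pass.
                p.grad += self.l2 * p.value
            self._update_parameter(i, p)

    def _update_parameter(self, i, p):
        raise NotImplementedError


class AdamW(Optimizer):
    """Adam with decoupled weight decay, rebuilt on the new base class."""

    def __init__(self, parameters, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
                 weight_decay=0.0, l2=0.0):
        super().__init__(parameters, lr=lr, l2=l2)
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.weight_decay = weight_decay  # 0.0 turns weight decay off
        self.m = [np.zeros_like(p.value) for p in self.parameters]  # first moments
        self.v = [np.zeros_like(p.value) for p in self.parameters]  # second moments
        self.t = 0  # step counter for bias correction

    def step(self):
        self.t += 1
        super().step()

    def _update_parameter(self, i, p):
        self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * p.grad
        self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * p.grad ** 2
        m_hat = self.m[i] / (1 - self.beta1 ** self.t)
        v_hat = self.v[i] / (1 - self.beta2 ** self.t)
        adam_step = m_hat / (np.sqrt(v_hat) + self.eps)
        # Decoupled weight decay: shrink the weights directly, outside m and v.
        p.value -= self.lr * (adam_step + self.weight_decay * p.value)
```

With a zero loss gradient, one step with `lr=0.01, l2=0.1` moves a weight of 1.0 to about 0.99, almost a full learning-rate step, because Adam normalizes the penalty gradient by its own running magnitude. One step with `lr=0.01, weight_decay=0.1` moves it only by `lr * 0.1`, to 0.999. This normalization of the L2 gradient is the main reason the two runs can behave differently.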
After each run:
- Print the best validation binary accuracy you got.
- Make line plots with confidence bands of:
  - log10 of 1 minus the first and second moment moving-average decay rates (β1 and β2), and
  - log10 of the learning rate, epsilon, and either the L2 regularization coefficient or the weight decay rate, depending on the run.
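One way to produce these plots, sketched with NumPy and Matplotlib. It assumes a hypothetical `history` dict that maps hyperparameter names (`"lr"`, `"eps"`, `"beta1"`, `"beta2"`, and `"l2"` or `"weight_decay"`) to arrays of shape `(num_checkpoints, population_size)` recorded at each validation interval, plus a hypothetical `steps` array with the training step of each checkpoint; your `pbt` function may store this differently. The decay rates are plotted as log10(1 - β), which spreads values like 0.9, 0.99 and 0.999 evenly. The band is a normal-approximation 95% confidence interval for the population mean (mean ± 1.96 standard errors), which is one choice among several; percentile or bootstrap bands are also reasonable.

```python
import numpy as np


def mean_with_band(values, z=1.96):
    """Mean over the population and a normal-approximation confidence band.

    `values` has shape (num_checkpoints, population_size).
    """
    values = np.asarray(values, dtype=float)
    mean = values.mean(axis=1)
    # Standard error of the mean across population members at each checkpoint.
    sem = values.std(axis=1, ddof=1) / np.sqrt(values.shape[1])
    return mean, mean - z * sem, mean + z * sem


def log10_one_minus(x):
    """log10(1 - x), used for the moment decay rates beta1 and beta2."""
    return np.log10(1.0 - np.asarray(x, dtype=float))


def plot_hyperparameter(ax, steps, values, label, transform):
    """Draw one transformed hyperparameter as a line with a shaded band."""
    mean, low, high = mean_with_band(transform(np.asarray(values, dtype=float)))
    ax.plot(steps, mean, label=label)
    ax.fill_between(steps, low, high, alpha=0.3)


def plot_run(history, steps, decay_name):
    """Two panels: the moment decay rates, then lr, eps and the L2/decay coefficient."""
    import matplotlib.pyplot as plt  # imported here so the helpers above need only NumPy

    fig, (ax_betas, ax_rest) = plt.subplots(1, 2, figsize=(12, 4))
    for name in ("beta1", "beta2"):
        plot_hyperparameter(ax_betas, steps, history[name],
                            f"log10(1 - {name})", log10_one_minus)
    for name in ("lr", "eps", decay_name):
        plot_hyperparameter(ax_rest, steps, history[name], f"log10({name})", np.log10)
    for ax in (ax_betas, ax_rest):
        ax.set_xlabel("training step")
        ax.legend()
    fig.tight_layout()
    return fig
```

For the first run you would call `plot_run(history, steps, "l2")`, and for the second `plot_run(history, steps, "weight_decay")`. Keeping the statistics in `mean_with_band` separate from the drawing makes the band easy to swap for a different interval later.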