Homework 4
In this homework, we'll use grid search to determine how the number of reduced feature dimensions affects the performance of logistic regression models on the IMDB dataset. We'll try out the following choices:
- reduced feature dimensions \(\{2^i:i=3,4,\dotsc,8\}\)
- learning rates \(\{10^i:i=-2,-1.5,\dotsc,1\}\).
Leave other hyperparameters at their values in Notebook 0228.
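As a minimal sketch, the two grids above can be written out in plain Python. The names dims and lrs are illustrative only and do not come from the notebooks; the learning-rate exponents step by 0.5, so that grid has seven values.

```python
# Reduced feature dimensions: 2**i for i = 3, 4, ..., 8.
dims = [2**i for i in range(3, 9)]

# Learning rates: 10**i for i = -2, -1.5, ..., 1.
# The exponent is k / 2 with k = -4, -3, ..., 2, which gives steps of 0.5.
lrs = [10 ** (k / 2) for k in range(-4, 3)]

print(dims)      # [8, 16, 32, 64, 128, 256]
print(len(lrs))  # 7
```

The full grid therefore has 6 x 7 = 42 combinations, one trained model per combination.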
Note that we can use the same tf-idf matrix in all cases, but we'll need a different truncated SVD transformer for each reduced feature dimension. So we'll decompose the function lsa we wrote in Notebook 0228 into separate steps:
- Fit a tf-idf transformer on the training corpus and get the training and validation tf-idf matrices at the start of the code.
- Just like in Notebook 0226, make a list to collect best accuracies for different reduced feature dimension values.
- As you loop over the reduced feature dimension values:
  - Make a new truncated SVD transformer. Don't forget to set the random state with get_seed for reproducibility.
  - Fit the truncated SVD transformer on the training tf-idf matrix.
  - Use the truncated SVD transformer to transform the training and validation tf-idf matrices and then turn them into torch.Tensors.
  - Train a logistic regression model on these feature matrices like in Notebook 0228.
- Make a heatmap to summarize your results.
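The steps above might be sketched as follows. This is only an illustration under several assumptions, not the code from Notebook 0228: the toy corpus stands in for the IMDB reviews, get_seed here is a hypothetical stand-in for the course helper, train_logreg is a made-up full-batch trainer, and the grids are shrunk so the toy data can support them.

```python
import numpy as np
import torch
import matplotlib.pyplot as plt
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer


def get_seed():
    """Hypothetical stand-in for the course helper of the same name."""
    return 0


# Toy stand-in data (assumption); the homework uses the IMDB reviews.
train_texts = ["great movie loved it", "terrible plot bad acting",
               "wonderful cast great story", "boring and bad film",
               "loved the wonderful acting", "terrible boring story"]
train_labels = [1, 0, 1, 0, 1, 0]
valid_texts = ["great wonderful film", "bad boring plot"]
valid_labels = [1, 0]

dims = [2, 4]     # toy sizes; the homework grid is 2**3, ..., 2**8
lrs = [0.1, 1.0]  # toy subset of the learning-rate grid

# Fit tf-idf once, outside the loops, and reuse the matrices.
tfidf = TfidfVectorizer()
X_train_tfidf = tfidf.fit_transform(train_texts)
X_valid_tfidf = tfidf.transform(valid_texts)
y_train = torch.tensor(train_labels, dtype=torch.float32)
y_valid = torch.tensor(valid_labels, dtype=torch.float32)


def train_logreg(X_tr, y_tr, X_va, y_va, lr, epochs=50):
    """Full-batch gradient descent; returns the best validation accuracy."""
    torch.manual_seed(get_seed())
    model = torch.nn.Linear(X_tr.shape[1], 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.BCEWithLogitsLoss()
    best_acc = 0.0
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(X_tr).squeeze(1), y_tr)
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            preds = (model(X_va).squeeze(1) > 0).float()
            best_acc = max(best_acc, (preds == y_va).float().mean().item())
    return best_acc


best_accs = []  # one row per reduced feature dimension
for dim in dims:
    # New transformer per dimension, seeded for reproducibility.
    svd = TruncatedSVD(n_components=dim, random_state=get_seed())
    svd.fit(X_train_tfidf)
    X_tr = torch.tensor(svd.transform(X_train_tfidf), dtype=torch.float32)
    X_va = torch.tensor(svd.transform(X_valid_tfidf), dtype=torch.float32)
    best_accs.append([train_logreg(X_tr, y_train, X_va, y_valid, lr) for lr in lrs])

acc = np.array(best_accs)  # shape: (len(dims), len(lrs))

# Heatmap: rows are reduced feature dimensions, columns are learning rates.
fig, ax = plt.subplots()
im = ax.imshow(acc, vmin=0.0, vmax=1.0)
ax.set_xticks(range(len(lrs)))
ax.set_xticklabels([f"{lr:g}" for lr in lrs])
ax.set_yticks(range(len(dims)))
ax.set_yticklabels(dims)
ax.set_xlabel("learning rate")
ax.set_ylabel("reduced feature dimension")
fig.colorbar(im, ax=ax, label="best validation accuracy")
```

In a notebook the figure renders at the end of the cell; in a script, call plt.show() or fig.savefig(...) to view it. Fitting the tf-idf transformer once and refitting only the truncated SVD inside the loop avoids recomputing the same tf-idf matrices for every reduced feature dimension.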