{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Negative Log Likelihood\n",
    "\n",
    "## Setup\n",
    "\n",
    "### Imports\n",
    "\n",
    "1. In case you haven't read the appendix on data preprocessing\n",
    "in notebook 0207:\n",
    "the `datasets` library (https://huggingface.co/docs/datasets/index)\n",
    "lets us access the dataset repository of Hugging Face\n",
    "(https://huggingface.co/datasets).\n",
    "\n",
    "    Hugging Face is one of the biggest open-source collections\n",
    "of deep learning models, datasets and learning algorithms.\n",
    "Their primary focus is on Natural Language Processing (NLP),\n",
    "but they have plenty of resources for other areas\n",
    "of machine learning, such as image generation or reinforcement learning.\n",
    "\n",
    "    Import the `datasets` library.\n",
    "\n",
    "2. The module `torch.nn.functional` contains many important functions\n",
    "such as `cross_entropy`. By convention, the module is imported as `F`. Import the module following this convention.\n",
    "\n",
    "3. Import `matplotlib.pyplot` as `plt`.\n",
    "\n",
    "4. Import the `torch` module.\n",
    "\n",
    "5. With the `tqdm` module, we can show a progress bar while performing training steps. Import the module."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Constants\n",
    "\n",
    "Create a configuration dictionary with the following key-value pairs:\n",
    "- `\"dataset_path\"`: `str`   \n",
    "    The path to the dataset on Hugging Face. Set it to `\"ylecun/mnist\"`.\n",
    "- `\"device\"`: `str`   \n",
    "    The device on which tensors are stored and computed. Common choices:\n",
    "    * `\"cuda\"`: If you have a CUDA-compatible GPU on Linux or Windows.\n",
    "    * `\"mps\"`: If you are on macOS and have a GPU that `torch` supports.\n",
    "    * `\"cpu\"`: If you don't have a `torch`-compatible GPU.\n",
    "\n",
    "    It can also be an integer GPU index, if you are using a multi-GPU computer.\n",
    "- `\"learning_rate\"`: `float`   \n",
    "    Set this to `1.0`.\n",
    "- `\"seed\"`: `int`   \n",
    "    Use any integer.\n",
    "- `\"steps_num\"`: `int`   \n",
    "    Set this to `100`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Usually, this is where we would seed the pseudorandom number generator of `torch` with `config[\"seed\"]`. But today we won't use any randomness via `torch`, so we skip this step. The only random step will be splitting the validation set off the training set, which we'll do through the `datasets` library."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Dataset Preprocessing\n",
    "\n",
    "This time, we'll go through some preprocessing.\n",
    "We'll access the dataset through Hugging Face.\n",
    "Check out the dataset page here:\n",
    "https://huggingface.co/datasets/ylecun/mnist  \n",
    "You can load the dataset using the function `datasets.load_dataset`.\n",
    "Make the first positional argument `config[\"dataset_path\"]`.\n",
    "You can immediately call the `with_format` method on the output,\n",
    "1. passing `\"torch\"` as the first positional argument and\n",
    "2. setting the `device` keyword argument to `config[\"device\"]`.\n",
    "\n",
    "With this format, the dataset returns its data\n",
    "as `torch.Tensor`s on the device of your choice.\n",
    "\n",
    "Print the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see that we have two dataset splits: train and test,\n",
    "with 60 000 and 10 000 entries, respectively.\n",
    "You can access these datasets at their keys, just as if this\n",
    "output was an ordinary dictionary.\n",
    "Assign the two datasets to two variables, then print them out."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now you can make the train-valid split using the\n",
    "`train_test_split` method of the train dataset.\n",
    "Set the following two keyword arguments:\n",
    "1. `seed`: This should be `config[\"seed\"]`.\n",
    "2. `test_size`: Choose this so that the validation set has the\n",
    "    same size as the test set.\n",
    "\n",
    "Print the result."
   ]
  },
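  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch, here is `train_test_split` on a made-up dataset of 10 entries,\n",
    "assuming an integer `test_size`, which is read as an absolute number of entries.\n",
    "The names `toy_dataset` and `toy_split` are hypothetical and only serve as illustration.\n",
    "\n",
    "```python\n",
    "import datasets\n",
    "\n",
    "# Hypothetical toy dataset with 10 entries in a single feature `x`.\n",
    "toy_dataset = datasets.Dataset.from_dict({'x': list(range(10))})\n",
    "\n",
    "# An integer `test_size` is read as an absolute number of entries.\n",
    "toy_split = toy_dataset.train_test_split(seed=0, test_size=3)\n",
    "\n",
    "print(toy_split)\n",
    "print(toy_split['train'].num_rows, toy_split['test'].num_rows)\n",
    "```\n",
    "\n",
    "Note that the split-off part is stored under the key `\"test\"`,\n",
    "even when we intend to use it as a validation set."
   ]
  },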
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Assign the training and validation datasets to two variables.\n",
    "Print them out."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you index into a `datasets.Dataset` with a feature name,\n",
    "it returns the data stored in that feature.\n",
    "As we declared in `with_format`, the data is returned as `torch.Tensor`s.\n",
    "Assign the image and label tensors of the training dataset to two variables.\n",
    "Print their shapes and their `device` and `dtype` attributes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Looks like some transformations are needed for the image tensor.\n",
    "\n",
    "Note that the dtype is `uint8` for unsigned 8-bit integer.\n",
    "That is, the 8-bit colors from 0 to 255 (inclusive) are stored directly.\n",
    "We need to transform this to a floating point datatype.\n",
    "\n",
    "Also, we should flatten the image.\n",
    "\n",
    "Finally, by convention we should divide the values by 255\n",
    "so that they are between 0 and 1.\n",
    "We shall study more refined normalization techniques later.\n",
    "\n",
    "Let's write a function for this, so that we can\n",
    "reuse it for the validation and test images too.\n",
    "\n",
    "To change the `dtype` or `device` of a tensor,\n",
    "you can use its method `to`. You need to give the new device or dtype\n",
    "as a positional argument.\n",
    "\n",
    "To reshape a tensor, you can use the method `reshape`.\n",
    "As positional arguments, you need to give the components of the new shape\n",
    "or all the components in a sequence such as a tuple or a list.\n",
    "You can make at most one component -1\n",
    "so that its value is inferred from the others.\n",
    "I prefer to forgo this if possible, as spelling out the full shape makes debugging easier.\n",
    "\n",
    "Once you have written the function, apply it to the train images.\n",
    "Check the shape and dtype of its output."
   ]
  },
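  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of these two methods, here is how `to` and `reshape`\n",
    "act on a made-up batch of two tiny 2-by-3 images.\n",
    "The tensor `toy_images` and the other variable names are hypothetical\n",
    "and only serve as illustration; your function should work on the real image tensor.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "\n",
    "# Hypothetical toy batch: 2 images of height 2 and width 3, dtype uint8.\n",
    "toy_images = torch.tensor(\n",
    "    [[[0, 51, 102], [153, 204, 255]],\n",
    "     [[255, 255, 255], [0, 0, 0]]],\n",
    "    dtype=torch.uint8,\n",
    ")\n",
    "\n",
    "# `to` with a dtype converts the values; the shape stays the same.\n",
    "toy_float = toy_images.to(torch.float32)\n",
    "\n",
    "# `reshape` with explicit components: 2 images, 2 * 3 = 6 values each.\n",
    "toy_flat = toy_float.reshape(2, 6)\n",
    "\n",
    "# Scaling by 1 / 255 maps the values into the interval from 0 to 1.\n",
    "toy_scaled = toy_flat * (1 / 255)\n",
    "\n",
    "print(toy_scaled.shape, toy_scaled.dtype)\n",
    "print(toy_scaled.min().item(), toy_scaled.max().item())\n",
    "```\n",
    "\n",
    "Passing the shape explicitly, as in `reshape(2, 6)`, raises an error\n",
    "if the number of entries doesn't match, which is what makes it useful for debugging."
   ]
  },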
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def flatten_images(\n",
    "    images: torch.Tensor,\n",
    "    dtype=torch.float32,\n",
    "    scale=1/255\n",
    ") -> torch.Tensor:\n",
    "    \"\"\"\n",
    "    Given as input a batch of images of shape\n",
    "    `(batch_size, channel_num, height, width)`\n",
    "    flatten it and output a tensor of shape `(batch_size, feature_dim)`.\n",
    "\n",
    "    Moreover:\n",
    "    1. transform the tensor to `dtype` and\n",
    "    2. multiply it by `scale`.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    images : torch.Tensor\n",
    "        The images in `torch.Tensor` format.\n",
    "    dtype : torch.dtype, optional\n",
    "        The dtype to transform the tensor to. Default: `torch.float32`.\n",
    "    scale : float, optional\n",
    "        The value to scale the tensor with. Default: `1 / 255`.\n",
    "    \"\"\"\n",
    "    raise NotImplementedError\n",
    "\n",
    "raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's make a higher level preprocessing function\n",
    "that gets flattened features, and labels\n",
    "from a dataset. Optionally, you can add a column of 1s to the feature matrix.\n",
    "\n",
    "Once you have written it, call it on all three splits and assign\n",
    "all output to variables. You should get 6 variables:\n",
    "train, valid and test features and labels.\n",
    "\n",
    "Print the shape and the dtype of all 6."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def preprocess_dataset(\n",
    "    dataset: datasets.Dataset,\n",
    "    column_1s=False,\n",
    "    device=\"cpu\",\n",
    "    features_dtype=torch.float32,\n",
    "    features_scale=1 / 255,\n",
    "    images_name=\"image\",\n",
    "    labels_name=\"label\"\n",
    ") -> tuple[torch.Tensor, torch.Tensor]:\n",
    "    \"\"\"\n",
    "    Given an image classification dataset, outputs as `torch.Tensor`s:\n",
    "    - the image data:\n",
    "        * flattened to a matrix of shape `(dataset_size, feature_num)`,\n",
    "        * with an added column of 1s if `column_1s`,\n",
    "        * transformed to `features_dtype` and\n",
    "        * multiplied by `features_scale` and\n",
    "    - the label data:\n",
    "        * as a vector of shape `(dataset_size,)` and\n",
    "        * of dtype `torch.int64`.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    dataset : datasets.Dataset\n",
    "        The dataset to preprocess. We assume that the image and label\n",
    "        feature names are `images_name` and `labels_name`, respectively.\n",
    "    column_1s : bool, optional\n",
    "        Whether to add a column of 1s to the feature matrix\n",
    "        which can be used as feature for the bias. Default: `False`.\n",
    "    device : torch.device | int | str, optional\n",
    "        The device to put data tensors to. Default: `\"cpu\"`.\n",
    "    features_dtype : torch.dtype, optional\n",
    "        The floating point datatype to transform the feature matrix to.\n",
    "        Default: `torch.float32`.\n",
    "    features_scale : float, optional\n",
    "        The value to multiply the feature matrix with.\n",
    "        Default: `1 / 255`.\n",
    "    images_name : str, optional\n",
    "        The name of the dataset entry storing the images. Default: `\"image\"`\n",
    "    labels_name : str, optional\n",
    "        The name of the dataset entry storing the labels. Default: `\"label\"`\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    The pair `feature_matrix, labels`.\n",
    "    \"\"\"\n",
    "    raise NotImplementedError\n",
    "\n",
    "raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's save a dictionary that contains the 6 tensors with `torch.save`.\n",
    "The first positional argument is the object that you want to save\n",
    "and the second is the save file path.\n",
    "This speeds up later sessions, as we won't have to repeat the preprocessing."
   ]
  },
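  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch, here is a round trip through `torch.save` and `torch.load`\n",
    "with a made-up dictionary of two small tensors.\n",
    "The dictionary `toy_data`, its keys and the file name are hypothetical;\n",
    "a temporary directory is used so that nothing is left on disk.\n",
    "\n",
    "```python\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "import torch\n",
    "\n",
    "# Hypothetical toy dictionary standing in for the six preprocessed tensors.\n",
    "toy_data = {\n",
    "    'features': torch.zeros(4, 3),\n",
    "    'labels': torch.tensor([0, 1, 2, 1]),\n",
    "}\n",
    "\n",
    "# Save to a temporary file, then load it back with `torch.load`.\n",
    "with tempfile.TemporaryDirectory() as tmp_dir:\n",
    "    toy_path = os.path.join(tmp_dir, 'toy_data.pt')\n",
    "    torch.save(toy_data, toy_path)\n",
    "    toy_loaded = torch.load(toy_path)\n",
    "\n",
    "print(toy_loaded.keys())\n",
    "print(torch.equal(toy_loaded['labels'], toy_data['labels']))\n",
    "```\n",
    "\n",
    "In a later session, a single `torch.load` call on the saved file\n",
    "would then replace the whole preprocessing section."
   ]
  },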
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training and Evaluation\n",
    "\n",
    "Let's write the training loop! It will be quite similar\n",
    "to the one you wrote in notebook 0212, using `torch.optim`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Initialization\n",
    "\n",
    "1. Initialize a zero matrix of appropriate shape, and a bias vector, if you did not add a column of 1s to the feature matrix.\n",
    "    Make sure to use the correct `dtype` and require gradient tracking.\n",
    "    Moreover, in the keyword argument `device`, supply `config[\"device\"]`.\n",
    "2. Initialize a `torch.optim.SGD` to optimize the weight matrix, and the bias vector, if you're using one.\n",
    "    As learning rate, use `config[\"learning_rate\"]`.\n",
    "3. Create an empty list to store stepwise losses in."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Training\n",
    "\n",
    "1. To have a progress bar when we are looping over `config[\"steps_num\"]`\n",
    "    steps, create a `tqdm.trange` object. You can use the same\n",
    "    positional argument(s) you would in a `range` and you can loop over\n",
    "    it the same way.\n",
    "2. In each step:\n",
    "    1. Zero the gradients with the optimizer.\n",
    "    2. Calculate the *logits*, that is the product of the\n",
    "        train feature matrix and the weight matrix, plus the bias vector if you use one.\n",
    "    3. You can output the mean negative log likelihood\n",
    "        using `F.cross_entropy`. This is written in a numerically stable\n",
    "        way and does not require us to apply a softmax on the logits.\n",
    "        Use as positional arguments the logits and the train labels.\n",
    "    4. Backpropagate gradients from the loss value you got.\n",
    "    5. Take a step with the optimizer.\n",
    "    6. Append the value stored in the loss tensor to the list of losses.\n",
    "        As we saw in notebook 0212, you can use the method `item`.\n",
    "        It returns the value as a Python number and also works\n",
    "        if the tensor is stored on a GPU: the value is then copied to RAM first,\n",
    "        so you don't need to call the `cpu` method yourself.\n",
    "3. Call the `close` method of the progress bar to avoid\n",
    "    output artifacts.\n",
    "4. Make a line plot of the losses. Show and clear the canvas."
   ]
  },
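  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what `F.cross_entropy` computes, here is a minimal sketch on made-up logits.\n",
    "For logits $z_i$ and labels $y_i$ of $n$ entries, the mean negative log likelihood is\n",
    "$-\\frac{1}{n}\\sum_{i=1}^{n} \\log \\operatorname{softmax}(z_i)_{y_i}$.\n",
    "The sketch compares the built-in function with this formula written out by hand;\n",
    "all variable names are hypothetical and only serve as illustration.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "import torch.nn.functional as F\n",
    "\n",
    "# Hypothetical toy logits for 3 entries and 4 classes, with their labels.\n",
    "toy_logits = torch.tensor(\n",
    "    [[2.0, 0.5, -1.0, 0.0],\n",
    "     [0.0, 0.0, 0.0, 0.0],\n",
    "     [-1.0, 3.0, 0.5, 1.0]]\n",
    ")\n",
    "toy_labels = torch.tensor([0, 2, 1])\n",
    "\n",
    "# Built-in: takes raw logits and integer labels, returns the mean loss.\n",
    "loss_builtin = F.cross_entropy(toy_logits, toy_labels)\n",
    "\n",
    "# By hand: log-softmax, pick the entry of the true class, negate, average.\n",
    "log_probs = F.log_softmax(toy_logits, dim=1)\n",
    "picked = log_probs[torch.arange(3), toy_labels]\n",
    "loss_manual = -picked.mean()\n",
    "\n",
    "# `item` turns a one-element tensor into a Python float.\n",
    "print(loss_builtin.item(), loss_manual.item())\n",
    "```\n",
    "\n",
    "The two printed values should agree up to floating point error."
   ]
  },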
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Validation\n",
    "\n",
    "Let's see how good our weights really are:\n",
    "that is, let's calculate the validation accuracy.\n",
    "\n",
    "1. Get the validation logits. As we are not training on this, either:\n",
    "    1. do this in a `torch.no_grad` context or\n",
    "    2. `detach` the weights before the operation.\n",
    "2. For each validation entry, get the index of the class with\n",
    "    maximum logit. This can be achieved by the `argmax` method\n",
    "    on the logit tensor, where its keyword argument `dim` determines\n",
    "    along which dimension we are taking the argmax.\n",
    "3. If all went well, you have two label tensors of the same size at hand:\n",
    "    1. the true label tensor that is part of the validation dataset and\n",
    "    2. the predicted label tensor that you got with the `argmax`.\n",
    "\n",
    "    The `==` operation between tensors of the same shape\n",
    "    creates a Boolean tensor of the same shape, each entry of which says\n",
    "    whether the entries in the same position in the two tensors are equal.\n",
    "\n",
    "    Form this tensor.\n",
    "4. If you transform this tensor to a numerical datatype,\n",
    "    you will get 1 at `True`, and 0 at `False`.\n",
    "    Transform the equality tensor to a floating point one.\n",
    "5. Now you can take the mean of the floating point equality tensor\n",
    "    to get the accuracy.\n",
    "\n",
    "Print out this value!"
   ]
  },
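  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The steps above can be sketched on made-up data as follows.\n",
    "The toy logits and labels are hypothetical and chosen so that\n",
    "three out of four predictions are correct; no weights are involved,\n",
    "so no `torch.no_grad` context is needed here.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "\n",
    "# Hypothetical toy logits for 4 entries and 3 classes, with true labels.\n",
    "toy_logits = torch.tensor(\n",
    "    [[0.1, 2.0, -1.0],\n",
    "     [1.5, 0.0, 0.3],\n",
    "     [0.0, 0.2, 0.1],\n",
    "     [-2.0, -1.0, -3.0]]\n",
    ")\n",
    "toy_labels = torch.tensor([1, 0, 2, 1])\n",
    "\n",
    "# Index of the largest logit in each row (dim=1 runs over the classes).\n",
    "toy_predictions = toy_logits.argmax(dim=1)\n",
    "\n",
    "# Boolean tensor: True where the prediction matches the true label.\n",
    "toy_equal = toy_predictions == toy_labels\n",
    "\n",
    "# Booleans become 1.0 and 0.0, so the mean is the fraction of matches.\n",
    "toy_accuracy = toy_equal.to(torch.float32).mean()\n",
    "\n",
    "print(toy_predictions, toy_equal, toy_accuracy.item())\n",
    "```\n",
    "\n",
    "Here the third entry is misclassified, so the accuracy is 0.75."
   ]
  },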
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Datasets\n",
    "\n",
    "### MNIST\n",
    "\n",
    "https://huggingface.co/datasets/ylecun/mnist"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## License\n",
    "\n",
    "This work is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "dml",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
