{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Solving Frozen Lake by Hand\n",
    "\n",
    "## Setup\n",
    "\n",
    "### Install\n",
    "\n",
    "We'll need 3 libraries:\n",
    "1. When OpenAI was actually more open, it developed `gym`, a library providing a unified framework for Reinforcement Learning experiments. Its maintenance gradually wound down, until the non-profit Farama Foundation took the project over. The new package is called `gymnasium`:  \n",
    "https://gymnasium.farama.org/  \n",
    "We install it with the `toy_text` extra, as that gets us Frozen Lake, the environment we'll use to introduce RL:  \n",
    "https://gymnasium.farama.org/environments/toy_text/frozen_lake/\n",
    "2. To get gameplay videos, we use the package `moviepy`:  \n",
    "https://zulko.github.io/moviepy/\n",
    "3. To get gameplay images, we use the package `pillow`:  \n",
    "https://python-pillow.org/\n",
    "\n",
    "To install these packages, run this line in a terminal, with your `mamba` environment active:\n",
    "```bash\n",
    "pip install \"gymnasium[toy_text]\" moviepy pillow\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Imports\n",
    "\n",
    "1. Import `Callable` (from `typing`) and `torch`.\n",
    "2. By convention, import `gymnasium` as `gym`.\n",
    "3. So that we can see gameplay videos, import `Image` from `IPython.display`.\n",
    "4. In order to make gameplay videos, import `ImageSequenceClip` from `moviepy`.\n",
    "5. Import the module `os` so that we can create the video directory and get the gif paths from Python.\n",
    "6. So that we can see gameplay images, import `Image` from `PIL` as `PILImage`.\n",
    "7. Import `Optional` from `typing`. This is a type hint for an argument that can be either `None` or of some other type.\n",
    "8. Import the function `get_seed` that you created in Notebook 0228."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Constants\n",
    "\n",
    "Create a configuration dictionary with the following keys:\n",
    "- `\"discount\"`: `float`  \n",
    "    This is the constant we use to get discounted returns. Set it to `0.99`, a common default.\n",
    "- `\"env_id\"`: `str`  \n",
    "    In the unified API of `gym`, every MDP, or *environment*, has a unique identifier. We'll play Frozen Lake. You can see on its page:  \n",
    "    https://gymnasium.farama.org/environments/toy_text/frozen_lake/  \n",
    "    that its identifier is `\"FrozenLake-v1\"`.\n",
    "- `\"env_kwargs\"`: `dict`  \n",
    "    This dictionary stores extra settings for the environment. Set it to the following:\n",
    "    - `\"is_slippery\"`: `bool`  \n",
    "        If `True`, this makes the transitions stochastic:  \n",
    "        https://gymnasium.farama.org/environments/toy_text/frozen_lake/#is_slippy  \n",
    "        Set this to `False` for now.\n",
    "    - `\"map_name\"`: `str`  \n",
    "        If given, a preloaded map will be used:  \n",
    "        https://gymnasium.farama.org/environments/toy_text/frozen_lake/#arguments  \n",
    "        Set this to `\"4x4\"` for now.\n",
    "- `\"gif_fps\"`: `int`  \n",
    "    The frames per second (FPS) to use when making gameplay gifs. I set this to `20`. Change it at will.\n",
    "- `\"seed\"`: `int`  \n",
    "    This is for reproducible experiments. Insert any integer.\n",
    "- `\"videos_directory\"`: `str`  \n",
    "    The path of the directory to store videos in. I set this to `videos`. Change it at will."
   ]
  },
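  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of such a dictionary follows. It assumes the key name `videos_directory`, which the `run_episode` docstring further below expects; the seed `42` is an arbitrary choice.\n",
    "\n",
    "```python\n",
    "# Sketch of the configuration dictionary; the seed value is arbitrary.\n",
    "config = {\n",
    "    'discount': 0.99,\n",
    "    'env_id': 'FrozenLake-v1',\n",
    "    'env_kwargs': {\n",
    "        'is_slippery': False,  # deterministic transitions for now\n",
    "        'map_name': '4x4',\n",
    "    },\n",
    "    'gif_fps': 20,\n",
    "    'seed': 42,\n",
    "    'videos_directory': 'videos',\n",
    "}\n",
    "```"
   ]
  },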
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set the `torch` pseudo-random number generation seed to the value given in the configuration dictionary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Meet a `gym.Env`\n",
    "\n",
    "### `make` an `Env`\n",
    "\n",
    "`gym` environments are created by the `gym.make` function.\n",
    "1. Its positional argument should be the environment ID.\n",
    "2. In its keyword argument `render_mode`, you can specify whether and how the environment renders gameplay pictures. See here:  \n",
    "https://gymnasium.farama.org/api/env/#gymnasium.Env.render  \n",
    "Set this to `rgb_array`.\n",
    "3. Give it the extra keyword arguments stored at `env_kwargs` in the configuration dictionary using the dictionary unpacking operator `**`.\n",
    "\n",
    "We get a `gym.Env` object. Print its `observation_space` and `action_space` attributes."
   ]
  },
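  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A sketch of the `make` call follows. It repeats only the configuration keys it needs so that it runs on its own; in the notebook, use the dictionary you created above.\n",
    "\n",
    "```python\n",
    "import gymnasium as gym\n",
    "\n",
    "# Only the keys used here; the full dictionary is defined above.\n",
    "config = {\n",
    "    'env_id': 'FrozenLake-v1',\n",
    "    'env_kwargs': {'is_slippery': False, 'map_name': '4x4'},\n",
    "}\n",
    "\n",
    "# ** unpacks env_kwargs into is_slippery=False, map_name='4x4'.\n",
    "env = gym.make(config['env_id'], render_mode='rgb_array', **config['env_kwargs'])\n",
    "print(env.observation_space)  # Discrete(16)\n",
    "print(env.action_space)  # Discrete(4)\n",
    "```"
   ]
  },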
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, the observation and action spaces are `gym.spaces.Discrete` objects. They describe discrete sets with 16 and 4 elements, respectively. See the environment documentation for what these elements mean:  \n",
    "https://gymnasium.farama.org/environments/toy_text/frozen_lake/#action-space\n",
    "\n",
    "You can get the size of a `Discrete` object at its `n` attribute. Do this for both the observation and the action space.\n",
    "\n",
    "You can sample an action uniformly from the action space with its `sample` method. Beforehand, you can seed this sampler with the space's `seed` method. Seed it first using `get_seed`, then sample 10 actions."
   ]
  },
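  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A sketch of these `Discrete` space methods follows. A fixed integer seed stands in for your `get_seed` helper, whose exact signature is not repeated here.\n",
    "\n",
    "```python\n",
    "import gymnasium as gym\n",
    "\n",
    "env = gym.make('FrozenLake-v1', render_mode='rgb_array', is_slippery=False, map_name='4x4')\n",
    "\n",
    "# Sizes of the discrete spaces.\n",
    "n_observations = env.observation_space.n\n",
    "n_actions = env.action_space.n\n",
    "print(n_observations, n_actions)  # 16 4\n",
    "\n",
    "# Seed the action sampler, then draw 10 uniformly random actions.\n",
    "env.action_space.seed(42)  # placeholder for a seed from get_seed\n",
    "actions = [env.action_space.sample() for _ in range(10)]\n",
    "print(actions)\n",
    "```\n",
    "\n",
    "Re-seeding with the same integer and sampling again reproduces the same 10 actions."
   ]
  },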
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `reset`\n",
    "\n",
    "You can reset an `Env`, and thus get an initial observation, with the `reset` method. For stochastic environments, you can set the seed with its `seed` keyword argument. As our transitions are deterministic for now, there is no need to set the seed, but let's do it anyway so that we don't forget about it when we actually need it.\n",
    "\n",
    "Print the output of the `reset` method."
   ]
  },
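  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A sketch of the `reset` call with an arbitrary seed:\n",
    "\n",
    "```python\n",
    "import gymnasium as gym\n",
    "\n",
    "env = gym.make('FrozenLake-v1', render_mode='rgb_array', is_slippery=False, map_name='4x4')\n",
    "\n",
    "# reset returns a pair: the initial observation and an info dictionary.\n",
    "observation, info = env.reset(seed=42)\n",
    "print(observation, info)  # the start state of the 4x4 map is 0\n",
    "```"
   ]
  },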
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You received a pair: an initial observation and an info dictionary. The latter shows the effect of the `is_slippery=False` setting: the character moves in the direction specified by the action with probability 1."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As we set `render_mode=\"rgb_array\"`, the `render` method of the environment returns the picture of the initial observation as an RGB array. Feed this to `PILImage.fromarray` so that you can see the image in the notebook."
   ]
  },
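  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A sketch of turning the rendered state into a picture. Rendering Frozen Lake relies on `pygame`, which the `toy_text` extra installs.\n",
    "\n",
    "```python\n",
    "import gymnasium as gym\n",
    "from PIL import Image as PILImage\n",
    "\n",
    "env = gym.make('FrozenLake-v1', render_mode='rgb_array', is_slippery=False, map_name='4x4')\n",
    "env.reset(seed=42)\n",
    "\n",
    "# In 'rgb_array' mode, render returns a (height, width, 3) array of pixels.\n",
    "frame = env.render()\n",
    "image = PILImage.fromarray(frame)\n",
    "# Ending a notebook cell with `image` displays the picture.\n",
    "```"
   ]
  },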
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `step`\n",
    "\n",
    "We can take an action and receive the transition information with the `step` method of the environment. It returns a quintuple of:\n",
    "1. the next observation,\n",
    "2. the reward,\n",
    "3. whether the observation we received is that of a terminal state,\n",
    "4. whether the episode was truncated, e.g. by a limit on the number of steps (this does not apply after a single step), and\n",
    "5. a dictionary of additional information.\n",
    "\n",
    "Consulting the environment page for the action indices, take a step to the right. Print the output of the `step` method. Then show the picture of the next observation."
   ]
  },
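  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A sketch of a single step to the right. The action index `2` for moving right is taken from the action table on the environment page.\n",
    "\n",
    "```python\n",
    "import gymnasium as gym\n",
    "\n",
    "env = gym.make('FrozenLake-v1', render_mode='rgb_array', is_slippery=False, map_name='4x4')\n",
    "env.reset(seed=42)\n",
    "\n",
    "MOVE_RIGHT = 2  # actions: 0 left, 1 down, 2 right, 3 up\n",
    "observation, reward, terminated, truncated, info = env.step(MOVE_RIGHT)\n",
    "print(observation, reward, terminated, truncated, info)\n",
    "```\n",
    "\n",
    "Note the order of the two flags: `terminated` comes before `truncated`."
   ]
  },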
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It's walking!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Episodes\n",
    "\n",
    "### Using a Random Policy\n",
    "\n",
    "Time to generate an episode with a random policy!\n",
    "\n",
    "1. At initialization:\n",
    "    1. Create a return variable. Note that `return` is a Python reserved word, so you need to give the variable some other name.\n",
    "    2. Create a list to store observation pictures in.\n",
    "    3. Create a time step counter. Initialize it to 0.\n",
    "    4. Reset the environment.\n",
    "    5. Append the picture of the initial observation to the picture list.\n",
    "    6. To make sure that the video directory exists, call `os.makedirs`. Use the `exist_ok` keyword argument so that it does not throw an error if the directory exists already.\n",
    "2. In an infinite loop:\n",
    "    1. Sample an action.\n",
    "    2. Take a step with the action. Get the reward and the termination and truncation information.\n",
    "    3. Update the return using:\n",
    "        1. the reward,\n",
    "        2. the discount, as given in the configuration dictionary, and\n",
    "        3. the time step counter.\n",
    "    4. Append the picture of the new observation to the picture list.\n",
    "    5. If the episode terminated or was truncated, break the loop.\n",
    "    6. Increment the step counter.\n",
    "3. Print the return.\n",
    "4. To get a gameplay gif and show it, use these instructions:  \n",
    "    https://stackoverflow.com/questions/60914488/how-can-i-make-gif-using-arrays/64796174#64796174  \n",
    "    You can get a gif path by joining the video directory path and a file name with `os.path.join`."
   ]
  },
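  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The return update in the loop adds `discount ** t * reward` at each time step `t`, starting from `t = 0`. A minimal sketch of this accumulation on a made-up reward list, with no environment involved; the helper name `discounted_return` is hypothetical:\n",
    "\n",
    "```python\n",
    "def discounted_return(rewards: list[float], discount: float) -> float:\n",
    "    # Accumulate discount**t * reward over time steps t = 0, 1, 2, ...\n",
    "    total = 0.0\n",
    "    for t, reward in enumerate(rewards):\n",
    "        total += discount**t * reward\n",
    "    return total\n",
    "\n",
    "\n",
    "# Made-up episode: three steps without reward, then reward 1.\n",
    "print(discounted_return([0.0, 0.0, 0.0, 1.0], 0.99))  # equals 0.99**3\n",
    "```"
   ]
  },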
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Solving the Environment by Hand\n",
    "\n",
    "Time to write a deterministic policy that solves this environment! You can make it a list of 16 action indices. As observations are state indices, indexing into the list with an observation gives you an action.\n",
    "\n",
    "Also, refactor your episode runner from above into a function with a policy argument. The policy should be a function that takes an observation and returns an action. If the policy is left at its default value `None`, create a random policy function inside the runner. Note that we also make movie generation optional.\n",
    "\n",
    "Finally, before you test your solution, calculate the discounted return by hand. You can see on the environment page that you get reward 1 if you reach the gift box and 0 otherwise. Then check that the episode indeed yields that discounted return."
   ]
  },
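  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A sketch of the two ingredients, under stated assumptions. First, a list of 16 action indices wrapped as a policy function; the all-zeros table is a placeholder, not a solution. Second, the hand calculation: if the only nonzero reward is 1 on the last of `T` steps, the discounted return reduces to `discount ** (T - 1)`. Both helper names are hypothetical.\n",
    "\n",
    "```python\n",
    "from typing import Callable\n",
    "\n",
    "# Placeholder table: one action index per state. All zeros (move left)\n",
    "# does NOT solve the lake; fill in your own entries.\n",
    "policy_table = [0] * 16\n",
    "\n",
    "\n",
    "def make_policy(table: list[int]) -> Callable[[int], int]:\n",
    "    # Observations are state indices, so they index the table directly.\n",
    "    def policy(observation: int) -> int:\n",
    "        return table[observation]\n",
    "\n",
    "    return policy\n",
    "\n",
    "\n",
    "def return_of_successful_episode(n_steps: int, discount: float) -> float:\n",
    "    # Only the reward at time step n_steps - 1 is nonzero (and equal to 1).\n",
    "    return discount ** (n_steps - 1)\n",
    "\n",
    "\n",
    "policy = make_policy(policy_table)\n",
    "print(policy(0))  # 0\n",
    "```"
   ]
  },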
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def run_episode(\n",
    "    config: dict,\n",
    "    env: gym.Env,\n",
    "    gif_name: Optional[str] = None,\n",
    "    policy: Optional[Callable[[int], int]] = None,\n",
    ") -> float:\n",
    "    \"\"\"\n",
    "    Run an episode in a `gym.Env`\n",
    "    with discrete observation and action spaces,\n",
    "    following a policy.\n",
    "\n",
    "    Optionally, make a gif video of the gameplay.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    config : dict\n",
    "        Configuration dictionary. Required key-value pairs:\n",
    "        discount : float\n",
    "            Discount factor used for the return.\n",
    "        gif_fps : int\n",
    "            Frames per second in the gif.\n",
    "        videos_directory : str\n",
    "            If `gif_name` is given, the created movie will be saved\n",
    "            to this directory.\n",
    "    env : gym.Env\n",
    "        The environment to get an episode in.\n",
    "    gif_name : str, optional\n",
    "        If given, a gif movie is saved to this filename\n",
    "        in `config['videos_directory']`.\n",
    "    policy : Callable[[int], int], optional\n",
    "        The policy to get an episode with. Default: random policy.\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    float\n",
    "        The discounted return of the episode.\n",
    "    \"\"\"\n",
    "    raise NotImplementedError\n",
    "\n",
    "raise NotImplementedError"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Dataset References\n",
    "\n",
    "Frozen Lake https://gymnasium.farama.org/environments/toy_text/frozen_lake/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# License\n",
    "\n",
    "This work is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "dml",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}