Teaching an AI to Play Super Mario Land

2026-04-28

Training a Double Deep Q-Network agent to play Super Mario Land — covering environment wrappers, replay memory, target networks, and reward shaping.

This notebook trains a double deep Q-learning model to play Super Mario Land on a Game Boy emulator.

This project is based on the PyTorch tutorial at https://pytorch.org/tutorials/intermediate/mario_rl_tutorial.html, which shows how to train an AI to play Mario. Both this project and the tutorial use a gym environment with a ready-made game wrapper for reading information from RAM. This project uses a different emulator but reuses code snippets from the tutorial, changed to a greater or lesser degree to fit this environment and my goals. For example, the functions for training the deep learning model and the metric logger follow the tutorial almost step by step, and the environment is processed in a similar way.

The emulator used for this project is PyBoy, an emulator for the original Game Boy.

https://docs.pyboy.dk/

PyBoy can be used with an OpenAI Gym environment.

https://docs.pyboy.dk/openai_gym.html

Some games in PyBoy, such as Super Mario Land, are supported by a ready-made game wrapper that reads information about the game from its RAM.

https://docs.pyboy.dk/plugins/game_wrapper_super_mario_land.html

Inspiration taken from: https://github.com/lixado/PyBoy-RL

Requirements for the notebook

The notebook was run on Python 3.10 with the following package versions, as reported by pip:

tensordict 0.2.0
torchrl 0.2.0
torch 2.1.0+cu118
numpy 1.23.5
gym 0.25.2
pyboy 1.6.10

Custom PyBoy gym environment

A custom environment is used to change the action space and how button presses are managed. In PyBoy, taking an action presses a button in the emulator. The buttons also have to be released to produce sensible movement in the game. For this, a release function is implemented: when Mario chooses the DO_NOTHING action, all of the buttons are released and Mario stops. The step function also returns the reward, which is provided by the game wrapper.
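The press and release bookkeeping described above can be sketched without the emulator. This is a minimal illustration, not the notebook's environment: the `Action` members, the `ButtonController` class, and the `send_event` callback (a stand-in for the emulator's input call) are all hypothetical names.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical action set; the notebook's real action space may differ."""
    DO_NOTHING = 0
    LEFT = 1
    RIGHT = 2
    JUMP = 3


class ButtonController:
    """Tracks which buttons are held so they can be released explicitly."""

    def __init__(self, send_event):
        # send_event is a hypothetical callback standing in for the emulator's
        # input call. It receives ("press" or "release", Action).
        self._send_event = send_event
        self._held = set()

    def release_all(self):
        # Release in a stable order so the event log is deterministic.
        for action in sorted(self._held, key=lambda a: a.value):
            self._send_event("release", action)
        self._held.clear()

    def apply(self, action):
        if action is Action.DO_NOTHING:
            # DO_NOTHING releases every held button, so Mario stops.
            self.release_all()
            return
        if action not in self._held:
            self._send_event("press", action)
            self._held.add(action)


events = []
controller = ButtonController(lambda kind, action: events.append((kind, action.name)))
controller.apply(Action.RIGHT)
controller.apply(Action.JUMP)
controller.apply(Action.DO_NOTHING)
print(events)
```

Keeping the set of held buttons in one place means the release step cannot miss a button that was pressed several steps earlier.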

Game state function

Wrappers for the environment.

SkipFrame

Not every frame needs its own button press, so a SkipFrame class is used to skip n frames. The rewards of the skipped frames are summed and credited to the action chosen at the start of the skip.
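A dependency-free sketch of the frame-skipping idea follows. It assumes the old gym step API, in which `step` returns `(obs, reward, done, info)`; the notebook's wrapper presumably subclasses `gym.Wrapper`, while this version wraps any object with `reset` and `step`. `CountingEnv` is a toy environment invented for the example.

```python
class SkipFrame:
    """Repeat one action for `skip` frames and sum the rewards (skip >= 1)."""

    def __init__(self, env, skip):
        if skip < 1:
            raise ValueError("skip must be at least 1")
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                # Stop early so no frames are stepped after the episode ends.
                break
        return obs, total_reward, done, info


class CountingEnv:
    """Toy stand-in: reward 1.0 per frame, episode ends after 10 frames."""

    def __init__(self):
        self.frame = 0

    def reset(self):
        self.frame = 0
        return self.frame

    def step(self, action):
        self.frame += 1
        return self.frame, 1.0, self.frame >= 10, {}


env = SkipFrame(CountingEnv(), skip=4)
env.reset()
first = env.step(0)
second = env.step(0)
third = env.step(0)
print(first, second, third)
```

The last call only advances two frames because the toy episode ends at frame 10, which is why its summed reward is smaller.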

ResizeObservation

The observation space can be resized for better learning performance. PyBoy already provides a good simplified view of the screen, so rescaling is not really needed. However, this project uses frame stacking, so the observation space has to be transformed into a "box".
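Frame stacking itself can be illustrated with a small buffer that always holds the last k frames, so every observation has the same fixed shape. The `FrameStack` class below is a hypothetical, dependency-free sketch; the notebook's version presumably relies on gym wrappers and array types instead of nested lists.

```python
from collections import deque


class FrameStack:
    """Keeps the most recent `num_frames` frames as one stacked observation."""

    def __init__(self, num_frames):
        self.frames = deque(maxlen=num_frames)

    def reset(self, first_frame):
        # Fill the stack with the first frame so the shape is fixed from step 0.
        self.frames.clear()
        for _ in range(self.frames.maxlen):
            self.frames.append(first_frame)
        return self.observation()

    def push(self, frame):
        # The deque drops the oldest frame automatically once it is full.
        self.frames.append(frame)
        return self.observation()

    def observation(self):
        return list(self.frames)


blank = [[0, 0, 0], [0, 0, 0]]   # toy 2x3 frame, not the real screen size
moved = [[0, 1, 0], [0, 0, 0]]

stack = FrameStack(num_frames=4)
obs = stack.reset(blank)
obs = stack.push(moved)
shape = (len(obs), len(obs[0]), len(obs[0][0]))
print(shape)
```

Stacking gives the network a short history, which lets it infer motion that a single frame cannot show.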

The Mario agent class

The Mario class holds the action policies, information about the neural network, and the functions to estimate and train the model. A cache function stores a limited number of steps in the agent's memory, which are then sampled in batches to train the deep Q-learning model. The class includes TD estimate and TD target functions, and the model's loss is backpropagated by the update function.
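As a rough illustration of the TD estimate and TD target in Double Q-learning, the sketch below works on plain lists of Q-values for a single transition. The function names mirror the description above, but the signatures and the discount `gamma=0.9` are assumptions; the notebook computes the same quantities on batched tensors.

```python
def td_estimate(q_online_state, action):
    """Q-value the online network assigns to the action actually taken."""
    return q_online_state[action]


def td_target(reward, done, q_online_next, q_target_next, gamma=0.9):
    """Double Q-learning target for one transition.

    The online network picks the best next action and the target network
    evaluates it, which reduces the overestimation of plain Q-learning.
    """
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # No bootstrapping from the next state once the episode has ended.
    bootstrap = 0.0 if done else gamma * q_target_next[best_action]
    return reward + bootstrap


q_online_next = [1.0, 3.0, 2.0]   # online net prefers action 1
q_target_next = [0.5, 1.5, 4.0]   # target net would have preferred action 2

estimate = td_estimate([0.2, 0.7, 0.1], action=1)
target = td_target(1.0, False, q_online_next, q_target_next)
terminal_target = td_target(1.0, True, q_online_next, q_target_next)
print(estimate, target, terminal_target)
```

The training loss is then computed between the estimate and the target, with the target treated as a constant during backpropagation.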

I implemented functions for saving and loading the model parameters and memory data in Google Drive so that the models can be trained from checkpoints. For this to work, you need a checkpoints folder in your Drive.

A simple neural network for this project

MetricLogger

Taken directly from the tutorial.

https://pytorch.org/tutorials/intermediate/mario_rl_tutorial.html

Initializing the environment

Mounted at /content/drive

Load checkpoint files from drive

The program optionally saves and loads files from Google Drive, so the Drive paths should be changed to match your setup.

The checkpoint and memory files can be quite large, around 5 GB, so take care when saving them to Drive.

0
LOADING OLD MEMORY
3903987
0.14199066147018538

Train the model!

exploration rate decay 0.99999975 => 0.99999950

Memory 200000 => 300000

batch size 128 => 64 => 32 => 64

Reset on every death (the game doesn't need info on respawn states, and it takes less time)

Normalized observation

Added checkpoints
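To see what the exploration rate change in the list above means in practice, the sketch below assumes the tutorial's scheme: epsilon starts at 1.0, is multiplied by the decay factor once per step, and is clipped at a minimum. The function name `epsilon_after` and the floor of 0.1 are assumptions for illustration.

```python
def epsilon_after(steps, decay, start=1.0, minimum=0.1):
    """Exploration rate after `steps` multiplicative decays, clipped at `minimum`."""
    return max(minimum, start * decay ** steps)


old_decay = 0.99999975
new_decay = 0.99999950
logged_step = 3948522   # step count taken from the training log in this post

eps_old = epsilon_after(logged_step, old_decay)
eps_new = epsilon_after(logged_step, new_decay)
print(f"old decay: {eps_old:.4f}, new decay: {eps_new:.4f}")
```

Under these assumptions the new decay factor gives a value close to the epsilon printed in the training log at that step, while the old factor would still leave the agent exploring in more than a third of its actions.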

TO DO:

Current episode  0
Episode 0 - Step 3948522 - Epsilon 0.13886382623643137 - Cumulative reward last 100 episodes 664.25 - Maximum distance reached 1591.71 - Mean Length 286.37 - Mean Loss 0.953 - Mean Q Value 22.487 - Time Delta 171.164 - Time 2023-12-11T19:25:10
Mounted at /content/drive
MarioNet saved to checkpoints/2023-12-11T08-58-33/mario_net2_3.chkpt at step 3948522
