The Machine Learning Toybox for testing Atari Reinforcement Learning Agents.

Welcome to Toybox! This is the main organization page and point of entry for using the Toybox platform to test and experiment with autonomous agents.

What is Toybox?

Toybox is a set of highly intervenable environments for testing autonomous agents. While our efforts have focused on the efficient testing of deep RL agents, this work can be used in a variety of contexts that involve white-box testing of black-box agents.

If you use this code, or otherwise are inspired by our white-box testing approach, please cite our NeurIPS workshop paper:

@inproceedings{foley2018toybox,
  title={Toybox: Better Atari Environments for Testing Reinforcement Learning Agents},
  author={Foley, John and Tosch, Emma and Clary, Kaleigh and Jensen, David},
  booktitle={NeurIPS 2018 Workshop on Systems for ML},
  year={2018}
}

How do I try it?

You can try playing our mocked Atari games locally:

pip install ctoybox
pip install pygame # optional dependency for human_play
python -m ctoybox.human_play amidar

The core repository is toybox-rs, which contains the efficient Rust implementations of our current set of Atari mocks. The FFI exports Rust objects as Python objects; we also support exporting objects as JSON. The ctoybox package provides only low-level reading and writing of structured objects in the game. See the Google Colab notebook for instructions on how to interact with the raw exports.

Some games are quite complex, so we also provide an interface for both reading and modifying (i.e., intervening on) state via the Toybox repository.
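To give a feel for what intervening on a JSON-exported state can look like, here is a minimal sketch that uses only the Python standard library. It does not call ctoybox or the Toybox repository; the state string, its field names (score, lives, player), and the intervene helper are all hypothetical stand-ins chosen for illustration, and real game states will have different, game-specific fields.

```python
import json

# Hypothetical stand-in for a JSON state export. The field names here are
# assumptions for illustration only, not the real Toybox schema.
EXPORTED_STATE = '{"score": 0, "lives": 3, "player": {"x": 10, "y": 20}}'


def intervene(state_json, path, value):
    """Return a new JSON string with the existing field at `path` set to `value`.

    `path` is a non-empty list of keys leading to the field. The input string
    is never modified, so the original state stays available for comparison.
    """
    state = json.loads(state_json)
    node = state
    for key in path[:-1]:
        node = node[key]
    last = path[-1]
    node[last]  # raises KeyError if the field does not already exist
    node[last] = value
    return json.dumps(state)


# Move the (hypothetical) player to x = 42 and leave everything else alone.
modified = intervene(EXPORTED_STATE, ["player", "x"], 42)
print(json.loads(EXPORTED_STATE)["player"])
print(json.loads(modified)["player"])
```

Refusing to create fields that are not already present is a deliberate choice in this sketch: a mistyped key raises an error instead of silently producing a state the game would not recognize.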

This website is still under construction. Contact the authors with questions, or post an issue on this repo.