Pac-Man AI: From Reinforcement Learning to World Models
A PPO agent learns to play Pac-Man, then an RSSM world model learns to dream the game, and a dream agent learns to play entirely inside those dreams.
Problem Statement
Training game-playing AI typically requires millions of real interactions, which is expensive and slow. This project explores two paradigms: learning by playing (model-free RL) and learning by dreaming (world models). Can an agent learn to play Pac-Man purely from imagined gameplay generated by a learned simulator?
Technical Approach
Phase 1 — PPO Agent (4.2M params): A convolutional neural network learns Pac-Man from scratch across 128 parallel environments. The agent progresses through a three-stage curriculum: scatter-only ghosts, then full ghost AI with the authentic 1980 targeting strategies (Blinky, Pinky, Inky, Clyde), then Cruise Elroy difficulty. For exploration it uses Random Network Distillation (RND) curiosity and an annealed entropy bonus, and observations are stacks of 4 frames.
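As a rough illustration of two of the ingredients above, the sketch below shows 4-frame stacking for a batch of parallel environments and a linearly annealed entropy coefficient. The class and function names, the linear schedule, and the start/end coefficients are assumptions for illustration; the project's actual implementation may differ.

```python
import numpy as np


class FrameStack:
    """Keeps the last `k` frames for each of `n_envs` parallel environments."""

    def __init__(self, n_envs: int, k: int, frame_shape: tuple):
        self.k = k
        self.buffer = np.zeros((n_envs, k, *frame_shape), dtype=np.float32)

    def reset(self, env_ids: np.ndarray, frames: np.ndarray) -> None:
        # On reset, fill all k slots with the first frame of the new episode.
        self.buffer[env_ids] = frames[:, None]

    def push(self, frames: np.ndarray) -> np.ndarray:
        # Shift history left by one slot, then write the newest frame last.
        self.buffer[:, :-1] = self.buffer[:, 1:].copy()
        self.buffer[:, -1] = frames
        return self.buffer


def entropy_coef(update: int, total_updates: int,
                 start: float = 0.01, end: float = 0.001) -> float:
    """Linearly anneal the entropy bonus weight (start/end values are assumed)."""
    frac = min(max(update / total_updates, 0.0), 1.0)
    return start + frac * (end - start)
```

With 128 environments and `k=4`, `push` returns an array of shape `(128, 4, H, W)` that can be fed to the convolutional policy, and `entropy_coef(update, 8000)` would give the exploration weight for a given PPO update.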
Phase 2 — World Model (28M params): A Recurrent State-Space Model (RSSM) watches the trained PPO agent play and learns to simulate Pac-Man entirely in latent space. The model is trained to reconstruct observations, predict rewards, and predict episode termination, using a combined loss with KL regularization.
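The combined loss described above can be sketched as a sum of four terms. This is a minimal NumPy version under stated assumptions: mean-squared error for reconstruction and reward, binary cross-entropy for termination, a KL term between categorical posterior and prior, and equal weights apart from a hypothetical `kl_weight`. None of these choices are confirmed by the project itself.

```python
import numpy as np


def log_softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable log-softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))


def categorical_kl(post_logits: np.ndarray, prior_logits: np.ndarray) -> float:
    """KL(posterior || prior), summed over latent variables, averaged over batch.

    Both inputs have shape (batch, n_vars, n_classes).
    """
    post_logp = log_softmax(post_logits)
    prior_logp = log_softmax(prior_logits)
    kl = (np.exp(post_logp) * (post_logp - prior_logp)).sum(axis=-1)
    return float(kl.sum(axis=-1).mean())


def world_model_loss(obs, obs_hat, reward, reward_hat, done, done_logit,
                     post_logits, prior_logits, kl_weight: float = 1.0) -> float:
    recon = ((obs - obs_hat) ** 2).mean()        # observation reconstruction
    rew = ((reward - reward_hat) ** 2).mean()    # reward prediction
    # Binary cross-entropy from logits for episode termination (stable form).
    term = (np.maximum(done_logit, 0) - done_logit * done
            + np.log1p(np.exp(-np.abs(done_logit)))).mean()
    kl = categorical_kl(post_logits, prior_logits)  # KL regularization
    return float(recon + rew + term + kl_weight * kl)
```

The KL term pulls the prior (which predicts the next latent without seeing the frame) toward the posterior (which does see it), which is what lets the model later roll forward without observations.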
Phase 3 — Dream Agent: A new agent trains entirely inside the world model's imagined trajectories: 512 parallel dreams of 15 steps each. This agent never sees the real game. The approach follows the paradigm behind DIAMOND (NeurIPS 2024) and DeepMind's Dreamer.
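To make the dreaming loop concrete, here is a minimal sketch of an imagined rollout and the discounted returns computed from it. The callables `policy`, `prior_step`, and `reward_head` are hypothetical stand-ins for the dream agent and the world model's learned components, and the discount factor is an assumed value.

```python
import numpy as np

N_DREAMS, HORIZON = 512, 15  # from the description above


def imagine(start_latents, policy, prior_step, reward_head, horizon=HORIZON):
    """Roll the world model forward in latent space, never touching the real game.

    start_latents: (n_dreams, latent_dim) array of starting states.
    Returns latents (T, N, D), actions (T, N), and rewards (T, N).
    """
    z = start_latents
    latents, actions, rewards = [], [], []
    for _ in range(horizon):
        a = policy(z)            # agent acts on the latent state
        z = prior_step(z, a)     # world model predicts the next latent
        latents.append(z)
        actions.append(a)
        rewards.append(reward_head(z))
    return np.stack(latents), np.stack(actions), np.stack(rewards)


def discounted_returns(rewards: np.ndarray, gamma: float = 0.99) -> np.ndarray:
    """Backward pass over a (T, N) reward array; gamma is an assumed value."""
    returns = np.zeros_like(rewards)
    running = np.zeros(rewards.shape[1:])
    for t in reversed(range(rewards.shape[0])):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

Called with a `(512, 2560)` batch of starting latents, `imagine` would produce 512 dreams of 15 steps each, and the resulting returns could serve as training targets for the dream agent.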
The entire stack is built from scratch: vectorized NumPy game engine, PyTorch training pipeline, and full RSSM architecture. No OpenAI Gym, no pre-built environments.
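A vectorized engine of this kind advances every environment with array operations instead of a Python loop per game. The sketch below illustrates the idea for movement and pellet collection only. The action ordering, the wrap-around tunnel handling, the 10-point pellet reward, and all function names are assumptions, not the project's actual engine.

```python
import numpy as np

# Action index -> (row delta, col delta): up, down, left, right (assumed order).
DELTAS = np.array([[-1, 0], [1, 0], [0, -1], [0, 1]])


def step_positions(walls: np.ndarray, pos: np.ndarray,
                   actions: np.ndarray) -> np.ndarray:
    """Move all N Pac-Men at once.

    walls: (H, W) bool maze, pos: (N, 2) int positions, actions: (N,) ints.
    """
    target = pos + DELTAS[actions]
    # Wrap around the maze edges (assumed way of modelling the side tunnels).
    target[:, 0] %= walls.shape[0]
    target[:, 1] %= walls.shape[1]
    blocked = walls[target[:, 0], target[:, 1]]
    # Blocked moves keep the old position; all others take the target cell.
    return np.where(blocked[:, None], pos, target)


def eat_pellets(pellets: np.ndarray, pos: np.ndarray) -> np.ndarray:
    """pellets: (N, H, W) bool, updated in place. Returns (N,) rewards."""
    idx = np.arange(pos.shape[0])
    ate = pellets[idx, pos[:, 0], pos[:, 1]]
    pellets[idx, pos[:, 0], pos[:, 1]] = False
    return ate.astype(np.float32) * 10.0  # assumed reward per pellet
```

Because every operation is a single indexed array expression, the cost of stepping 128 environments is close to the cost of stepping one.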
Results
| Phase | Metric | Result |
|---|---|---|
| PPO Training | Parallel environments | 128 |
| PPO Training | Total updates | 8,000 |
| PPO Training | Level clear rate | 95%+ |
| World Model | Latent dimensions | 2,560 (512 GRU + 2,048 categorical) |
| World Model | Parameters | ~28M |
| Dream Agent | Training data | 100% imagined (zero real interactions) |
| Test Suite | Tests passing | 79 |
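
The 2,560-dimensional latent in the table is the sum of a 512-unit GRU state and a 2,048-dimensional categorical part. One way such a state could be assembled is sketched below. The split of 2,048 into 32 variables with 64 classes each is purely an assumption (only the product comes from the table), as are the function names and the Gumbel-max sampling.

```python
import numpy as np

GRU_DIM = 512
N_VARS, N_CLASSES = 32, 64  # assumed factorization; only 32 * 64 = 2048 is given


def sample_one_hot(logits: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Sample one class per latent variable via the Gumbel-max trick.

    logits: (batch, n_vars, n_classes). Returns one-hot array of the same shape.
    """
    gumbel = rng.gumbel(size=logits.shape)
    idx = (logits + gumbel).argmax(axis=-1)
    return np.eye(logits.shape[-1], dtype=np.float32)[idx]


def model_state(h: np.ndarray, logits: np.ndarray,
                rng: np.random.Generator) -> np.ndarray:
    """Concatenate the deterministic GRU state with the flattened stochastic part."""
    z = sample_one_hot(logits, rng).reshape(h.shape[0], -1)
    return np.concatenate([h, z], axis=-1)
```

For a batch of 4, `model_state` returns an array of shape `(4, 2560)`, matching the latent size reported in the table.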