Pac-Man AI: From Reinforcement Learning to World Models

A PPO agent learns to play Pac-Man, then an RSSM world model learns to dream the game, and a dream agent learns to play entirely inside those dreams.


PPO agent gameplay (wins and losses)

Problem Statement

Training a game-playing AI typically requires millions of real environment interactions, which is slow and expensive. This project explores two paradigms: learning by playing (model-free reinforcement learning) and learning by dreaming (world models). The central question: can an agent learn to play Pac-Man purely from imagined gameplay generated by a learned simulator?

Technical Approach

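Phase 1 (PPO Agent, 4.2M params): A convolutional neural network trained with Proximal Policy Optimization (PPO) learns Pac-Man from scratch across 128 parallel environments. The agent progresses through a three-stage curriculum: scatter-only ghosts; full ghost AI with the authentic 1980 targeting strategies (Blinky, Pinky, Inky, Clyde); and Cruise Elroy difficulty, in which Blinky speeds up as the remaining dots run low. RND (Random Network Distillation) curiosity and an annealed entropy bonus drive exploration, while 4-frame stacking gives the network the motion information a single frame lacks.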
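The PPO ingredients named above (the clipped surrogate objective, an annealed entropy bonus, and an RND curiosity reward) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the project's PyTorch code: the clip range `clip_eps=0.2`, the entropy schedule endpoints, and the use of linear maps instead of convolutional networks for RND are all hypothetical choices made for illustration.

```python
import numpy as np


def entropy_coef(update, total_updates, start=0.01, end=0.001):
    """Linearly anneal the entropy-bonus weight from `start` to `end` (hypothetical values)."""
    frac = min(update / max(total_updates, 1), 1.0)
    return start + frac * (end - start)


def ppo_policy_loss(new_logp, old_logp, advantages, entropy, ent_coef, clip_eps=0.2):
    """Clipped PPO surrogate plus entropy bonus, negated so it can be minimized."""
    ratio = np.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    surrogate = np.mean(np.minimum(unclipped, clipped))  # pessimistic bound
    return -surrogate - ent_coef * np.mean(entropy)


class LinearRND:
    """Toy Random Network Distillation using linear maps instead of CNNs.

    A fixed random "target" projection is never trained; a "predictor" is
    trained to match it. The prediction error is the curiosity bonus: large on
    unfamiliar observations, shrinking as similar observations recur.
    """

    def __init__(self, obs_dim, feat_dim=16, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.target = rng.normal(size=(obs_dim, feat_dim))  # frozen
        self.pred = np.zeros((obs_dim, feat_dim))            # trainable
        self.lr = lr

    def intrinsic_reward(self, obs):
        err = obs @ self.pred - obs @ self.target
        return np.mean(err ** 2, axis=1)  # one bonus per observation

    def update(self, obs):
        # One gradient-descent step on the mean squared prediction error.
        err = obs @ self.pred - obs @ self.target
        grad = obs.T @ err * (2.0 / err.size)
        self.pred -= self.lr * grad
```

In a full update loop, `entropy_coef(update, 8000)` would be evaluated once per PPO update, and the RND bonus would be combined with the game reward; the original RND paper keeps separate value heads for the intrinsic and extrinsic streams, and how this project weights them is not stated.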

Phase 2 (World Model, 28M params): A Recurrent State-Space Model (RSSM) is trained on gameplay recorded from the trained PPO agent and learns to simulate Pac-Man entirely in latent space. Its combined loss covers observation reconstruction, reward prediction, and episode-termination prediction, plus a KL regularization term between the model's posterior and prior latent distributions.
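The combined objective described above can be sketched as follows. This is a hedged NumPy illustration, not the project's PyTorch code: the unit loss weights, the mean-squared-error reconstruction and reward terms, the `kl_scale` parameter, and the (batch, groups, classes) layout of the categorical logits are assumptions. Termination is modeled here as a "continue" probability (1 minus done), as in Dreamer; Dreamer-style models also often add refinements such as KL balancing or free bits, and whether this project does is not stated.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def categorical_kl(post_logits, prior_logits):
    """KL(posterior || prior) summed over latent groups and classes, per batch element.

    Logits shape: (batch, groups, classes).
    """
    p = softmax(post_logits)
    log_p = np.log(p + 1e-8)
    log_q = np.log(softmax(prior_logits) + 1e-8)
    return np.sum(p * (log_p - log_q), axis=(1, 2))


def world_model_loss(obs, obs_recon, reward, reward_pred, done, cont_logit,
                     post_logits, prior_logits, kl_scale=1.0):
    """Reconstruction + reward + continuation + scaled KL (all weights hypothetical)."""
    recon = np.mean((obs - obs_recon) ** 2)
    reward_loss = np.mean((reward - reward_pred) ** 2)
    # Binary cross-entropy on P(episode continues); sigmoid written via tanh for stability.
    cont_target = 1.0 - done
    cont_prob = np.clip(0.5 * (1.0 + np.tanh(cont_logit / 2.0)), 1e-7, 1 - 1e-7)
    cont_loss = -np.mean(cont_target * np.log(cont_prob)
                         + (1.0 - cont_target) * np.log(1.0 - cont_prob))
    kl = np.mean(categorical_kl(post_logits, prior_logits))
    total = recon + reward_loss + cont_loss + kl_scale * kl
    parts = {"recon": recon, "reward": reward_loss, "continue": cont_loss, "kl": kl}
    return total, parts
```

The KL term pulls the prior (which the model must rely on when dreaming, with no observation available) toward the posterior (which sees the real frame), which is what later makes imagined rollouts usable.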

Phase 3 (Dream Agent): A new agent trains entirely inside the world model's imagined trajectories, running 512 parallel dreams of 15 steps each. It never interacts with the real game. This follows the learning-in-imagination paradigm of DeepMind's Dreamer family and of DIAMOND (NeurIPS 2024).
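One standard way to turn imagined rollouts into learning targets, used by Dreamer, is the λ-return. The sketch below assumes that choice; the source does not say how the dream agent computes its targets. The world model is a hypothetical stand-in (fixed random linear maps over a toy 32-dimensional latent) and the policy is a random placeholder; only the rollout shape, 512 parallel dreams of 15 steps, comes from the text. `gamma=0.99` and `lam=0.95` are assumed defaults.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT, N_ACTIONS = 32, 4  # toy sizes (hypothetical); the real latent is 2,560-dim

# Hypothetical stand-in for a trained world model: fixed random linear heads.
W_TRANS = rng.normal(scale=0.3, size=(LATENT + N_ACTIONS, LATENT))
W_REWARD = rng.normal(scale=0.3, size=LATENT)
W_CONT = rng.normal(scale=0.3, size=LATENT)
W_VALUE = rng.normal(scale=0.3, size=LATENT)


def imagine(start, horizon=15):
    """Roll every start state forward `horizon` steps without touching the real game.

    Returns states (horizon+1, N, LATENT), rewards (horizon, N), continues (horizon, N).
    """
    states, rewards, conts = [start], [], []
    s = start
    for _ in range(horizon):
        a = rng.integers(N_ACTIONS, size=s.shape[0])  # placeholder random policy
        s = np.tanh(np.concatenate([s, np.eye(N_ACTIONS)[a]], axis=1) @ W_TRANS)
        states.append(s)
        rewards.append(s @ W_REWARD)  # predicted reward
        # P(episode continues), biased toward "continues" by the +3.0 offset.
        conts.append(0.5 * (1.0 + np.tanh((s @ W_CONT + 3.0) / 2.0)))
    return np.stack(states), np.stack(rewards), np.stack(conts)


def lambda_returns(rewards, conts, values, gamma=0.99, lam=0.95):
    """R_t = r_t + gamma * c_t * ((1 - lam) * v_{t+1} + lam * R_{t+1}), with R_H = v_H.

    rewards, conts: (H, N); values: (H+1, N). Returns (H, N).
    """
    horizon = rewards.shape[0]
    returns = np.empty_like(rewards)
    next_ret = values[horizon]  # bootstrap from the critic at the last imagined state
    for t in reversed(range(horizon)):
        next_ret = rewards[t] + gamma * conts[t] * ((1 - lam) * values[t + 1] + lam * next_ret)
        returns[t] = next_ret
    return returns


start = rng.normal(size=(512, LATENT))             # 512 parallel dreams
states, rewards, conts = imagine(start, horizon=15)
values = states @ W_VALUE                          # (16, 512) critic estimates
targets = lambda_returns(rewards, conts, values)   # (15, 512) training targets
```

With `lam=1` the target is the full discounted imagined return plus a bootstrap; with `lam=0` it is a one-step TD target. In a Dreamer-style loop the random policy would be replaced by the dream agent's actor, trained to increase `targets`, while the critic regresses toward them.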

The entire stack is built from scratch: a vectorized NumPy game engine, a PyTorch training pipeline, and the full RSSM architecture. No OpenAI Gym, no pre-built environments.
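Vectorizing the engine means advancing all parallel games with array operations instead of a Python loop over environments. The sketch below shows the idea for Pac-Man's movement and dot-eating only; it is a hypothetical minimal version, not the project's engine. The action encoding, the wall grid shared by all games, wrap-around at the edges (standing in for the side tunnels), and the 10-point dot reward are assumptions, and ghosts, power pellets, and lives are omitted.

```python
import numpy as np

# Hypothetical action encoding: 0=up, 1=down, 2=left, 3=right, as (row, col) deltas.
DELTAS = np.array([[-1, 0], [1, 0], [0, -1], [0, 1]])


def step(pos, actions, walls, pellets, pellet_reward=10.0):
    """Advance N games at once with no per-game Python loop.

    pos:     (N, 2) int array of Pac-Man (row, col) positions
    actions: (N,) ints in [0, 4)
    walls:   (H, W) bool maze shared by every game
    pellets: (N, H, W) bool, one dot grid per game, updated in place
    Returns (new_pos, rewards).
    """
    h, w = walls.shape
    proposed = pos + DELTAS[actions]   # candidate moves for all games
    proposed[:, 0] %= h                # wrap around edges (tunnel stand-in)
    proposed[:, 1] %= w
    blocked = walls[proposed[:, 0], proposed[:, 1]]
    new_pos = np.where(blocked[:, None], pos, proposed)  # stay put if blocked
    idx = np.arange(pos.shape[0])
    eaten = pellets[idx, new_pos[:, 0], new_pos[:, 1]]   # one lookup per game
    pellets[idx, new_pos[:, 0], new_pos[:, 1]] = False   # remove eaten dots
    return new_pos, eaten * pellet_reward
```

With N=128 this one call steps all 128 training environments; the fancy indexing `pellets[idx, row, col]` picks exactly one cell per game, which is what keeps the per-step cost independent of Python-level loops.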

Results

Component	Metric	Result
PPO Training	Parallel environments	128
PPO Training	Total updates	8,000
PPO Training	Level clear rate	95%+
World Model	Latent state size	2,560 (512 deterministic GRU + 2,048 stochastic categorical)
World Model	Parameters	~28M
Dream Agent	Training data	100% imagined (zero real-environment interactions)
Test Suite	Tests passing	79
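The 2,560-dimensional latent in the table is the concatenation of the RSSM's deterministic GRU state and a flattened one-hot sample from its categorical distributions. The sketch below illustrates that composition in NumPy. The split of the 2,048 categorical units into 32 variables of 64 classes each is a hypothetical assumption (the project does not state it), and a PyTorch Dreamer-style model would typically sample with straight-through gradients rather than the plain sampling shown here.

```python
import numpy as np

DETER = 512                # deterministic GRU state size (from the table)
GROUPS, CLASSES = 32, 64   # hypothetical split: 32 * 64 = 2,048 categorical units


def sample_stochastic(logits, rng):
    """Draw one class per group and return a flattened one-hot vector.

    logits: (batch, GROUPS, CLASSES) -> (batch, GROUPS * CLASSES)
    """
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    cum = probs.cumsum(axis=-1)
    u = rng.random(size=logits.shape[:-1] + (1,))
    idx = (u > cum).sum(axis=-1)          # inverse-CDF sampling, one index per group
    idx = np.minimum(idx, CLASSES - 1)    # guard against rounding at the top end
    onehot = np.eye(CLASSES)[idx]         # (batch, GROUPS, CLASSES)
    return onehot.reshape(logits.shape[0], -1)


def model_state(h, logits, rng):
    """Concatenate the deterministic GRU state with the sampled categorical state."""
    z = sample_stochastic(logits, rng)
    return np.concatenate([h, z], axis=-1)  # (batch, DETER + GROUPS * CLASSES)
```

Exactly `GROUPS` entries of the stochastic half are 1 in every row, so the 2,048 units encode 32 discrete choices rather than 2,048 independent features.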