Pac-Man AI: From Reinforcement Learning to World Models

A PPO agent learns to play Pac-Man, then an RSSM world model learns to dream the game, and a dream agent learns to play entirely inside those dreams.

Python · PyTorch · PPO · RSSM · NumPy · MPS
Figure: PPO agent gameplay (wins and losses)

Problem Statement

Training a game-playing agent typically requires millions of real environment interactions, which is slow and expensive. This project explores two paradigms: learning by playing (model-free RL) and learning by dreaming (world models). The central question: can an agent learn to play Pac-Man purely from imagined gameplay generated by a learned simulator?

Technical Approach

Phase 1 - PPO Agent (4.2M params): A convolutional neural network learns Pac-Man from scratch across 128 parallel environments. The agent progresses through a three-stage curriculum: scatter-only ghosts, then full ghost AI with the authentic 1980 targeting strategies (Blinky, Pinky, Inky, Clyde), then Cruise Elroy difficulty. For exploration it uses Random Network Distillation (RND) curiosity and an annealed entropy bonus; observations are stacks of 4 frames.
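As a rough illustration of the kind of update Phase 1 relies on, the sketch below shows PPO's clipped surrogate loss with an entropy bonus whose coefficient is annealed linearly. It is a NumPy sketch rather than the project's PyTorch code: the function names, the clip range of 0.2, and the entropy schedule (0.01 down to 0.001) are assumptions chosen for illustration. Only the 8,000-update budget comes from the results reported for this project.

```python
import numpy as np

def linear_anneal(update, total_updates, start=0.01, end=0.001):
    """Linearly interpolate a coefficient from `start` to `end` over training.

    `start` and `end` are illustrative values, not the project's settings.
    """
    frac = min(max(update / total_updates, 0.0), 1.0)
    return start + frac * (end - start)

def ppo_policy_loss(new_logp, old_logp, advantages, entropy, ent_coef, clip_eps=0.2):
    """Clipped surrogate policy loss (to be minimized) with an entropy bonus.

    new_logp, old_logp: log-probabilities of the taken actions, shape (batch,)
    advantages:         advantage estimates, shape (batch,)
    entropy:            per-sample policy entropy, shape (batch,)
    """
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Taking the element-wise minimum makes the objective pessimistic, which
    # removes the incentive to move the policy far from the old one.
    surrogate = np.minimum(unclipped, clipped).mean()
    return -(surrogate + ent_coef * entropy.mean())

# Entropy coefficient halfway through an 8,000-update run.
ent_coef = linear_anneal(update=4000, total_updates=8000)
```

Annealing the entropy coefficient keeps the policy exploratory early in the curriculum and lets it become more deterministic as training progresses.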

Phase 2 - World Model (28M params): A Recurrent State-Space Model (RSSM) watches the trained PPO agent play and learns to simulate Pac-Man entirely in latent space. It is trained to reconstruct observations, predict rewards, and predict episode termination, using a combined loss with a KL regularization term.
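To make the latent layout and the combined loss more concrete, here is a minimal NumPy sketch. The 512-dimensional GRU state and the 2,048 categorical dimensions come from the results reported for this project; the split into 32 variables of 64 classes, the Gumbel-max sampling, the unit loss weights, and all function names are assumptions made for illustration only.

```python
import numpy as np

GRU_DIM = 512                    # deterministic recurrent state
NUM_VARS, NUM_CLASSES = 32, 64   # ASSUMED factorization of the 2,048 categorical dims

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def categorical_kl(post_logits, prior_logits, eps=1e-8):
    """KL(posterior || prior), summed over latent variables, averaged over the batch."""
    p = softmax(post_logits)
    q = softmax(prior_logits)
    kl = (p * (np.log(p + eps) - np.log(q + eps))).sum(axis=-1)  # (batch, NUM_VARS)
    return kl.sum(axis=-1).mean()

def latent_features(h, post_logits, rng):
    """Concatenate the GRU state with one-hot samples of the categorical latents."""
    # Gumbel-max trick: argmax of (logits + Gumbel noise) is a categorical sample.
    gumbel = rng.gumbel(size=post_logits.shape)
    idx = (post_logits + gumbel).argmax(axis=-1)       # (batch, NUM_VARS)
    one_hot = np.eye(NUM_CLASSES)[idx]                 # (batch, NUM_VARS, NUM_CLASSES)
    return np.concatenate([h, one_hot.reshape(len(h), -1)], axis=-1)

def world_model_loss(recon_loss, reward_loss, done_loss, kl, kl_weight=1.0):
    """Combined objective; the weights here are placeholders, not tuned values."""
    return recon_loss + reward_loss + done_loss + kl_weight * kl

rng = np.random.default_rng(0)
h = np.zeros((4, GRU_DIM))
post = rng.normal(size=(4, NUM_VARS, NUM_CLASSES))
prior = rng.normal(size=(4, NUM_VARS, NUM_CLASSES))
features = latent_features(h, post, rng)   # shape (4, 2560)
kl = categorical_kl(post, prior)
```

The KL term pulls the prior (which predicts the next latent without seeing the frame) toward the posterior (which does see it); that prior is what the model relies on when it later runs without real observations.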

Phase 3 - Dream Agent: A new agent trains entirely inside the world model's imagined trajectories (512 parallel dreams of 15 steps each). It never interacts with the real game during training. This follows the paradigm behind Dreamer (DeepMind) and DIAMOND (NeurIPS 2024).
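The imagination loop can be sketched as follows. Only the batch of 512 dreams and the 15-step horizon are taken from the description above; the toy latent size, the four-action space, the random linear "world model", and the plain discounted return are stand-ins invented for this example, not the project's trained RSSM or its actual training target.

```python
import numpy as np

NUM_DREAMS, HORIZON = 512, 15
LATENT_DIM, NUM_ACTIONS = 32, 4   # toy sizes, assumed for illustration

def imagine(start_latents, dynamics, reward_head, policy, horizon, rng):
    """Roll the policy forward inside the model; no real environment is stepped."""
    z = start_latents
    rewards = np.empty((horizon, len(z)))
    for t in range(horizon):
        probs = policy(z)                                  # (dreams, actions)
        # Inverse-CDF sampling: one action per dream.
        u = rng.random((len(z), 1))
        actions = (probs.cumsum(axis=1) < u).sum(axis=1)
        actions = np.minimum(actions, probs.shape[1] - 1)  # guard against rounding
        z = dynamics(z, actions)
        rewards[t] = reward_head(z)
    return z, rewards

def discounted_returns(rewards, gamma=0.99):
    """Backward recursion G_t = r_t + gamma * G_{t+1} over the time axis."""
    returns = np.zeros_like(rewards)
    running = np.zeros(rewards.shape[1])
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Hypothetical stand-ins for the learned dynamics, reward head, and policy.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(LATENT_DIM, LATENT_DIM))
A = rng.normal(scale=0.1, size=(NUM_ACTIONS, LATENT_DIM))
w_r = rng.normal(size=LATENT_DIM)
W_pi = rng.normal(size=(LATENT_DIM, NUM_ACTIONS))

def dynamics(z, a):
    return np.tanh(z @ W + A[a])

def reward_head(z):
    return z @ w_r

def policy(z):
    logits = z @ W_pi
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

start = rng.normal(size=(NUM_DREAMS, LATENT_DIM))
final, rewards = imagine(start, dynamics, reward_head, policy, HORIZON, rng)
returns = discounted_returns(rewards)
```

In a Dreamer-style setup the starting latents would typically be states the world model inferred from recorded gameplay, and the returns would feed an actor-critic update. Both of those steps are omitted here.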

The entire stack is built from scratch: vectorized NumPy game engine, PyTorch training pipeline, and full RSSM architecture. No OpenAI Gym, no pre-built environments.
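To give a sense of what "vectorized" means for the game engine, the sketch below advances every environment's Pac-Man in a single NumPy call instead of looping over environments. The maze layout, array shapes, function names, and wrap-around handling are hypothetical; the real engine's data structures are not described here.

```python
import numpy as np

# Direction vectors as (row, col) deltas: up, down, left, right.
DIRS = np.array([[-1, 0], [1, 0], [0, -1], [0, 1]])

def step_positions(walls, pos, actions):
    """Move one agent per environment; a move into a wall leaves it in place.

    walls:   (H, W) bool array, True where a tile is a wall (shared maze)
    pos:     (N, 2) int array of (row, col) positions, one per environment
    actions: (N,) int array of indices into DIRS
    """
    h, w = walls.shape
    target = pos + DIRS[actions]
    target[:, 0] %= h          # wrap around the edges (side tunnels)
    target[:, 1] %= w
    blocked = walls[target[:, 0], target[:, 1]]
    return np.where(blocked[:, None], pos, target)

def eat_pellets(pellets, pos):
    """Collect pellets under each agent; clears them in place, returns 0/1 rewards.

    pellets: (N, H, W) bool array, one pellet grid per environment
    """
    env = np.arange(len(pos))
    eaten = pellets[env, pos[:, 0], pos[:, 1]]   # fancy indexing returns a copy
    pellets[env, pos[:, 0], pos[:, 1]] = False
    return eaten.astype(np.float32)

# A 5x5 toy maze with a walled border and three environments.
walls = np.ones((5, 5), dtype=bool)
walls[1:4, 1:4] = False
pos = np.array([[1, 1], [2, 2], [3, 3]])
actions = np.array([0, 3, 1])                    # up, right, down
pellets = np.zeros((3, 5, 5), dtype=bool)
pellets[:, 2, 3] = True

new_pos = step_positions(walls, pos, actions)
reward = eat_pellets(pellets, new_pos)
```

The same two calls work unchanged for 128 environments, because the batch size only appears as the leading array dimension.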

Results

| Phase | Metric | Result |
|---|---|---|
| PPO Training | Parallel environments | 128 |
| PPO Training | Total updates | 8,000 |
| PPO Training | Level clear rate | 95%+ |
| World Model | Latent dimensions | 2,560 (512 GRU + 2,048 categorical) |
| World Model | Parameters | ~28M |
| Dream Agent | Training data | 100% imagined (zero real interactions) |
| Test Suite | Coverage | 79 tests passing |