MacFleet: Distributed ML for Apple Silicon
Pool Apple Silicon Macs into a distributed ML training cluster — auto-discovery, adaptive compression, and thermal-aware scheduling over WiFi, Ethernet, or Thunderbolt.
Python · PyTorch · MLX · gRPC · mDNS · Ring AllReduce
View on GitHub ↗
Problem Statement
Apple Silicon Macs have powerful GPU cores but no native way to pool them for distributed ML training. Individual machines sit idle while training bottlenecks on a single GPU. The challenge: turn a collection of heterogeneous Macs into one unified training cluster with zero configuration.
Technical Approach
- Zero-config discovery — nodes find each other via mDNS/Bonjour. macfleet join is the only command needed: no IP addresses, no config files.
- Framework-agnostic core — the communication layer uses only NumPy, never importing PyTorch or MLX. Both engines work through the same pool/network/compression infrastructure.
- Adaptive gradient compression — a scheme is auto-selected per link type: no compression over Thunderbolt 4, TopK 10% + FP16 (~20x) over Ethernet, TopK 1% + FP16 (~200x) over WiFi.
- Heterogeneous scheduling — faster Macs get proportionally larger batches based on GPU core count. The scheduler continuously re-profiles throughput and adjusts for thermal throttling (nominal → 100%, fair → 90%, serious → 70%, critical → 30%).
- Ring AllReduce — bandwidth-efficient N-node gradient synchronization: each node transfers roughly 2(N−1)/N of the gradient per round regardless of cluster size, so per-node communication cost stays nearly flat as nodes are added.
- Dual engine support — native PyTorch (MPS backend) and Apple MLX with identical APIs. One-liner, context manager, or decorator patterns.
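To make the compression arithmetic concrete, here is a minimal NumPy sketch of TopK + FP16 gradient compression. The function names are illustrative, not MacFleet's API, and the quoted ~200x for TopK 1% + FP16 presumably combines 100x sparsification with 2x precision; once uint32 index metadata is counted, the effective ratio is lower:

```python
import numpy as np

def topk_fp16_compress(grad, k_frac):
    """Keep the k largest-magnitude entries as (uint32 index, fp16 value) pairs."""
    flat = grad.ravel()
    k = max(1, int(flat.size * k_frac))
    idx = np.argpartition(np.abs(flat), -k)[-k:]    # O(n) top-k selection
    return idx.astype(np.uint32), flat[idx].astype(np.float16), grad.shape

def topk_fp16_decompress(idx, vals, shape):
    out = np.zeros(int(np.prod(shape)), dtype=np.float32)
    out[idx] = vals                                  # fp16 upcasts on assignment
    return out.reshape(shape)

grad = np.random.default_rng(0).standard_normal((1000, 1000)).astype(np.float32)
idx, vals, shape = topk_fp16_compress(grad, 0.01)    # "WiFi" setting: TopK 1% + FP16
ratio = grad.nbytes / (idx.nbytes + vals.nbytes)     # ~66x once indices are counted
```

The compression is lossy twice over: dropped entries are zeroed on decompression, and kept values lose precision to fp16 rounding, which is why such schemes are reserved for the slowest links.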
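The heterogeneous-scheduling rule above can be sketched as a weighted split: each node's share of the global batch is proportional to its GPU core count scaled by the quoted thermal multiplier. The exact weighting and names here are assumptions, not MacFleet's scheduler:

```python
# Thermal multipliers are the ones quoted above; the proportional
# weighting itself is an illustrative assumption.
THERMAL_SCALE = {"nominal": 1.00, "fair": 0.90, "serious": 0.70, "critical": 0.30}

def split_batch(global_batch, nodes):
    """nodes: {name: (gpu_cores, thermal_state)} -> {name: local_batch}."""
    weights = {n: cores * THERMAL_SCALE[state]
               for n, (cores, state) in nodes.items()}
    total = sum(weights.values())
    return {n: round(global_batch * w / total) for n, w in weights.items()}

shares = split_batch(256, {
    "m2-ultra": (76, "nominal"),   # 76 GPU cores, running cool
    "m2-max":   (38, "serious"),   # throttled to 70% effective weight
    "m1":       (8,  "fair"),      # small node at 90%
})
# shares -> {'m2-ultra': 177, 'm2-max': 62, 'm1': 17}
```

Re-profiling throughput and recomputing these weights each round lets a node that starts throttling shed work without stalling the synchronous gradient exchange.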
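The ring synchronization can be illustrated with a two-phase in-process simulation (scatter-reduce, then allgather). This is a didactic NumPy model of the algorithm, not MacFleet's gRPC implementation:

```python
import numpy as np

def ring_allreduce(node_chunks):
    """node_chunks[i] is node i's gradient split into n chunks.
    Two phases of n-1 ring steps each; afterwards every node
    holds the elementwise sum of all nodes' gradients."""
    n = len(node_chunks)
    # Phase 1 (scatter-reduce): at step s, node i forwards chunk (i - s) % n
    # to its right neighbour, which accumulates it into its own copy.
    for s in range(n - 1):
        for i in range(n):
            c = (i - s) % n
            node_chunks[(i + 1) % n][c] += node_chunks[i][c]
    # Now node i owns the fully reduced chunk (i + 1) % n.
    # Phase 2 (allgather): circulate each finished chunk around the ring.
    for s in range(n - 1):
        for i in range(n):
            c = (i + 1 - s) % n
            node_chunks[(i + 1) % n][c] = node_chunks[i][c].copy()

# Four simulated nodes, each with an 8-value gradient split into 4 chunks.
rng = np.random.default_rng(42)
grads = [rng.standard_normal(8).astype(np.float32) for _ in range(4)]
node_chunks = [list(np.split(g.copy(), 4)) for g in grads]
ring_allreduce(node_chunks)
expected = grads[0] + grads[1] + grads[2] + grads[3]
```

Each node sends 2(n−1) chunks in total, about twice the gradient size, independent of how many nodes join the ring — which is what makes the pattern attractive for commodity links like WiFi.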
Results
| Feature | Detail |
|---|---|
| Install | pip install macfleet |
| Discovery | Automatic via mDNS (zero config) |
| Engines | PyTorch (MPS) + Apple MLX |
| Compression | Up to 200x over WiFi |
| Thermal management | Real-time workload adjustment |
| CLI tools | join, status, train, bench, diagnose |
| API patterns | One-liner, context manager, decorator |