MacFleet: Distributed ML for Apple Silicon

Pool Apple Silicon Macs into a distributed ML training cluster with auto-discovery, adaptive compression, and thermal-aware scheduling.

Python · PyTorch · MLX · gRPC · mDNS · Ring AllReduce

Why I Built This

Apple Silicon Macs have serious GPU power, but there's no native way to combine them for ML training. Three Macs on a desk are three GPUs that a training job could use, but in practice they sit idle. I wanted pip install macfleet and macfleet join to be the only things standing between a few Macs and a training cluster.

How It Works

  • Zero-config discovery via mDNS/Bonjour. macfleet join is the only command needed. No IP addresses, no config files.
  • Framework-agnostic core where the communication layer uses only NumPy, never importing PyTorch or MLX. Both engines work through the same infrastructure.
  • Adaptive gradient compression that auto-selects based on network: no compression over Thunderbolt 4, TopK 10% + FP16 (~20x) over Ethernet, TopK 1% + FP16 (~200x) over WiFi.
  • Heterogeneous scheduling where faster Macs get proportionally larger batches based on GPU core count. The scheduler re-profiles throughput continuously and adjusts for thermal throttling (nominal → 100%, fair → 90%, serious → 70%, critical → 30%).
  • Ring AllReduce for N-node gradient synchronization. Each node sends roughly 2(N-1)/N times the gradient size per step, so per-node traffic stays nearly constant as the cluster grows.
  • Dual engine support for both PyTorch (MPS backend) and Apple MLX with identical APIs.
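
The heterogeneous scheduling rule in the list above can be sketched in a few lines. This is a minimal illustration, not MacFleet's actual scheduler: the Node record, the split_batch function, and the core counts are hypothetical, and the sketch weights nodes by GPU core count and thermal state only, whereas the real scheduler is described as re-profiling throughput continuously. The thermal multipliers are the ones quoted above.

```python
from dataclasses import dataclass

# Thermal multipliers as quoted in the feature list.
THERMAL_FACTOR = {"nominal": 1.0, "fair": 0.9, "serious": 0.7, "critical": 0.3}


@dataclass
class Node:
    """Hypothetical node record; not a MacFleet type."""
    name: str
    gpu_cores: int
    thermal_state: str = "nominal"


def split_batch(nodes, global_batch):
    """Split global_batch in proportion to gpu_cores * thermal factor."""
    weights = [n.gpu_cores * THERMAL_FACTOR[n.thermal_state] for n in nodes]
    total = sum(weights)
    exact = [global_batch * w / total for w in weights]

    # Floor every share, then give the leftover samples to the nodes
    # with the largest fractional remainders so the sum stays exact.
    shares = [int(x) for x in exact]
    leftover = global_batch - sum(shares)
    by_remainder = sorted(
        range(len(nodes)), key=lambda i: exact[i] - shares[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return {n.name: s for n, s in zip(nodes, shares)}


# Illustrative cluster: core counts are made up for the example.
cool = [Node("mac-a", 10), Node("mac-b", 20), Node("mac-c", 10)]
hot = [Node("mac-a", 10), Node("mac-b", 20, "serious"), Node("mac-c", 10)]

cool_split = split_batch(cool, 64)
hot_split = split_batch(hot, 64)
```

With all three nodes nominal, the 20-core machine gets half of the 64 samples. Once it reports a serious thermal state its weight drops to 14, and part of its share moves to the two cooler machines while the global batch size stays fixed.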

Results

Feature             Detail
Install             pip install macfleet
Discovery           Automatic via mDNS (zero config)
Engines             PyTorch (MPS) + Apple MLX
Compression         Up to 200x over WiFi
Thermal management  Real-time workload adjustment
CLI tools           join, status, train, bench, diagnose
API patterns        One-liner, context manager, decorator
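
The compression figures quoted on this page can be checked with a small NumPy sketch of TopK sparsification followed by an FP16 cast. It is an illustration under assumptions, not MacFleet's implementation: the function names are hypothetical, and the source does not say how indices are encoded or whether dropped values are carried over to the next step.

```python
import numpy as np


def topk_fp16_compress(grad, ratio):
    """Keep the largest-magnitude `ratio` fraction of entries as FP16."""
    flat = grad.ravel()
    k = max(1, round(flat.size * ratio))
    # argpartition places the k largest magnitudes in the last k slots
    # without paying for a full sort.
    idx = np.argpartition(np.abs(flat), flat.size - k)[-k:]
    return idx.astype(np.int32), flat[idx].astype(np.float16)


def decompress(idx, values, shape):
    """Rebuild a dense FP32 gradient; dropped entries become zero."""
    out = np.zeros(int(np.prod(shape)), dtype=np.float32)
    out[idx] = values.astype(np.float32)
    return out.reshape(shape)


rng = np.random.default_rng(0)
grad = rng.standard_normal((1000, 100)).astype(np.float32)

idx, vals = topk_fp16_compress(grad, 0.01)   # the "TopK 1% + FP16" setting
restored = decompress(idx, vals, grad.shape)

# Ratio counting only the transmitted values (1/100 of entries, half the bytes).
values_only_ratio = grad.nbytes / vals.nbytes
# Ratio if each kept value also ships a 32-bit index (an assumption).
with_index_ratio = grad.nbytes / (vals.nbytes + idx.nbytes)
```

Counting values alone, keeping 1% of the entries at half precision gives 100 x 2 = 200x, and 10% gives 20x, which matches the quoted figures. If every kept value also carries a 32-bit index, the effective ratio in this sketch falls to roughly 67x, so the real number depends on the index encoding.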