Session: Scaling Large Models with Model & Data Parallelism: Techniques, Tradeoffs, and Best Practices
Discover how to train and serve massive AI models efficiently by leveraging both model and data parallelism. In this session, we'll explore how to partition large models across GPUs and distribute data for optimal throughput, diving into practical setup details and performance benchmarks. We'll also address the key tradeoffs, such as latency versus resource usage, and show how to tailor parallelization strategies to different AI tasks, going beyond transformers into computer vision and other domains. By the end, you'll have a holistic understanding of how to design and deploy parallelized workflows that balance accuracy, speed, and infrastructure cost, so you can scale AI solutions effectively in real-world scenarios.
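
To ground the core idea before the session, here is a minimal sketch of combining the two techniques: naive model parallelism (splitting a model's layers across two devices) wrapped in data parallelism via PyTorch's DistributedDataParallel. This is an illustrative example, not material from the session itself; the `TwoStageModel` class, layer sizes, and the assumption of two GPUs per process are all hypothetical.

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


class TwoStageModel(nn.Module):
    """Naive model parallelism: the two stages live on different GPUs."""

    def __init__(self, dev0, dev1):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        self.stage0 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to(dev0)
        self.stage1 = nn.Linear(4096, 10).to(dev1)

    def forward(self, x):
        x = self.stage0(x.to(self.dev0))
        # Activations cross the device boundary at the stage split.
        return self.stage1(x.to(self.dev1))


def main():
    # Assumes launch via torchrun, with each process (rank) owning two GPUs.
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    dev0, dev1 = rank * 2, rank * 2 + 1

    model = TwoStageModel(dev0, dev1)
    # DDP adds data parallelism on top: each rank holds a full (two-GPU)
    # replica, and gradients are all-reduced across ranks during backward.
    # device_ids is left unset because the module spans multiple devices.
    ddp_model = DDP(model)
    opt = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    for _ in range(10):
        opt.zero_grad()
        # Random tensors stand in for each rank's shard of the dataset.
        x = torch.randn(32, 1024)
        target = torch.randn(32, 10).to(dev1)
        loss = loss_fn(ddp_model(x), target)
        loss.backward()  # gradient all-reduce happens here
        opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched as, say, `torchrun --nproc_per_node=2 train.py` on a four-GPU node, this runs two data-parallel replicas, each spanning two GPUs, which is the basic pattern whose tradeoffs (memory per device, inter-device communication, throughput) the session examines.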