absorb.md

Yulun Wang

Chronological feed of everything captured from Yulun Wang.

NCCLX: Scaling Collective Communication for Large Language Models

The NCCLX framework addresses communication bottlenecks in LLM training and inference on clusters of more than 100,000 GPUs. It is optimized for both high-throughput synchronous training and low-latency inference. The aim is to make next-generation LLMs practical to run at that scale.