The AI Wire

Scaling State-Space Models on Multiple GPUs with Tensor Parallelism [TOP LAB](arxiv.org)

2026-02-25|paper|arXiv

Selective state space models (SSMs) have rapidly become a compelling backbone for large language models, especially for long-context workloads. Yet in deployment, their inference performance is often ...

cs-DC cs-LG

DynamiQ: Accelerating Gradient Synchronization using Compressed Multi-hop All-reduce [TOP LAB](arxiv.org)

2026-02-10|paper|arXiv

Multi-hop all-reduce is the de facto backbone of large model training. As the training scale increases, the network often becomes a bottleneck, motivating reducing the volume of transmitted data. Acco...

cs-LG cs-DC cs-NI

AI-Driven Cloud Resource Optimization for Multi-Cluster Environments [TOP LAB](arxiv.org)

2026-01-04|paper|arXiv

Modern cloud-native systems increasingly rely on multi-cluster deployments to support scalability, resilience, and geographic distribution. However, existing resource management approaches remain larg...

cs-DC cs-AI
