The AI Wire

5 articles tagged "cs-SE"

Airavat: An Agentic Framework for Internet Measurement [TOP LAB](arxiv.org)

2026-02-25|paper|arXiv

Internet measurement faces twin challenges: complex analyses require expert-level orchestration of tools, yet even syntactically correct implementations can have methodological flaws and can be diffic...

cs-NI cs-AI cs-SE

GameDevBench: Evaluating Agentic Capabilities Through Game Development [TOP LAB](arxiv.org)

2026-02-12|paper|arXiv

Despite rapid progress on coding agents, progress on their multimodal counterparts has lagged behind. A key challenge is the scarcity of evaluation testbeds that combine the complexity of software dev...

cs-AI cs-CL cs-SE

World of Workflows: a Benchmark for Bringing World Models to Enterprise Systems [TOP LAB](arxiv.org)

2026-02-01|paper|arXiv

Frontier large language models (LLMs) excel as autonomous agents in many domains, yet they remain untested in complex enterprise systems where hidden workflows create cascading effects across intercon...

cs-AI cs-SE

SERA: Soft-Verified Efficient Repository Agents [TOP LAB](arxiv.org)

2026-01-30|paper|arXiv

Open-weight coding agents should hold a fundamental advantage over closed-source systems: they can be specialized to private codebases, encoding repository-specific information directly in their weigh...

cs-CL cs-LG cs-SE

World of Workflows: a Benchmark for Bringing World Models to Enterprise Systems [TOP LAB](arxiv.org)

2026-01-30|paper|arXiv

cs-AI cs-SE