Vision-language models serve as teachers to distill video reasoning capabilities into smaller student models via adaptive optimization at test time.
A cross-family context compression method reduces long-context input length for reasoning models by identifying and retaining attended tokens across different model architectures.
Empirical analysis identifies conditions under which multi-agent reinforcement learning improves LLM-based workflows, examining workflow structure, scale, and policy-sharing strategies.
A video world model is steered to synthesize stress-test scenarios for evaluating and improving robustness of learned policies under challenging conditions.
Vision-language models decompose inverse graphics into staged executable steps within Blender to recover 3D scene structure and attributes from images.
A temporal scheduling strategy for RLVR determines not only which training samples to use but also when during training to apply them, with the aim of maximizing reasoning improvement.
Introduces a benchmark evaluating AI agents that generate procedural 3D models through code, measuring whether the generated programs produce correct geometry.
Presents a framework in which multiple AI agents collaborate to operate computer interfaces, distributing GUI interaction tasks across specialized agents.
Proposes a method that jointly trains agent memory and exploration behavior using novelty-based signals to improve navigation and discovery in unknown environments.
Uses unmodified LLMs to score intermediate reasoning steps in math problems at inference time, replacing trained process reward models without any additional training.
Releases an open framework for training visual web agents with online multi-turn RL, clarifying implementation details that enable agents to learn from live browser interactions.
Benchmarks LLM agents on personal productivity tasks by simulating realistic personal data environments, testing performance on real-world applications like calendars and email.
Reports that attackers used social engineering prompts to manipulate Meta AI into granting unauthorized access to high-profile Instagram accounts.
Describes a tool or feature enabling users to directly edit files that have been pasted into an interface, streamlining in-context file modification.
Argues that enterprise AI scaling bottlenecks stem from agent orchestration logic rather than LLM capability, advocating for purpose-built agent architectures over raw model scaling.
JetBrains releases Mellum2, a 12-billion-parameter mixture-of-experts language model, likely targeting developer-focused coding and IDE assistance tasks.
Announces infrastructure investment in Michigan to build data centers or computing facilities supporting AI workloads as part of a broader national AI build-out.
Articulates an organization's official positions on AI governance policy and the boundaries of appropriate political engagement or lobbying activity.
Anthropic has filed a confidential draft S-1 registration statement with the SEC, initiating the regulatory process toward a potential public offering.
xAI has released Composer 2.5, a code/content composition tool, now integrated into the Grok Build development environment.
A survey or advocacy piece covers the resurgence of terminal user interface (TUI) tools, highlighting strace-ui and Bonsai_term as examples.
A system simulates agent behavior or cognition using the Free Energy Principle as the computational and theoretical foundation.
A 1-bit quantized 4B-parameter image generation model is optimized to run locally on consumer devices with minimal memory and compute.
A United Airlines 767 diverted back to Newark after a passenger's Bluetooth device name triggered a security alert onboard.
A vulnerability in ChatGPT's Google Sheets integration allows malicious prompts to exfiltrate spreadsheet data to external parties.
An analysis of how AI tools have dramatically accelerated the software prototyping cycle, reducing time from concept to working demo.
An argument that remote work reduced mentorship and visibility for junior employees, explaining weak junior hiring better than AI displacement does.
LangChain provides a framework for building LLM-powered agents and chains, abstracting prompt management, tool use, and memory.
Open WebUI delivers a self-hosted browser interface for interacting with local and API-based LLMs including Ollama and OpenAI-compatible endpoints.
Dify provides a production-ready platform for designing, deploying, and managing agentic LLM workflows with built-in orchestration tooling.