Top story
China may have accessed Anthropic's frontier model "Mythos" - Reports indicate China may have gained access to the model, raising significant security and export-control concerns. Source
Research
When Good Verifiers Go Bad: Self-Improving VLMs Can Regress on New Tasks - Shows that VLM self-improvement via verifier feedback can regress on out-of-distribution tasks. Source
MBench: A Comprehensive Benchmark on Memory Capability for Video World Models - Introduces a comprehensive benchmark evaluating memory capabilities in video world models. Source
Compressed Computation is (probably) not Computation in Superposition - Theoretical critique of the "computation in superposition" hypothesis in neural networks. Source
Rethinking RAG in Long Videos: What to Retrieve and How to Use It? - Examines retrieval strategies and usage methods for RAG systems over long-form video. Source
Tools
RhymeFlow: Training-Free Acceleration for Video Generation - Enables training-free speedups for video generation via asynchronous denoising flow scheduling. Source
Avatar V: Scaling Video-Reference Avatar Video Generation - Scales video-reference avatar generation with improved quality at higher resolutions. Source
APPO: Agentic Procedural Policy Optimization - Introduces procedural policy optimization tailored for training agentic LLM policies. Source
Orchestra-o1: Omnimodal Agent Orchestration - Framework for orchestrating agents across multiple modalities. Source
Industry
Welcome to the AGI era of AI governance - Essay examining AI governance challenges in a presumed AGI era. Source
Why AI hasn't replaced software engineers, and won't - Analysis of current and future limits of AI in software engineering roles. Source
Rio de Janeiro's "homegrown" LLM appears to be a merge of an existing model - Forensic analysis suggests the claimed homegrown LLM is likely a 0.6/0.4 weight merge of Nex/Qwen, demonstrating the robustness of weight interpolation. Source
Community
Mistral launches recap and teases upcoming releases - Recap of Mistral Small 4, Medium 3.5, Voxtral STT/TTS, and Vibe, with hints at what comes next. Source
Open-source Knowledge Graph pipeline for LLM multi-hop reasoning - Django + React pipeline that builds knowledge graphs and uses community detection and hybrid retrieval to mitigate lost-in-the-middle issues in RAG. Source