The AI Wire

146 articles tagged "cs-AI"

LLM Novice Uplift on Dual-Use, In Silico Biology Tasks [TOP LAB](arxiv.org)

2026-02-27|paper|arXiv

Large language models (LLMs) perform increasingly well on biology benchmarks, but it remains unclear whether they uplift novice users -- i.e., enable humans to perform better than with internet-only r...

cs-AI cs-CL cs-CR

Echoes Over Time: Unlocking Length Generalization in Video-to-Audio Generation Models [TOP LAB](arxiv.org)

2026-02-25|paper|arXiv

Scaling multimodal alignment between video and audio is challenging, particularly due to limited data and the mismatch between text descriptions and frame-level video information. In this work, we tac...

cs-CV cs-AI

Airavat: An Agentic Framework for Internet Measurement [TOP LAB](arxiv.org)

2026-02-25|paper|arXiv

Internet measurement faces twin challenges: complex analyses require expert-level orchestration of tools, yet even syntactically correct implementations can have methodological flaws and can be diffic...

cs-NI cs-AI cs-SE

Test-Time Training with KV Binding Is Secretly Linear Attention (arxiv.org)

2026-02-25|paper|arXiv

Test-time training (TTT) with KV binding as sequence modeling layer is commonly interpreted as a form of online meta-learning that memorizes a key-value mapping at test time. However, our analysis rev...

cs-LG cs-AI cs-CV

A Very Big Video Reasoning Suite (arxiv.org)

2026-02-24|paper|arXiv

Rapid progress in video models has largely focused on visual quality, leaving their reasoning capabilities underexplored. Video reasoning grounds intelligence in spatiotemporally consistent visual env...

cs-CV cs-AI cs-LG

Vichara: Appellate Judgment Prediction and Explanation for the Indian Judicial System [TOP LAB](arxiv.org)

2026-02-23|paper|arXiv

In jurisdictions like India, where courts face an extensive backlog of cases, artificial intelligence offers transformative potential for legal judgment prediction. A critical subset of this backlog c...

cs-CL cs-AI

Adapting Actively on the Fly: Relevance-Guided Online Meta-Learning with Latent Concepts for Geospatial Discovery [TOP LAB](arxiv.org)

2026-02-20|paper|arXiv

In many real-world settings, such as environmental monitoring, disaster response, or public health, with costly and difficult data collection and dynamic environments, strategically sampling from unob...

cs-CV cs-AI cs-CY

Evaluating Chain-of-Thought Reasoning through Reusability and Verifiability [TOP LAB](arxiv.org)

2026-02-20|paper|arXiv

In multi-agent IR pipelines for tasks such as search and ranking, LLM-based agents exchange intermediate reasoning in terms of Chain-of-Thought (CoT) with each other. Current CoT evaluation narrowly f...

cs-AI cs-CL cs-IR

What Do LLMs Associate with Your Name? A Human-Centered Black-Box Audit of Personal Data [TOP LAB](arxiv.org)

2026-02-20|paper|arXiv

Large language models (LLMs), and conversational agents based on them, are exposed to personal data (PD) during pre-training and during user interactions. Prior work shows that PD can resurface, yet u...

cs-HC cs-AI cs-CL

Almost Sure Convergence of Differential Temporal Difference Learning for Average Reward Markov Decision Processes [TOP LAB](arxiv.org)

2026-02-19|paper|arXiv

The average reward is a fundamental performance metric in reinforcement learning (RL) focusing on the long-run performance of an agent. Differential temporal difference (TD) learning algorithms are a ...

cs-LG cs-AI

DataJoint 2.0: A Computational Substrate for Agentic Scientific Workflows [TOP LAB](arxiv.org)

2026-02-19|paper|arXiv

Operational rigor determines whether human-agent collaboration succeeds or fails. Scientific data pipelines need the equivalent of DevOps -- SciOps -- yet common approaches fragment provenance across ...

cs-DB cs-AI

Interpretability-by-Design with Accurate Locally Additive Models and Conditional Feature Effects [TOP LAB](arxiv.org)

2026-02-19|paper|arXiv

Generalized additive models (GAMs) offer interpretability through independent univariate feature effects but underfit when interactions are present in data. GA$^2$Ms add selected pairwise interactions...

cs-LG cs-AI

Lifelong Scalable Multi-Agent Realistic Testbed and A Comprehensive Study on Design Choices in Lifelong AGV Fleet Management Systems [TOP LAB](arxiv.org)

2026-02-18|paper|arXiv

We present Lifelong Scalable Multi-Agent Realistic Testbed (LSMART), an open-source simulator to evaluate any Multi-Agent Path Finding (MAPF) algorithm in a Fleet Management System (FMS) with Automate...

cs-RO cs-AI

Scaling Verification Can Be More Effective than Scaling Policy Learning for Vision-Language-Action Alignment (arxiv.org)

2026-02-13|paper|arXiv

The long-standing vision of general-purpose robots hinges on their ability to understand and act upon natural language instructions. Vision-Language-Action (VLA) models have made remarkable progress t...

cs-RO cs-AI eess-SY

GameDevBench: Evaluating Agentic Capabilities Through Game Development [TOP LAB](arxiv.org)

2026-02-12|paper|arXiv

Despite rapid progress on coding agents, progress on their multimodal counterparts has lagged behind. A key challenge is the scarcity of evaluation testbeds that combine the complexity of software dev...

cs-AI cs-CL cs-SE

ROCKET: Rapid Optimization via Calibration-guided Knapsack Enhanced Truncation for Efficient Model Compression [TOP LAB](arxiv.org)

2026-02-12|paper|arXiv

We present ROCKET, a training-free model compression method that achieves state-of-the-art performance in comparison with factorization, structured-sparsification and dynamic compression baselines. Op...

cs-LG cs-AI cs-CL

A Unified Assessment of the Poverty of the Stimulus Argument for Neural Language Models [TOP LAB](arxiv.org)

2026-02-11|paper|arXiv

How can children acquire native-level syntax from limited input? According to the Poverty of the Stimulus Hypothesis (PoSH), the linguistic input children receive is insufficient to explain certain ge...

cs-CL cs-AI

Biases in the Blind Spot: Detecting What LLMs Fail to Mention (arxiv.org)

2026-02-11|paper|arXiv

Large Language Models (LLMs) often provide chain-of-thought (CoT) reasoning traces that appear plausible, but may hide internal biases. We call these *unverbalized biases*. Monitoring models via their...

cs-LG cs-AI

CoRefine: Confidence-Guided Self-Refinement for Adaptive Test-Time Compute [TOP LAB](arxiv.org)

2026-02-10|paper|arXiv

Large Language Models (LLMs) often rely on test-time scaling via parallel decoding (for example, 512 samples) to boost reasoning accuracy, but this incurs substantial compute. We introduce CoRefine, a...

cs-AI cs-CL

Diamond Maps: Efficient Reward Alignment via Stochastic Flow Maps [TOP LAB](arxiv.org)

2026-02-06|paper|arXiv

Flow and diffusion models produce high-quality samples, but adapting them to user preferences or constraints post-training remains costly and brittle, a challenge commonly called reward alignment. We ...

cs-LG cs-AI

Shared LoRA Subspaces for almost Strict Continual Learning (arxiv.org)

2026-02-06|paper|arXiv

Adapting large pretrained models to new tasks efficiently and continually is crucial for real-world deployment but remains challenging due to catastrophic forgetting and the high cost of retraining. W...

cs-LG cs-AI cs-CV

Fluid Representations in Reasoning Models [TOP LAB](arxiv.org)

2026-02-05|paper|arXiv

Reasoning language models, which generate long chains of thought, dramatically outperform non-reasoning language models on abstract problems. However, the internal model mechanisms that allow this sup...

cs-AI

Beyond Rewards in Reinforcement Learning for Cyber Defence [TOP LAB](arxiv.org)

2026-02-05|paper|arXiv

Recent years have seen an explosion of interest in autonomous cyber defence agents trained to defend computer networks using deep reinforcement learning. These agents are typically trained in cyber gy...

cs-LG cs-AI

Protein Autoregressive Modeling via Multiscale Structure Generation (arxiv.org)

2026-02-05|paper|arXiv

We present protein autoregressive modeling (PAR), the first multi-scale autoregressive framework for protein backbone generation via coarse-to-fine next-scale prediction. Using the hierarchical nature...

cs-LG cs-AI q-bio-BM

Contrastive Continual Learning for Model Adaptability in Internet of Things (arxiv.org)

2026-02-05|paper|arXiv

Internet of Things (IoT) deployments operate in nonstationary, dynamic environments where factors such as sensor drift, evolving user behavior, and heterogeneous user privacy requirements can affect a...

cs-LG cs-AI

Accelerating Scientific Research with Gemini: Case Studies and Common Techniques [TOP LAB](arxiv.org)

2026-02-04|paper|arXiv

Recent advances in large language models (LLMs) have opened new avenues for accelerating scientific research. While models are increasingly capable of assisting with routine tasks, their ability to co...

cs-CL cs-AI

Equilibrium Propagation for Non-Conservative Systems [TOP LAB](arxiv.org)

2026-02-04|paper|arXiv

Equilibrium Propagation (EP) is a physics-inspired learning algorithm that uses stationary states of a dynamical system both for inference and learning. In its original formulation it is limited to co...

cs-LG cs-AI cs-NE

PLATE: Plasticity-Tunable Efficient Adapters for Geometry-Aware Continual Learning (arxiv.org)

2026-02-04|paper|arXiv

We develop a continual learning method for pretrained models that *requires no access to old-task data*, addressing a practical barrier in foundation model adaptation where pretraining distributi...

cs-LG cs-AI

MentisOculi: Revealing the Limits of Reasoning with Mental Imagery [TOP LAB](arxiv.org)

2026-02-03|paper|arXiv

Frontier models are transitioning from multimodal large language models (MLLMs) that merely ingest visual information to unified multimodal models (UMMs) capable of native interleaved generation. This...

cs-AI cs-CV cs-LG

Didactic to Constructive: Turning Expert Solutions into Learnable Reasoning [TOP LAB](arxiv.org)

2026-02-03|paper|arXiv

Improving the reasoning capabilities of large language models (LLMs) typically relies either on the model's ability to sample a correct solution to be reinforced or on the existence of a stronger mode...

cs-LG cs-AI
