The AI Wire

181 articles tagged "ai" — page 2 of 7

2026-02-06|tool|GitHub

- Evaluating state of the art in AI

Fluid Representations in Reasoning Models [TOP LAB](arxiv.org)

2026-02-05|paper|arXiv

Reasoning language models, which generate long chains of thought, dramatically outperform non-reasoning language models on abstract problems. However, the internal model mechanisms that allow this sup...

cs-AI

Beyond Rewards in Reinforcement Learning for Cyber Defence [TOP LAB](arxiv.org)

2026-02-05|paper|arXiv

Recent years have seen an explosion of interest in autonomous cyber defence agents trained to defend computer networks using deep reinforcement learning. These agents are typically trained in cyber gy...

cs-LG cs-AI

Protein Autoregressive Modeling via Multiscale Structure Generation (arxiv.org)

2026-02-05|paper|arXiv

We present protein autoregressive modeling (PAR), the first multi-scale autoregressive framework for protein backbone generation via coarse-to-fine next-scale prediction. Using the hierarchical nature...

cs-LG cs-AI q-bio-BM

Contrastive Continual Learning for Model Adaptability in Internet of Things (arxiv.org)

2026-02-05|paper|arXiv

Internet of Things (IoT) deployments operate in nonstationary, dynamic environments where factors such as sensor drift, evolving user behavior, and heterogeneous user privacy requirements can affect a...

cs-LG cs-AI

Accelerating Scientific Research with Gemini: Case Studies and Common Techniques [TOP LAB](arxiv.org)

2026-02-04|paper|arXiv

Recent advances in large language models (LLMs) have opened new avenues for accelerating scientific research. While models are increasingly capable of assisting with routine tasks, their ability to co...

cs-CL cs-AI

Equilibrium Propagation for Non-Conservative Systems [TOP LAB](arxiv.org)

2026-02-04|paper|arXiv

Equilibrium Propagation (EP) is a physics-inspired learning algorithm that uses stationary states of a dynamical system both for inference and learning. In its original formulation it is limited to co...

cs-LG cs-AI cs-NE

PLATE: Plasticity-Tunable Efficient Adapters for Geometry-Aware Continual Learning (arxiv.org)

2026-02-04|paper|arXiv

We develop a continual learning method for pretrained models that requires no access to old-task data, addressing a practical barrier in foundation model adaptation where pretraining distributi...

cs-LG cs-AI

MentisOculi: Revealing the Limits of Reasoning with Mental Imagery [TOP LAB](arxiv.org)

2026-02-03|paper|arXiv

Frontier models are transitioning from multimodal large language models (MLLMs) that merely ingest visual information to unified multimodal models (UMMs) capable of native interleaved generation. This...

cs-AI cs-CV cs-LG

Didactic to Constructive: Turning Expert Solutions into Learnable Reasoning [TOP LAB](arxiv.org)

2026-02-03|paper|arXiv

Improving the reasoning capabilities of large language models (LLMs) typically relies either on the model's ability to sample a correct solution to be reinforced or on the existence of a stronger mode...

cs-LG cs-AI

Reward-free Alignment for Conflicting Objectives (arxiv.org)

2026-02-03|paper|arXiv

Direct alignment methods are increasingly used to align large language models (LLMs) with human preferences. However, many real-world alignment problems involve multiple conflicting objectives, where ...

cs-CL cs-AI cs-LG

WiFiPenTester: Advancing Wireless Ethical Hacking with Governed GenAI [TOP LAB](arxiv.org)

2026-02-02|paper|arXiv

Wireless ethical hacking relies heavily on skilled practitioners manually interpreting reconnaissance results and executing complex, time-sensitive sequences of commands to identify vulnerable targets...

cs-CR cs-AI

Chain-of-thought obfuscation learned from output supervision can generalise to unseen tasks [TOP LAB](arxiv.org)

2026-02-02|paper|arXiv

Chain-of-thought (CoT) reasoning provides a significant performance uplift to LLMs by enabling planning, exploration, and deliberation of their actions. CoT is also a powerful tool for monitoring the ...

cs-AI

The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity? [TOP LAB](arxiv.org)

2026-02-02|paper|arXiv

As AI becomes more capable, we entrust it with more general and consequential tasks. The risks from failure grow more severe with increasing task scope. It is therefore important to understand how ext...

cs-AI

VideoGPA: Distilling Geometry Priors for 3D-Consistent Video Generation (arxiv.org)

2026-02-02|paper|arXiv

While recent video diffusion models (VDMs) produce visually impressive results, they fundamentally struggle to maintain 3D structural consistency, often resulting in object deformation or spatial drif...

cs-CV cs-AI cs-LG

json_repair (github.com)

2026-02-02|tool|GitHub

A python module to repair invalid JSON from LLMs...

gpt-4 json llm parser

World of Workflows: a Benchmark for Bringing World Models to Enterprise Systems [TOP LAB](arxiv.org)

2026-02-01|paper|arXiv

Frontier large language models (LLMs) excel as autonomous agents in many domains, yet they remain untested in complex enterprise systems where hidden workflows create cascading effects across intercon...

cs-AI cs-SE

Investigating Associational Biases in Inter-Model Communication of Large Generative Models [TOP LAB](arxiv.org)

2026-02-01|paper|arXiv

Social bias in generative AI can manifest not only as performance disparities but also as associational bias, whereby models learn and reproduce stereotypical associations between concepts and demogra...

cs-CY cs-AI

CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty [TOP LAB](arxiv.org)

2026-02-01|paper|arXiv

Existing benchmarks for Large Language Model (LLM) agents focus on task completion under idealistic settings but overlook reliability in real-world, user-facing applications. In domains such as in-ca...

cs-AI

RedSage: A Cybersecurity Generalist LLM (arxiv.org)

2026-02-01|paper|arXiv

Cybersecurity operations demand assistant LLMs that support diverse workflows without exposing sensitive data. Existing solutions either rely on proprietary APIs with privacy risks or on open models l...

cs-CR cs-AI cs-CL

Agent Benchmarks Fail Public Sector Requirements [TOP LAB](arxiv.org)

2026-01-30|paper|arXiv

Deploying Large Language Model-based agents (LLM agents) in the public sector requires assuring that they meet the stringent legal, procedural, and structural requirements of public-sector institution...

cs-CY cs-AI

Evolutionary Strategies lead to Catastrophic Forgetting in LLMs (arxiv.org)

2026-01-30|paper|arXiv

One of the biggest missing capabilities in current AI systems is the ability to learn continuously after deployment. Implementing such continually learning systems has several challenges, one of whic...

cs-LG cs-AI cs-CL

LVLMs and Humans Ground Differently in Referential Communication [TOP LAB](arxiv.org)

2026-01-29|paper|arXiv

For generative AI agents to partner effectively with human users, the ability to accurately predict human intent is critical. But this ability to collaborate remains limited by a critical deficit: an ...

cs-CL cs-AI cs-HC

POPE: Learning to Reason on Hard Problems via Privileged On-Policy Exploration [TOP LAB](arxiv.org)

2026-01-28|paper|arXiv

Reinforcement learning (RL) has improved the reasoning abilities of large language models (LLMs), yet state-of-the-art methods still fail to learn on many training problems. On hard problems, on-polic...

cs-LG cs-AI cs-CL

ctELM: Decoding and Manipulating Embeddings of Clinical Trials with Embedding Language Models (arxiv.org)

2026-01-28|paper|arXiv

Text embeddings have become an essential part of a variety of language applications. However, methods for interpreting, exploring and reversing embedding spaces are limited, reducing transparency and ...

cs-CL cs-AI cs-LG

Reuse your FLOPs: Scaling RL on Hard Problems by Conditioning on Very Off-Policy Prefixes (arxiv.org)

2026-01-28|paper|arXiv

Typical reinforcement learning (RL) methods for LLM reasoning waste compute on hard problems, where correct on-policy traces are rare, policy gradients vanish, and learning stalls. To bootstrap more e...

cs-LG cs-AI cs-CL
