The AI Wire

146 articles tagged "cs-AI" — page 2 of 5

Reward-free Alignment for Conflicting Objectives (arxiv.org)

2026-02-03|paper|arXiv

Direct alignment methods are increasingly used to align large language models (LLMs) with human preferences. However, many real-world alignment problems involve multiple conflicting objectives, where ...

cs-CL cs-AI cs-LG

WiFiPenTester: Advancing Wireless Ethical Hacking with Governed GenAI [TOP LAB](arxiv.org)

2026-02-02|paper|arXiv

Wireless ethical hacking relies heavily on skilled practitioners manually interpreting reconnaissance results and executing complex, time-sensitive sequences of commands to identify vulnerable targets...

cs-CR cs-AI

Chain-of-thought obfuscation learned from output supervision can generalise to unseen tasks [TOP LAB](arxiv.org)

2026-02-02|paper|arXiv

Chain-of-thought (CoT) reasoning provides a significant performance uplift to LLMs by enabling planning, exploration, and deliberation of their actions. CoT is also a powerful tool for monitoring the ...

cs-AI

The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity? [TOP LAB](arxiv.org)

2026-02-02|paper|arXiv

As AI becomes more capable, we entrust it with more general and consequential tasks. The risks from failure grow more severe with increasing task scope. It is therefore important to understand how ext...

cs-AI

VideoGPA: Distilling Geometry Priors for 3D-Consistent Video Generation (arxiv.org)

2026-02-02|paper|arXiv

While recent video diffusion models (VDMs) produce visually impressive results, they fundamentally struggle to maintain 3D structural consistency, often resulting in object deformation or spatial drif...

cs-CV cs-AI cs-LG

World of Workflows: a Benchmark for Bringing World Models to Enterprise Systems [TOP LAB](arxiv.org)

2026-02-01|paper|arXiv

Frontier large language models (LLMs) excel as autonomous agents in many domains, yet they remain untested in complex enterprise systems where hidden workflows create cascading effects across intercon...

cs-AI cs-SE

Investigating Associational Biases in Inter-Model Communication of Large Generative Models [TOP LAB](arxiv.org)

2026-02-01|paper|arXiv

Social bias in generative AI can manifest not only as performance disparities but also as associational bias, whereby models learn and reproduce stereotypical associations between concepts and demogra...

cs-CY cs-AI

CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty [TOP LAB](arxiv.org)

2026-02-01|paper|arXiv

Existing benchmarks for Large Language Model (LLM) agents focus on task completion under idealistic settings but overlook reliability in real-world, user-facing applications. In domains such as in-ca...

cs-AI

RedSage: A Cybersecurity Generalist LLM (arxiv.org)

2026-02-01|paper|arXiv

Cybersecurity operations demand assistant LLMs that support diverse workflows without exposing sensitive data. Existing solutions either rely on proprietary APIs with privacy risks or on open models l...

cs-CR cs-AI cs-CL

Agent Benchmarks Fail Public Sector Requirements [TOP LAB](arxiv.org)

2026-01-30|paper|arXiv

Deploying Large Language Model-based agents (LLM agents) in the public sector requires assuring that they meet the stringent legal, procedural, and structural requirements of public-sector institution...

cs-CY cs-AI

Evolutionary Strategies lead to Catastrophic Forgetting in LLMs (arxiv.org)

2026-01-30|paper|arXiv

One of the biggest missing capabilities in current AI systems is the ability to learn continuously after deployment. Implementing such continually learning systems has several challenges, one of whic...

cs-LG cs-AI cs-CL

LVLMs and Humans Ground Differently in Referential Communication [TOP LAB](arxiv.org)

2026-01-29|paper|arXiv

For generative AI agents to partner effectively with human users, the ability to accurately predict human intent is critical. But this ability to collaborate remains limited by a critical deficit: an ...

cs-CL cs-AI cs-HC

POPE: Learning to Reason on Hard Problems via Privileged On-Policy Exploration [TOP LAB](arxiv.org)

2026-01-28|paper|arXiv

Reinforcement learning (RL) has improved the reasoning abilities of large language models (LLMs), yet state-of-the-art methods still fail to learn on many training problems. On hard problems, on-polic...

cs-LG cs-AI cs-CL

ctELM: Decoding and Manipulating Embeddings of Clinical Trials with Embedding Language Models (arxiv.org)

2026-01-28|paper|arXiv

Text embeddings have become an essential part of a variety of language applications. However, methods for interpreting, exploring and reversing embedding spaces are limited, reducing transparency and ...

cs-CL cs-AI cs-LG

Reuse your FLOPs: Scaling RL on Hard Problems by Conditioning on Very Off-Policy Prefixes (arxiv.org)

2026-01-28|paper|arXiv

Typical reinforcement learning (RL) methods for LLM reasoning waste compute on hard problems, where correct on-policy traces are rare, policy gradients vanish, and learning stalls. To bootstrap more e...

cs-LG cs-AI cs-CL

Subword-Based Comparative Linguistics across 242 Languages Using Wikipedia Glottosets (arxiv.org)

2026-01-28|paper|arXiv

We present a large-scale comparative study of 242 Latin and Cyrillic-script languages using subword-based methodologies. By constructing 'glottosets' from Wikipedia lexicons, we introduce a framework ...

cs-CL cs-AI cs-LG

Scalable Algorithms for Approximate DNF Model Counting [TOP LAB](arxiv.org)

2026-01-16|paper|arXiv

Model counting of Disjunctive Normal Form (DNF) formulas is a critical problem in applications such as probabilistic inference and network reliability. For example, it is often used for query evaluati...

cs-DS cs-AI

Information Access of the Oppressed: A Problem-Posing Framework for Envisioning Emancipatory Information Access Platforms [TOP LAB](arxiv.org)

2026-01-15|paper|arXiv

Online information access (IA) platforms are targets of authoritarian capture. These concerns are particularly serious and urgent today in light of the rising levels of democratic erosion worldwide, t...

cs-CY cs-AI cs-HC

Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning (arxiv.org)

2026-01-15|paper|arXiv

Vision-Language-Action (VLA) tasks require reasoning over complex visual scenes and executing adaptive actions in dynamic environments. While recent studies on reasoning VLAs show that explicit chain-...

cs-CV cs-AI cs-LG

Kinship Data Benchmark for Multi-hop Reasoning [TOP LAB](arxiv.org)

2026-01-13|paper|arXiv

Large language models (LLMs) are increasingly evaluated on their ability to perform multi-hop reasoning, i.e., to combine multiple pieces of information into a coherent inference. We introduce Kinship...

cs-CL cs-AI

AdaFuse: Adaptive Ensemble Decoding with Test-Time Scaling for LLMs (arxiv.org)

2026-01-12|paper|arXiv

Large language models (LLMs) exhibit complementary strengths arising from differences in pretraining data, model architectures, and decoding behaviors. Inference-time ensembling provides a practical w...

cs-CL cs-AI

An Empirical Investigation of Robustness in Large Language Models under Tabular Distortions [TOP LAB](arxiv.org)

2026-01-11|paper|arXiv

We investigate how large language models (LLMs) fail when tabular data in an otherwise canonical representation is subjected to semantic and structural distortions. Our findings reveal that LLMs lack ...

cs-AI

On the Definition and Detection of Cherry-Picking in Counterfactual Explanations [TOP LAB](arxiv.org)

2026-01-11|paper|arXiv

Counterfactual explanations are widely used to communicate how inputs must change for a model to alter its prediction. For a single instance, many valid counterfactuals can exist, which leaves open th...

cs-LG cs-AI

DatBench: Discriminative, Faithful, and Efficient VLM Evaluations [TOP LAB](arxiv.org)

2026-01-06|paper|arXiv

Empirical evaluation serves as the primary compass guiding research progress in foundation models. Despite a large body of work focused on training frontier vision-language models (VLMs), approaches t...

cs-LG cs-AI

BiPrompt: Bilateral Prompt Optimization for Visual and Textual Debiasing in Vision-Language Models [TOP LAB](arxiv.org)

2026-01-06|paper|arXiv

Vision language foundation models such as CLIP exhibit impressive zero-shot generalization yet remain vulnerable to spurious correlations across visual and textual modalities. Existing debiasing appro...

cs-CV cs-AI cs-LG

FedHypeVAE: Federated Learning with Hypernetwork Generated Conditional VAEs for Differentially Private Embedding Sharing [TOP LAB](arxiv.org)

2026-01-05|paper|arXiv

Federated data sharing promises utility without centralizing raw data, yet existing embedding-level generators struggle under non-IID client heterogeneity and provide limited formal protection against...

cs-LG cs-AI cs-CV
