The AI Wire

72 articles tagged "cs-CL" — page 1 of 3

Scale Can't Overcome Pragmatics: The Impact of Reporting Bias on Vision-Language Reasoning [TOP LAB](arxiv.org)

2026-02-27|paper|arXiv

The lack of reasoning capabilities in Vision-Language Models (VLMs) has remained at the forefront of research discourse. We posit that this behavior stems from a reporting bias in their training data....

cs-CL cs-CV

LLM Novice Uplift on Dual-Use, In Silico Biology Tasks [TOP LAB](arxiv.org)

2026-02-27|paper|arXiv

Large language models (LLMs) perform increasingly well on biology benchmarks, but it remains unclear whether they uplift novice users -- i.e., enable humans to perform better than with internet-only r...

cs-AI cs-CL cs-CR

LiCQA: A Lightweight Complex Question Answering System [TOP LAB](arxiv.org)

2026-02-26|paper|arXiv

Over the last twenty years, significant progress has been made in designing and implementing Question Answering (QA) systems. However, addressing complex questions, the answers to which are spread acr...

cs-CL cs-IR

Dynamic Personality Adaptation in Large Language Models via State Machines [TOP LAB](arxiv.org)

2026-02-26|paper|arXiv

The inability of Large Language Models (LLMs) to modulate their personality expression in response to evolving dialogue dynamics hinders their performance in complex, interactive contexts. We propose ...

cs-CL cs-HC cs-LG

BabyLM Turns 4: Call for Papers for the 2026 BabyLM Workshop [TOP LAB](arxiv.org)

2026-02-24|paper|arXiv

BabyLM aims to dissolve the boundaries between cognitive modeling and language modeling. We call for both workshop papers and for researchers to join the 4th BabyLM competition. As in previous years, ...

cs-CL

VIRAASAT: Traversing Novel Paths for Indian Cultural Reasoning [TOP LAB](arxiv.org)

2026-02-23|paper|arXiv

Large Language Models (LLMs) have made significant progress in reasoning tasks across various domains such as mathematics and coding. However, their performance deteriorates in tasks requiring rich so...

cs-CL cs-IR

Vichara: Appellate Judgment Prediction and Explanation for the Indian Judicial System [TOP LAB](arxiv.org)

2026-02-23|paper|arXiv

In jurisdictions like India, where courts face an extensive backlog of cases, artificial intelligence offers transformative potential for legal judgment prediction. A critical subset of this backlog c...

cs-CL cs-AI

Evaluating Chain-of-Thought Reasoning through Reusability and Verifiability [TOP LAB](arxiv.org)

2026-02-20|paper|arXiv

In multi-agent IR pipelines for tasks such as search and ranking, LLM-based agents exchange intermediate reasoning in terms of Chain-of-Thought (CoT) with each other. Current CoT evaluation narrowly f...

cs-AI cs-CL cs-IR

What Do LLMs Associate with Your Name? A Human-Centered Black-Box Audit of Personal Data [TOP LAB](arxiv.org)

2026-02-20|paper|arXiv

Large language models (LLMs), and conversational agents based on them, are exposed to personal data (PD) during pre-training and during user interactions. Prior work shows that PD can resurface, yet u...

cs-HC cs-AI cs-CL

T3D: Few-Step Diffusion Language Models via Trajectory Self-Distillation with Direct Discriminative Optimization [TOP LAB](arxiv.org)

2026-02-13|paper|arXiv

Diffusion large language models (DLLMs) have the potential to enable fast text generation by decoding multiple tokens in parallel. However, in practice, their inference efficiency is constrained by th...

cs-CL cs-LG

GameDevBench: Evaluating Agentic Capabilities Through Game Development [TOP LAB](arxiv.org)

2026-02-12|paper|arXiv

Despite rapid progress on coding agents, progress on their multimodal counterparts has lagged behind. A key challenge is the scarcity of evaluation testbeds that combine the complexity of software dev...

cs-AI cs-CL cs-SE

ROCKET: Rapid Optimization via Calibration-guided Knapsack Enhanced Truncation for Efficient Model Compression [TOP LAB](arxiv.org)

2026-02-12|paper|arXiv

We present ROCKET, a training-free model compression method that achieves state-of-the-art performance in comparison with factorization, structured-sparsification and dynamic compression baselines. Op...

cs-LG cs-AI cs-CL

A Unified Assessment of the Poverty of the Stimulus Argument for Neural Language Models [TOP LAB](arxiv.org)

2026-02-11|paper|arXiv

How can children acquire native-level syntax from limited input? According to the Poverty of the Stimulus Hypothesis (PoSH), the linguistic input children receive is insufficient to explain certain ge...

cs-CL cs-AI

CoRefine: Confidence-Guided Self-Refinement for Adaptive Test-Time Compute [TOP LAB](arxiv.org)

2026-02-10|paper|arXiv

Large Language Models (LLMs) often rely on test-time scaling via parallel decoding (for example, 512 samples) to boost reasoning accuracy, but this incurs substantial compute. We introduce CoRefine, a...

cs-AI cs-CL

Visual Word Sense Disambiguation with CLIP through Dual-Channel Text Prompting and Image Augmentations [TOP LAB](arxiv.org)

2026-02-09|paper|arXiv

Ambiguity poses persistent challenges in natural language understanding for large language models (LLMs). To better understand how lexical ambiguity can be resolved through the visual domain, we devel...

cs-CL

R-Align: Enhancing Generative Reward Models through Rationale-Centric Meta-Judging [TOP LAB](arxiv.org)

2026-02-09|paper|arXiv

Reinforcement Learning from Human Feedback (RLHF) remains indispensable for aligning large language models (LLMs) in subjective domains. To enhance robustness, recent work shifts toward Generative Rew...

cs-CL

A Systematic Evaluation of Large Language Models for PTSD Severity Estimation: The Role of Contextual Knowledge and Modeling Strategies [TOP LAB](arxiv.org)

2026-02-06|paper|arXiv

Large language models (LLMs) are increasingly being used in a zero-shot fashion to assess mental health conditions, yet we have limited knowledge on what factors affect their accuracy. In this study, ...

cs-CL

Reinforced Attention Learning (arxiv.org)

2026-02-05|paper|arXiv

Post-training with Reinforcement Learning (RL) has substantially improved reasoning in Large Language Models (LLMs) via test-time scaling. However, extending this paradigm to Multimodal LLMs (MLLMs) t...

cs-CL cs-CV cs-LG

Accelerating Scientific Research with Gemini: Case Studies and Common Techniques [TOP LAB](arxiv.org)

2026-02-04|paper|arXiv

Recent advances in large language models (LLMs) have opened new avenues for accelerating scientific research. While models are increasingly capable of assisting with routine tasks, their ability to co...

cs-CL cs-AI

Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing (arxiv.org)

2026-02-04|paper|arXiv

Parallel thinking has emerged as a promising paradigm for reasoning, yet it imposes significant computational burdens. Existing efficiency methods primarily rely on local, per-trajectory signals and l...

cs-CL

Misconception Diagnosis From Student-Tutor Dialogue: Generate, Retrieve, Rerank [TOP LAB](arxiv.org)

2026-02-03|paper|arXiv

Timely and accurate identification of student misconceptions is key to improving learning outcomes and pre-empting the compounding of student errors. However, this task is highly dependent on the effo...

cs-CL cs-LG

Reward-free Alignment for Conflicting Objectives (arxiv.org)

2026-02-03|paper|arXiv

Direct alignment methods are increasingly used to align large language models (LLMs) with human preferences. However, many real-world alignment problems involve multiple conflicting objectives, where ...

cs-CL cs-AI cs-LG

Are you going to finish that? A Practical Study of the Tokenization Boundary Problem [TOP LAB](arxiv.org)

2026-02-02|paper|arXiv

Language models (LMs) are trained over sequences of tokens, whereas users interact with LMs via text. This mismatch gives rise to the partial token problem, which occurs when a user ends their prompt ...

cs-CL

RedSage: A Cybersecurity Generalist LLM (arxiv.org)

2026-02-01|paper|arXiv

Cybersecurity operations demand assistant LLMs that support diverse workflows without exposing sensitive data. Existing solutions either rely on proprietary APIs with privacy risks or on open models l...

cs-CR cs-AI cs-CL

SERA: Soft-Verified Efficient Repository Agents [TOP LAB](arxiv.org)

2026-01-30|paper|arXiv

Open-weight coding agents should hold a fundamental advantage over closed-source systems: they can be specialized to private codebases, encoding repository-specific information directly in their weigh...

cs-CL cs-LG cs-SE

Persona Prompting as a Lens on LLM Social Reasoning [TOP LAB](arxiv.org)

2026-01-30|paper|arXiv

For socially sensitive tasks like hate speech detection, the quality of explanations from Large Language Models (LLMs) is crucial for factors like user trust and model alignment. While Persona prompti...

cs-CL

Evolutionary Strategies lead to Catastrophic Forgetting in LLMs (arxiv.org)

2026-01-30|paper|arXiv

One of the biggest missing capabilities in current AI systems is the ability to learn continuously after deployment. Implementing such continually learning systems has several challenges, one of whic...

cs-LG cs-AI cs-CL

LVLMs and Humans Ground Differently in Referential Communication [TOP LAB](arxiv.org)

2026-01-29|paper|arXiv

For generative AI agents to partner effectively with human users, the ability to accurately predict human intent is critical. But this ability to collaborate remains limited by a critical deficit: an ...

cs-CL cs-AI cs-HC

Evaluation of Oncotimia: An LLM based system for supporting tumour boards (arxiv.org)

2026-01-29|paper|arXiv

Multidisciplinary tumour boards (MDTBs) play a central role in oncology decision-making but require manual processes and structuring large volumes of heterogeneous clinical information, resulting in a...

cs-CL
