The AI Wire

72 articles tagged "cs-CL" — page 2 of 3

Post-LayerNorm Is Back: Stable, ExpressivE, and Deep (arxiv.org)

2026-01-29|paper|arXiv

Large language model (LLM) scaling is hitting a wall. Widening models yields diminishing returns, and extending context length does not improve fundamental expressivity. In contrast, depth scaling off...

cs-LG cs-CL

POPE: Learning to Reason on Hard Problems via Privileged On-Policy Exploration [TOP LAB](arxiv.org)

2026-01-28|paper|arXiv

Reinforcement learning (RL) has improved the reasoning abilities of large language models (LLMs), yet state-of-the-art methods still fail to learn on many training problems. On hard problems, on-polic...

cs-LG cs-AI cs-CL

ctELM: Decoding and Manipulating Embeddings of Clinical Trials with Embedding Language Models (arxiv.org)

2026-01-28|paper|arXiv

Text embeddings have become an essential part of a variety of language applications. However, methods for interpreting, exploring and reversing embedding spaces are limited, reducing transparency and ...

cs-CL cs-AI cs-LG

Reuse your FLOPs: Scaling RL on Hard Problems by Conditioning on Very Off-Policy Prefixes (arxiv.org)

2026-01-28|paper|arXiv

Typical reinforcement learning (RL) methods for LLM reasoning waste compute on hard problems, where correct on-policy traces are rare, policy gradients vanish, and learning stalls. To bootstrap more e...

cs-LG cs-AI cs-CL

Subword-Based Comparative Linguistics across 242 Languages Using Wikipedia Glottosets (arxiv.org)

2026-01-28|paper|arXiv

We present a large-scale comparative study of 242 Latin and Cyrillic-script languages using subword-based methodologies. By constructing 'glottosets' from Wikipedia lexicons, we introduce a framework ...

cs-CL cs-AI cs-LG

MortalMATH: Evaluating the Conflict Between Reasoning Objectives and Emergency Contexts (arxiv.org)

2026-01-28|paper|arXiv

Large Language Models are increasingly optimized for deep reasoning, prioritizing the correct execution of complex tasks over general conversation. We investigate whether this focus on calculation cre...

cs-CL

INDIC DIALECT: A Multi Task Benchmark to Evaluate and Translate in Indian Language Dialects [TOP LAB](arxiv.org)

2026-01-16|paper|arXiv

Recent NLP advances focus primarily on standardized languages, leaving most low-resource dialects under-served especially in Indian scenarios. In India, the issue is particularly important: despite Hi...

cs-CL

Dialogue Telemetry: Turn-Level Instrumentation for Autonomous Information Gathering [TOP LAB](arxiv.org)

2026-01-15|paper|arXiv

Autonomous systems conducting schema-grounded information-gathering dialogues face an instrumentation gap, lacking turn-level observables for monitoring acquisition efficiency and detecting when quest...

cs-CL

Get away with less: Need of source side data curation to build parallel corpus for low resource Machine Translation [TOP LAB](arxiv.org)

2026-01-14|paper|arXiv

Data curation is a critical yet under-researched step in the machine translation training paradigm. To train translation systems, data acquisition relies primarily on human translations and digital pa...

cs-CL

Kinship Data Benchmark for Multi-hop Reasoning [TOP LAB](arxiv.org)

2026-01-13|paper|arXiv

Large language models (LLMs) are increasingly evaluated on their ability to perform multi-hop reasoning, i.e., to combine multiple pieces of information into a coherent inference. We introduce Kinship...

cs-CL cs-AI

Order in the Evaluation Court: A Critical Analysis of NLG Evaluation Trends [TOP LAB](arxiv.org)

2026-01-13|paper|arXiv

Despite advances in Natural Language Generation (NLG), evaluation remains challenging. Although various new metrics and LLM-as-a-judge (LaaJ) methods are proposed, human judgment persists as the gold ...

cs-CL

AdaFuse: Adaptive Ensemble Decoding with Test-Time Scaling for LLMs (arxiv.org)

2026-01-12|paper|arXiv

Large language models (LLMs) exhibit complementary strengths arising from differences in pretraining data, model architectures, and decoding behaviors. Inference-time ensembling provides a practical w...

cs-CL cs-AI

Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards (arxiv.org)

2026-01-12|paper|arXiv

Reinforcement learning (RL) has emerged as a critical technique for enhancing LLM-based deep search agents. However, existing approaches primarily rely on binary outcome rewards, which fail to capture...

cs-CL

Memory Bank Compression for Continual Adaptation of Large Language Models [TOP LAB](arxiv.org)

2026-01-05|paper|arXiv

Large Language Models (LLMs) have become a mainstay for many everyday applications. However, as data evolve their knowledge quickly becomes outdated. Continual learning aims to update LLMs with new in...

cs-LG cs-CL

Large language models and the entropy of English [TOP LAB](arxiv.org)

2026-01-04|paper|arXiv

We use large language models (LLMs) to uncover long-ranged structure in English texts from a variety of sources. The conditional entropy or code length in many cases continues to decrease with context...

cond-mat-stat-mech cs-CL physics-bio-ph

Grammar-Forced Translation of Natural Language to Temporal Logic using LLMs [TOP LAB](arxiv.org)

2025-12-21|paper|arXiv

Translating natural language (NL) into a formal language such as temporal logic (TL) is integral for human communication with robots and autonomous systems. State-of-the-art approaches decompose the t...

cs-CL cs-AI

Beyond surface form: A pipeline for semantic analysis in Alzheimer's Disease detection from spontaneous speech [TOP LAB](arxiv.org)

2025-12-16|paper|arXiv

cs-CL