The AI Wire

513 articles tagged "c" — page 6 of 18

EditYourself: Audio-Driven Generation and Manipulation of Talking Head Videos with Diffusion Transformers [TOP LAB](arxiv.org)

2026-01-30|paper|arXiv

Current generative video models excel at producing novel content from text and image prompts, but leave a critical gap in editing existing pre-recorded videos, where minor alterations to the spoken sc...

cs-CV cs-GR cs-LG

Investigating Associational Biases in Inter-Model Communication of Large Generative Models [TOP LAB](arxiv.org)

2026-01-30|paper|arXiv

Social bias in generative AI can manifest not only as performance disparities but also as associational bias, whereby models learn and reproduce stereotypical associations between concepts and demogra...

cs-CY cs-AI

CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty [TOP LAB](arxiv.org)

2026-01-30|paper|arXiv

Existing benchmarks for Large Language Model (LLM) agents focus on task completion under idealistic settings but overlook reliability in real-world, user-facing applications. In domains, such as in-ca...

cs-AI

RedSage: A Cybersecurity Generalist LLM (arxiv.org)

2026-01-30|paper|arXiv

Cybersecurity operations demand assistant LLMs that support diverse workflows without exposing sensitive data. Existing solutions either rely on proprietary APIs with privacy risks or on open models l...

cs-CR cs-AI cs-CL

LVLMs and Humans Ground Differently in Referential Communication [TOP LAB](arxiv.org)

2026-01-29|paper|arXiv

For generative AI agents to partner effectively with human users, the ability to accurately predict human intent is critical. But this ability to collaborate remains limited by a critical deficit: an ...

cs-CL cs-AI cs-HC

Evaluation of Oncotimia: An LLM based system for supporting tumour boards (arxiv.org)

2026-01-29|paper|arXiv

Multidisciplinary tumour boards (MDTBs) play a central role in oncology decision-making but require manual processes and structuring large volumes of heterogeneous clinical information, resulting in a...

cs-CL

DuwatBench: Bridging Language and Visual Heritage through an Arabic Calligraphy Benchmark for Multimodal Understanding (arxiv.org)

2026-01-29|paper|arXiv

Arabic calligraphy represents one of the richest visual traditions of the Arabic language, blending linguistic meaning with artistic form. Although multimodal models have advanced across languages, th...

cs-CV

Self-Distillation Enables Continual Learning (arxiv.org)

2026-01-29|paper|arXiv

Continual learning, enabling models to acquire new skills and knowledge without degrading existing capabilities, remains a fundamental challenge for foundation models. While on-policy reinforcement le...

cs-LG

Post-LayerNorm Is Back: Stable, ExpressivE, and Deep (arxiv.org)

2026-01-29|paper|arXiv

Large language model (LLM) scaling is hitting a wall. Widening models yields diminishing returns, and extending context length does not improve fundamental expressivity. In contrast, depth scaling off...

cs-LG cs-CL

POPE: Learning to Reason on Hard Problems via Privileged On-Policy Exploration [TOP LAB](arxiv.org)

2026-01-28|paper|arXiv

Reinforcement learning (RL) has improved the reasoning abilities of large language models (LLMs), yet state-of-the-art methods still fail to learn on many training problems. On hard problems, on-polic...

cs-LG cs-AI cs-CL

ctELM: Decoding and Manipulating Embeddings of Clinical Trials with Embedding Language Models (arxiv.org)

2026-01-28|paper|arXiv

Text embeddings have become an essential part of a variety of language applications. However, methods for interpreting, exploring and reversing embedding spaces are limited, reducing transparency and ...

cs-CL cs-AI cs-LG

Reuse your FLOPs: Scaling RL on Hard Problems by Conditioning on Very Off-Policy Prefixes (arxiv.org)

2026-01-28|paper|arXiv

Typical reinforcement learning (RL) methods for LLM reasoning waste compute on hard problems, where correct on-policy traces are rare, policy gradients vanish, and learning stalls. To bootstrap more e...

cs-LG cs-AI cs-CL

Subword-Based Comparative Linguistics across 242 Languages Using Wikipedia Glottosets (arxiv.org)

2026-01-28|paper|arXiv

We present a large-scale comparative study of 242 Latin and Cyrillic-script languages using subword-based methodologies. By constructing 'glottosets' from Wikipedia lexicons, we introduce a framework ...

cs-CL cs-AI cs-LG

MortalMATH: Evaluating the Conflict Between Reasoning Objectives and Emergency Contexts (arxiv.org)

2026-01-28|paper|arXiv

Large Language Models are increasingly optimized for deep reasoning, prioritizing the correct execution of complex tasks over general conversation. We investigate whether this focus on calculation cre...

cs-CL

physicsnemo (github.com)

2026-01-28|tool|GitHub

Open-source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods...

deep-learning machine-learning nvidia-gpu physics

Scalable Algorithms for Approximate DNF Model Counting [TOP LAB](arxiv.org)

2026-01-16|paper|arXiv

Model counting of Disjunctive Normal Form (DNF) formulas is a critical problem in applications such as probabilistic inference and network reliability. For example, it is often used for query evaluati...

cs-DS cs-AI

Lunar-G2R: Geometry-to-Reflectance Learning for High-Fidelity Lunar BRDF Estimation [TOP LAB](arxiv.org)

2026-01-16|paper|arXiv

We address the problem of estimating realistic, spatially varying reflectance for complex planetary surfaces such as the lunar regolith, which is critical for high-fidelity rendering and vision-based ...

cs-CV

INDIC DIALECT: A Multi Task Benchmark to Evaluate and Translate in Indian Language Dialects [TOP LAB](arxiv.org)

2026-01-16|paper|arXiv

Recent NLP advances focus primarily on standardized languages, leaving most low-resource dialects under-served especially in Indian scenarios. In India, the issue is particularly important: despite Hi...

cs-CL

WildRayZer: Self-supervised Large View Synthesis in Dynamic Environments (arxiv.org)

2026-01-16|paper|arXiv

We present WildRayZer, a self-supervised framework for novel view synthesis (NVS) in dynamic environments where both the camera and objects move. Dynamic content breaks the multi-view consistency that...

cs-CV

DInf-Grid: A Neural Differential Equation Solver with Differentiable Feature Grids (arxiv.org)

2026-01-16|paper|arXiv

We present a novel differentiable grid-based representation for efficiently solving differential equations (DEs). Widely used architectures for neural solvers, such as sinusoidal neural networks, are ...

cs-LG

STEP3-VL-10B Technical Report [TOP LAB](arxiv.org)

2026-01-15|paper|arXiv

We present STEP3-VL-10B, a lightweight open-source foundation model designed to redefine the trade-off between compact efficiency and frontier-level multimodal intelligence. STEP3-VL-10B is realized t...

cs-CV

Exploring Fine-Tuning for Tabular Foundation Models [TOP LAB](arxiv.org)

2026-01-15|paper|arXiv

Tabular Foundation Models (TFMs) have recently shown strong in-context learning capabilities on structured data, achieving zero-shot performance comparable to traditional machine learning methods. We ...

cs-LG

Information Access of the Oppressed: A Problem-Posing Framework for Envisioning Emancipatory Information Access Platforms [TOP LAB](arxiv.org)

2026-01-15|paper|arXiv

Online information access (IA) platforms are targets of authoritarian capture. These concerns are particularly serious and urgent today in light of the rising levels of democratic erosion worldwide, t...

cs-CY cs-AI cs-HC

Dialogue Telemetry: Turn-Level Instrumentation for Autonomous Information Gathering [TOP LAB](arxiv.org)

2026-01-15|paper|arXiv

Autonomous systems conducting schema-grounded information-gathering dialogues face an instrumentation gap, lacking turn-level observables for monitoring acquisition efficiency and detecting when quest...

cs-CL

Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning (arxiv.org)

2026-01-15|paper|arXiv

Vision-Language-Action (VLA) tasks require reasoning over complex visual scenes and executing adaptive actions in dynamic environments. While recent studies on reasoning VLAs show that explicit chain-...

cs-CV cs-AI cs-LG

Salience-SGG: Enhancing Unbiased Scene Graph Generation with Iterative Salience Estimation [TOP LAB](arxiv.org)

2026-01-14|paper|arXiv

Scene Graph Generation (SGG) suffers from a long-tailed distribution, where a few predicate classes dominate while many others are underrepresented, leading to biased models that underperform on rare ...

cs-CV

Get away with less: Need of source side data curation to build parallel corpus for low resource Machine Translation [TOP LAB](arxiv.org)

2026-01-14|paper|arXiv

Data curation is a critical yet under-researched step in the machine translation training paradigm. To train translation systems, data acquisition relies primarily on human translations and digital pa...

cs-CL

Accelerated Methods with Complexity Separation Under Data Similarity for Federated Learning Problems [TOP LAB](arxiv.org)

2026-01-14|paper|arXiv

Heterogeneity within data distribution poses a challenge in many modern federated learning tasks. We formalize it as an optimization problem involving a computationally heavy composite under data simi...

math-OC cs-LG

RAVEN: Erasing Invisible Watermarks via Novel View Synthesis (arxiv.org)

2026-01-14|paper|arXiv

Invisible watermarking has become a critical mechanism for authenticating AI-generated image content, with major platforms deploying watermarking schemes at scale. However, evaluating the vulnerabilit...

cs-CV

3AM: Segment Anything with Geometric Consistency in Videos (arxiv.org)

2026-01-14|paper|arXiv

Video object segmentation methods like SAM2 achieve strong performance through memory-based architectures but struggle under large viewpoint changes due to reliance on appearance features. Traditional...

cs-CV
