The AI Wire

5101 articles — page 3 of 171

SePO: Self-Evolving Prompt Agent for System Prompt Optimization (huggingface.co)

2026-06-05|model|huggingface

SePO is an agent that iteratively refines LLM system prompts using self-generated feedback, without human intervention.

Combinatorial Synthesis: Scaling Code RLVR via Atomic Decomposition and Recombination (huggingface.co)

2026-06-05|model|huggingface

A method scales reinforcement learning from verifiable rewards for code by decomposing complex programming problems into atomic sub-tasks and recombining them to synthesize new training examples.

Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents (huggingface.co)

2026-06-05|model|huggingface

A policy optimization method equips LLM agents with a meta-cognitive memory mechanism that tracks and leverages past reasoning experiences to improve performance on long-horizon tasks.

Is This Edit Correct? A Multi-Dimensional Benchmark for Reasoning-Aware Image Editing (huggingface.co)

2026-06-05|model|huggingface

A benchmark evaluates image editing models along multiple dimensions, testing whether edits are both visually correct and consistent with chain-of-thought reasoning requirements.

EvoDS: Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management (huggingface.co)

2026-06-05|model|huggingface

EvoDS is an autonomous data science agent that improves over time by accumulating reusable skills and managing context efficiently to handle complex analytical workflows.

Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation (huggingface.co)

2026-06-05|model|huggingface

Reinforcement learning on multilingual tasks causes LLMs to generalize translation ability to previously unseen language pairs by learning to exploit in-context cues.

ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time? (huggingface.co)

2026-06-05|model|huggingface

ArcANE investigates when and how well role-playing LLM agents maintain their assigned character identity versus appropriately breaking character in contextually sensitive situations.

Quality-Guided Semi-Supervised Learning for Medical Image Segmentation (huggingface.co)

2026-06-05|model|huggingface

A semi-supervised segmentation framework uses predicted quality scores to weight unlabeled medical images, giving higher training influence to more reliably pseudo-labeled samples.

RobotValues: Evaluating Household Robots When Human Values Conflict (huggingface.co)

2026-06-05|model|huggingface

A benchmark evaluates household robot decision-making in scenarios where competing human values create ethical conflicts with no single correct action.

Complexity-Balanced Diffusion Splitting (huggingface.co)

2026-06-05|model|huggingface

A diffusion-based generation method splits the denoising process into complexity-balanced segments to improve computational efficiency and output quality.

Towards Truly Multilingual ASR: Generalizing Code-Switching ASR to Unseen Language Pairs (huggingface.co)

2026-06-05|model|huggingface

A code-switching ASR approach generalizes to language pairs unseen during training by learning language-agnostic switching representations transferable across multilingual combinations.

AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints (huggingface.co)

2026-06-05|model|huggingface

AdaPlanBench measures LLM agents' ability to revise plans dynamically when world-state changes or user constraints shift mid-task.

TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration (huggingface.co)

2026-06-05|model|huggingface

TIDE discovers multiple latent problems in text or systems by iteratively applying reusable problem templates to surface issues beyond those initially targeted.

AdaCodec: A Predictive Visual Code for Video MLLMs (huggingface.co)

2026-06-05|model|huggingface

AdaCodec generates adaptive visual token codes that predict future-frame relevance, enabling video multimodal LLMs to allocate representation capacity more efficiently.

Multimodal Music Recommendation System using LLMs (huggingface.co)

2026-06-05|model|huggingface

A music recommendation system fuses audio features, lyrics, and user context through LLMs to produce semantically grounded, multimodal song suggestions.

Quoting Emanuel Maiberg, 404 Media (simonwillison.net)

2026-06-05|news|blog/Simon Willison

Simon Willison highlights a quote from Emanuel Maiberg of 404 Media.

AI enthusiasts are in a race against time, AI skeptics are in a race against entropy (simonwillison.net)

2026-06-05|news|blog/Simon Willison

An opinion piece contrasts AI enthusiasts racing to deploy capabilities before safety catches up with skeptics watching the whole effort degrade under its own contradictions.

Designing the hf CLI as an agent-optimized way to work with the Hub (huggingface.co)

2026-06-05|news|blog/Hugging Face Blog

Hugging Face redesigned its CLI so that autonomous agents can programmatically discover, upload, and manage Hub resources through structured, predictable commands.

EVA-Bench Data 2.0: 3 Domains, 121 Tools, 213 Scenarios (huggingface.co)

2026-06-05|news|blog/Hugging Face Blog

EVA-Bench Data 2.0 expands an evaluation suite to 3 domains, 121 tools, and 213 scenarios for testing AI agents on diverse real-world tasks.

Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI (huggingface.co)

2026-06-05|news|blog/Hugging Face Blog

Nemotron 3.5 Content Safety delivers a customizable multimodal safety system designed for enterprise AI deployments across diverse global regulatory and cultural contexts.

Biodefense in the Intelligence Age (openai.com)

2026-06-05|news|blog/OpenAI Blog

OpenAI outlines how AI-driven intelligence analysis techniques can strengthen biological threat detection, attribution, and defensive response capabilities at national security scale.

Dreaming: Better memory for a more helpful ChatGPT (openai.com)

2026-06-05|news|blog/OpenAI Blog

ChatGPT gains a persistent memory architecture that retains user context across conversations, making responses more personalized and contextually relevant over time.

How Endava is redesigning software delivery around AI agents (openai.com)

2026-06-05|news|blog/OpenAI Blog

Endava restructures its software development lifecycle by delegating discrete engineering tasks to autonomous AI agents, reducing human bottlenecks in delivery pipelines.

Castor: CERN Advanced STORage Manager (castor.web.cern.ch)

2026-06-05|news|hackernews

CERN's Castor system provides a large-scale hierarchical storage management solution for archiving and retrieving the massive data volumes produced by particle physics experiments.

The Pentagon is running an AI propaganda mill targeting Latin America (theintercept.com)

2026-06-05|news|hackernews

The Pentagon operates AI-generated influence operations producing Spanish-language propaganda targeting Latin American populations to shape geopolitical narratives.

Latent Agents: A Post-Training Procedure for Internalized Multi-Agent Debate (arxiv.org)

2026-06-05|news|hackernews

A post-training method internalizes multi-agent debate into a single model's latent space, enabling self-refinement without requiring multiple separate model instances at inference.

Go Experiments Explained (alexedwards.net)

2026-06-05|news|hackernews

An explainer on experiments in the Go programming language: opt-in toolchain features, enabled through the GOEXPERIMENT setting, that are tried out before becoming defaults.

Magenta RealTime 2: Open and Local Live Music Models (magenta.withgoogle.com)

2026-06-05|news|hackernews

Magenta RealTime 2 releases open, locally runnable generative music models capable of producing and responding to live musical input in real time.

Fine-tuning an LLM to write docs like it's 1995 (passo.uno)

2026-06-05|news|hackernews

The author fine-tunes a large language model on vintage technical writing corpora to reproduce the terse, structured documentation style characteristic of 1990s software manuals.

Gemma 4 12B: A unified, encoder-free multimodal model (blog.google)

2026-06-04|news|hackernews

Gemma 4 12B processes both text and images within a single encoder-free architecture, unifying multimodal understanding without separate vision encoders.
