The AI Wire

CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning (arxiv.org)

2026-05-28|paper|arxiv

A training method uses contrastive comparison between correct and incorrect reasoning traces to rapidly steer models toward better multi-step reasoning without extensive data.

Bias Leaves a Gradient Trail: Label-Free Bias Identification via Gradient Probes on Concept Decompositions (arxiv.org)

2026-05-28|paper|arxiv

Gradient signals from concept-decomposed representations are used as probes to identify dataset biases without requiring any bias labels.

browser-use/browser-use (95923 stars): 🌐 Make websites accessible for AI agents. Automate tasks online with ease. (github.com)

2026-05-28|tool|github

browser-use provides a library enabling AI agents to control and automate real browser interactions with websites.

firecrawl/firecrawl (125389 stars): 🔥 Search, scrape, and clean the web for AI agents. (github.com)

2026-05-28|tool|github

Firecrawl provides an API and toolset to search, scrape, and structure web content into clean data for AI agents.

langgenius/dify (142941 stars): Production-ready platform for agentic workflow development. (github.com)

2026-05-28|tool|github

Dify enables developers to build, deploy, and monitor LLM-powered agentic workflows in production environments with a visual development platform.

Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving (huggingface.co)

2026-05-28|model|huggingface

Applies block-level diffusion within a vision-language model for autonomous driving to achieve faster inference while maintaining high-quality scene understanding and planning.

Lost in Sampling: Assessing Lexical Reachability in LLMs via the Word Coverage Score (WCS) (huggingface.co)

2026-05-28|model|huggingface

Introduces Word Coverage Score (WCS) to measure how many lexical tokens an LLM can actually generate under sampling, revealing vocabulary blind spots.

Extrapolative Weight Averaging Reveals Correctness-Efficiency Frontiers in Code RL (arxiv.org)

2026-05-28|paper|arxiv

Extrapolating beyond averaged weight checkpoints in code-generation RL training exposes Pareto frontiers trading off solution correctness against computational efficiency.

CubePart: An Open-Vocabulary Part-Controllable 3D Generator (arxiv.org)

2026-05-28|paper|arxiv

An open-vocabulary 3D generative model lets users specify arbitrary part-level semantic controls to synthesize and manipulate distinct components of 3D objects.

Agent Explorative Policy Optimization for Multimodal Agentic Reasoning (arxiv.org)

2026-05-28|paper|arxiv

A policy optimization method trains multimodal agents through exploratory interaction, improving agentic reasoning across visual and textual decision-making tasks.

Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents (arxiv.org)

2026-05-28|paper|arxiv

An automated pipeline identifies weaknesses in small computer-use agents and generates domain-specific training data to improve performance in those failure areas.

Skill-Conditioned Gated Self-Distillation for LLM Reasoning (arxiv.org)

2026-05-28|paper|arxiv

A self-distillation method conditions LLM training on identified skill categories, selectively reinforcing reasoning capabilities where the model shows specific weaknesses.

CaMBRAIN: Real-time, Continuous EEG Inference with Causal State Space Models (arxiv.org)

2026-05-28|paper|arxiv

Applies causal state space models to perform low-latency, continuous EEG signal decoding directly from streaming brain activity without future context.

Human Label Variation as Stable Signal: Learning Annotator-Specific Explanation Behavior via Cross-Annotator Preference Optimization (arxiv.org)

2026-05-28|paper|arxiv

Trains separate explanation models per annotator by treating disagreements among human labelers as meaningful signal via cross-annotator preference optimization.

Calibrating Conservatism for Scalable Oversight (arxiv.org)

2026-05-28|paper|arxiv

Proposes a method to tune how conservatively an AI system behaves under uncertainty, enabling safer scalable oversight without sacrificing performance.

ollama/ollama (172474 stars): Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemm (github.com)

2026-05-28|tool|github

Ollama enables local download, quantization management, and inference serving of large language models including Qwen, DeepSeek, and Gemma via a CLI and API.

GEM: Generative Supervision Helps Embodied Intelligence (huggingface.co)

2026-05-28|model|huggingface

Incorporates generative model supervision signals to improve embodied agent learning, enabling better scene understanding and action planning in physical environments.

Advancing Creative Physical Intelligence in Large Multimodal Models (huggingface.co)

2026-05-28|model|huggingface

Extends large multimodal models with creative physical reasoning capabilities, enabling generation and understanding of physically plausible, imaginative real-world scenarios.

Preference-Shaped Expected Hypervolume and R2 Improvement: Exact Computation and Monotonicity (arxiv.org)

2026-05-28|paper|arxiv

Exact closed-form formulas are derived for preference-weighted expected hypervolume improvement and R2 improvement scalarizations, and their monotonicity properties in multi-objective Bayesian optimization are proven.

SwarmHarness: Skill-Based Task Routing via Decentralized Incentive-Aligned AI Agent Networks (arxiv.org)

2026-05-28|paper|arxiv

A framework routes tasks to specialized AI agents in a decentralized network using incentive-aligned mechanisms to match skills to appropriate subtasks.

Rethinking Memory as Continuously Evolving Connectivity (arxiv.org)

2026-05-28|paper|arxiv

Memory in neural systems is reframed as dynamically evolving connectivity patterns rather than static storage, enabling continuous adaptation of stored associations.

The Abstraction Gap in Vision-Language Causal Reasoning (arxiv.org)

2026-05-28|paper|arxiv

A study identifies and analyzes the performance gap between vision-language models and humans on causal reasoning tasks requiring abstract rather than perceptual understanding.

AREA: Attribute Extraction and Aggregation for CLIP-Based Class-Incremental Learning (arxiv.org)

2026-05-28|paper|arxiv

Extracts and aggregates fine-grained visual attributes from CLIP features to prevent catastrophic forgetting when learning new classes incrementally.

VLMs May Not Globally Enhance Human Alignment over LLMs During Natural Reading (arxiv.org)

2026-05-28|paper|arxiv

Demonstrates that adding a visual modality to LLMs does not consistently improve alignment with human reading behavior measured during natural text comprehension.

langchain-ai/langchain (137830 stars): The agent engineering platform. (github.com)

2026-05-28|tool|github

LangChain provides a framework for composing LLMs, tools, and memory into chains and agents for building AI-powered applications.

open-webui/open-webui (138948 stars): User-friendly AI Interface (Supports Ollama, OpenAI API, ...) (github.com)

2026-05-28|tool|github

Open WebUI delivers a self-hosted browser interface for interacting with local and remote LLMs including Ollama and OpenAI-compatible APIs.

Significant-Gravitas/AutoGPT (184594 stars): AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our (github.com)

2026-05-28|tool|github

AutoGPT provides an open-source autonomous agent platform that chains GPT model calls with tool use to complete long-horizon tasks with minimal human input.

AgensFlow: A Coordination-Policy Substrate for Multi-Agent Systems (huggingface.co)

2026-05-28|model|huggingface

Provides a coordination and policy substrate that manages communication, task allocation, and decision-making protocols across multiple collaborating AI agents.

ESC-Skills: Discovering and Self-Evolving Skills for Emotional Support Conversations (huggingface.co)

2026-05-28|model|huggingface

Presents a system that automatically discovers and iteratively refines reusable conversational skills to improve emotional support dialogue agents.

Building self-improving tax agents with Codex (openai.com)

2026-05-28|news|blog/OpenAI Blog

Codex-powered tax agents iteratively improve their own code and reasoning to handle increasingly complex tax filing and computation tasks autonomously.
