The AI Wire

Joint Agent Memory and Exploration Learning via Novelty Signals (huggingface.co)

2026-06-02|model|huggingface

Proposes a method that jointly trains agent memory and exploration behavior using novelty-based signals to improve navigation and discovery in unknown environments.

Off-the-Shelf LLMs as Process Scorers: Training-Free Alternative to PRMs for Mathematical Reasoning (huggingface.co)

2026-06-02|model|huggingface

Uses unmodified LLMs to score intermediate reasoning steps in math problems at inference time, replacing trained process reward models without any additional training.

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents (huggingface.co)

2026-06-02|model|huggingface

Releases an open framework for training visual web agents with online multi-turn RL, clarifying implementation details that enable agents to learn from live browser interactions.

MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation (huggingface.co)

2026-06-02|model|huggingface

Benchmarks LLM agents on personal productivity tasks by simulating realistic personal data environments, testing performance on real-world applications like calendars and email.

Hackers Simply Asked Meta AI to Give Them Access to High-Profile Instagram Accounts. It Worked (simonwillison.net)

2026-06-02|news|blog/Simon Willison

Reports that attackers used social engineering prompts to manipulate Meta AI into granting unauthorized access to high-profile Instagram accounts.

Pasted File Editor (simonwillison.net)

2026-06-02|news|blog/Simon Willison

Describes a tool or feature enabling users to directly edit files that have been pasted into an interface, streamlining in-context file modification.

Beyond LLMs: Why Scalable Enterprise AI Adoption Depends on Agent Logic (huggingface.co)

2026-06-02|news|blog/Hugging Face Blog

Argues that enterprise AI scaling bottlenecks stem from agent orchestration logic rather than LLM capability, advocating for purpose-built agent architectures over raw model scaling.

Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains (huggingface.co)

2026-06-02|news|blog/Hugging Face Blog

JetBrains releases Mellum2, a 12-billion-parameter mixture-of-experts language model, likely targeting developer-focused coding and IDE assistance tasks.

Building the infrastructure for the Intelligence Age in Michigan (openai.com)

2026-06-02|news|blog/OpenAI Blog

Announces infrastructure investment in Michigan to build data centers or computing facilities supporting AI workloads as part of a broader national AI build-out.

Our views on AI policy and political advocacy (openai.com)

2026-06-02|news|blog/OpenAI Blog

Articulates OpenAI's official positions on AI governance policy and the boundaries of appropriate political engagement or lobbying activity.

Anthropic confidentially submits draft S-1 to the SEC (anthropic.com)

2026-06-02|news|blog/Anthropic News

Anthropic has filed a confidential draft S-1 registration statement with the SEC, initiating the regulatory process toward a potential public offering.

@xai: Composer 2.5 is now available inside Grok Build... (x.com)

2026-06-02|news|twitter-bookmarks

xAI has released Composer 2.5, a code/content composition tool, now integrated into the Grok Build development environment.

strace-ui, Bonsai_term, and the TUI renaissance (blog.janestreet.com)

2026-06-02|news|hackernews

A survey or advocacy piece covers the resurgence of terminal user interface tools, highlighting strace-ui and Bonsai_term as examples of the TUI revival.

Show HN: AI Simulations Based on FEP (aic-ai-lab.site)

2026-06-02|news|hackernews

A system simulates agent behavior or cognition using the Free Energy Principle as the computational and theoretical foundation.

Towards Multidisciplinary Summarization of Hospital Stays: Efficient Sentence-Level Clinical Provenance Categorization (arxiv.org)

2026-06-02|paper|arxiv

A method categorizes individual sentences in clinical notes by their source discipline, enabling fine-grained provenance tracking for multidisciplinary hospital-stay summaries.

RASER: Recoverability-Aware Selective Escalation Router for Multi-Hop Question Answering (arxiv.org)

2026-06-02|paper|arxiv

RASER routes multi-hop questions selectively to more powerful models only when earlier reasoning steps are detectably unrecoverable, reducing unnecessary escalation cost.

Expressivity of congruence-based architectures for DNNs on positive-definite matrices (arxiv.org)

2026-06-02|paper|arxiv

Theoretical analysis characterizes which functions deep neural networks built on congruence-based operations can express when inputs are symmetric positive-definite matrices.

Not What, But How: A Communicative Audit of LLM Response Framing (arxiv.org)

2026-06-02|paper|arxiv

An audit framework analyzes how LLMs frame responses communicatively (hedging, assertiveness, stance) independent of factual content, revealing systematic stylistic biases.

Monitoring Agentic Systems Before They're Reliable (arxiv.org)

2026-06-02|paper|arxiv

A monitoring framework detects and flags unsafe or erroneous behaviors in agentic AI systems during deployment before those systems have achieved reliable performance.

Bridging the Last Mile of Time Series Forecasting with LLM Agents (arxiv.org)

2026-06-02|paper|arxiv

Uses LLM agents to handle the final refinement stage of time series forecasting, where standard models underperform because of domain-specific or contextual gaps.

CRAM: Centroid-Routing and Adaptive MoE for Multimodal Continual Instruction Tuning (arxiv.org)

2026-06-02|paper|arxiv

CRAM uses centroid-based token routing and an adaptive mixture-of-experts architecture to enable multimodal models to continually learn new instruction-following tasks without catastrophic forgetting.

Towards Automated Discovery: A Review of Generative Models, Multimodal Learning and Closed-Loop Workflows in Inverse Materials Design (arxiv.org)

2026-06-02|paper|arxiv

A review synthesizes how generative models, multimodal learning, and closed-loop experimental workflows are combined to autonomously discover and design new materials with target properties.

When Rating Scales Fall Short: LLM-Assisted Discovery of ADHD Signals in Turkish Teacher Narratives (arxiv.org)

2026-06-02|paper|arxiv

Uses LLMs to extract ADHD-related behavioral signals from free-text teacher narratives in Turkish, going beyond structured rating scale limitations.

A Biconvex Formulation for Stable Transport of Mixture Models with a Unique Solution (arxiv.org)

2026-06-02|paper|arxiv

Reformulates optimal transport between Gaussian mixture models as a biconvex optimization problem guaranteed to have a unique, stable solution.

Drifting Preference Optimization for One-Step Generative Models (arxiv.org)

2026-06-02|paper|arxiv

Introduces a preference optimization method that progressively shifts the reference distribution during training to align one-step generative models with human preferences.

Moment-Video: Diagnosing Temporal Fidelity of Video MLLMs on Momentary Visual Events (arxiv.org)

2026-06-02|paper|arxiv

Constructs a benchmark targeting short, precise temporal moments in video to diagnose whether multimodal LLMs correctly localize and interpret brief visual events.

FigSIM: A Dataset for Fine-grained Suicide Severity and Figurative Language in Suicide Memes (arxiv.org)

2026-06-02|paper|arxiv

Provides a labeled dataset pairing fine-grained suicide risk severity annotations with figurative language categories extracted from suicide-related memes.

Why Not Hyperparameter-Friendly Optimisation? A Monotonic Adaptive Norm Rescaling Approach For Long-Tailed Recognition (arxiv.org)

2026-06-02|paper|arxiv

Proposes a monotonic adaptive norm-rescaling optimizer that reduces sensitivity to hyperparameter choices when training on long-tailed class distributions.

Auditing Asset-Specific Preferences in Financial Large Language Models: Evidence from Bitcoin Representations and Portfolio Allocation (arxiv.org)

2026-06-02|paper|arxiv

Audits financial LLMs for asset-specific biases toward Bitcoin by analyzing their internal representations and the resulting portfolio allocation recommendations.

SafeSteer: Localized On-Policy Distillation for Efficient Safety Alignment (arxiv.org)

2026-06-02|paper|arxiv

Distills safety-aligned behavior into a smaller model by localizing safety-critical layers and applying on-policy distillation only to those components.
