The AI Wire

5101 articles — page 11 of 171

Tracking the Behavioral Trajectories of Adapting Agents (arxiv.org)

2026-06-02|paper|arxiv

Develops a framework for recording and analyzing how an agent's behavioral patterns evolve across successive adaptation steps.

SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction (arxiv.org)

2026-06-02|paper|arxiv

Automates construction of adversarial attacks that exploit the full skill lifecycle in LLM agent systems to elicit harmful outputs.

SimSD: Simple Speculative Decoding in Diffusion Language Models (arxiv.org)

2026-06-02|paper|arxiv

Applies speculative decoding to diffusion-based language models using a simpler draft-then-verify scheme to accelerate their token generation.

Transferable Self-Harm Surveillance from Emergency Department Triage Notes Using an Evidence-Augmented Machine Learning Approach (arxiv.org)

2026-06-02|paper|arxiv

Trains a self-harm detection model on emergency department triage notes augmented with clinical evidence, enabling transfer across different hospital systems.

SN-WER: Script-Normalized WER for Multi-Script Indic ASR Evaluation (arxiv.org)

2026-06-02|paper|arxiv

Introduces a WER metric that normalizes across multiple Indic scripts by mapping equivalent characters, enabling fair ASR comparison regardless of script choice.

Modeling Depth Ambiguity: A Mixture-Density Representation for Flying-Point-Free Depth Estimation (arxiv.org)

2026-06-02|paper|arxiv

Uses a mixture-density network to represent multiple plausible depth values per pixel, resolving ambiguities that cause flying-point artifacts in monocular depth estimation.

HERO'S JOURNEY: Testing Complex Rule Induction with Text Games (arxiv.org)

2026-06-02|paper|arxiv

Evaluates LLM ability to induce complex rules from sparse evidence using text-based games structured around a hero's journey narrative framework.

From Layers to Submodules: Rethinking Granularity in Replacement-Based LLM Compression (arxiv.org)

2026-06-02|paper|arxiv

Proposes using submodules (attention heads, MLP blocks) rather than full layers as the unit of replacement in LLM compression, improving accuracy-efficiency trade-offs.

Permissive Safety Through Trusted Inference: Verifiable Belief-Space Neural Safety Filters for Assured Interactive Robotics (arxiv.org)

2026-06-02|paper|arxiv

Applies verifiable belief-space neural safety filters at inference time to guarantee safe robot interactions while avoiding unnecessary conservatism.

IntraShuffler: A Privacy Preserving Framework for Heterogeneous DP Federated Learning (arxiv.org)

2026-06-02|paper|arxiv

Designs an intra-client shuffling mechanism that provides differential privacy guarantees across heterogeneous federated learning clients without a trusted shuffler server.

ClinEnv: An Interactive Multi-Stage Long Horizon EHR Environment for Agents (arxiv.org)

2026-06-02|paper|arxiv

Provides an interactive, multi-stage EHR simulation environment enabling agents to perform long-horizon clinical decision-making tasks grounded in realistic patient records.

AdaCodec: A Predictive Visual Code for Video MLLMs (arxiv.org)

2026-06-02|paper|arxiv

Introduces a predictive visual tokenization scheme that encodes temporal redundancy across video frames into compact codes, improving video multimodal LLM efficiency.

ProtoAda: Prototype-Guided Adaptive Adapter Expansion and Geometric Consolidation for Multimodal Continual Instruction Tuning (arxiv.org)

2026-06-02|paper|arxiv

Expands adapter modules guided by class prototypes and consolidates them geometrically to prevent forgetting during sequential multimodal instruction tuning tasks.

Mitigating Perceptual Judgment Bias in Multimodal LLM-as-a-Judge via Perceptual Perturbation and Reward Modeling (arxiv.org)

2026-06-02|paper|arxiv

Applies perceptual perturbations and reward modeling to reduce systematic visual judgment biases in multimodal LLMs used as evaluators of perceptual quality.

1-Bit Bonsai Image 4B Image Generation for Local Devices (prismml.com)

2026-06-01|news|hackernews

A 1-bit quantized 4B-parameter image generation model optimized to run locally on consumer devices with minimal memory and compute.

United Airlines 767 returns to Newark after Bluetooth name sparks alert (simpleflying.com)

2026-06-01|news|hackernews

A United Airlines 767 diverted back to Newark after a passenger's Bluetooth device name triggered a security alert onboard.

ChatGPT for Google Sheets exfiltrates workbooks (promptarmor.com)

2026-06-01|news|hackernews

A vulnerability in ChatGPT's Google Sheets integration allows malicious prompts to exfiltrate spreadsheet data to external parties.

The Speed of Prototyping in the Age of AI (darylcecile.net)

2026-06-01|news|hackernews

An analysis of how AI tools have dramatically accelerated the software prototyping cycle, reducing time from concept to working demo.

What if remote working, not AI, is to blame for weak junior hiring? (ft.com)

2026-06-01|news|hackernews

An argument that remote work reduced mentorship and visibility for junior employees, explaining weak junior hiring better than AI displacement does.

langchain-ai/langchain (138165 stars): The agent engineering platform. (github.com)

2026-06-01|tool|github

LangChain provides a framework for building LLM-powered agents and chains, abstracting prompt management, tool use, and memory.

open-webui/open-webui (139448 stars): User-friendly AI Interface (Supports Ollama, OpenAI API, ...) (github.com)

2026-06-01|tool|github

Open WebUI delivers a self-hosted browser interface for interacting with local and API-based LLMs including Ollama and OpenAI-compatible endpoints.

langgenius/dify (143346 stars): Production-ready platform for agentic workflow development. (github.com)

2026-06-01|tool|github

Dify provides a production-ready platform for designing, deploying, and managing agentic LLM workflows with built-in orchestration tooling.

huggingface/transformers (161143 stars): 🤗 Transformers: the model-definition framework for state-of-the-art machine lear (github.com)

2026-06-01|tool|github

Hugging Face Transformers standardizes model definitions, training, and inference for state-of-the-art NLP and multimodal models across frameworks.

f/prompts.chat (163132 stars): f.k.a. Awesome ChatGPT Prompts. Share, discover, and collect prompts from the co (github.com)

2026-06-01|tool|github

A community-curated repository for sharing and discovering reusable ChatGPT system and user prompts across diverse tasks and personas.

ollama/ollama (172780 stars): Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemm (github.com)

2026-06-01|tool|github

Ollama enables one-command local execution of large language models including Kimi-K2.5, DeepSeek, Qwen, and Gemma on personal hardware.

Significant-Gravitas/AutoGPT (184681 stars): AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our (github.com)

2026-06-01|tool|github

AutoGPT provides an open platform for building and running autonomous AI agents, targeting accessibility for non-expert users and developers.

3. Other major‑lab releases in the last week

2026-06-01|model|perplexity

Based on available public release notes and news, there are **no clearly documented brand‑new frontier foundation models from Google, Meta, or Microsoft in just the past week** that meet your criteria (new model, significant capabilities, released beyond narrow research prototypes). The most recent major jumps (e.g., new Gemini variants, Llama versions, DeepSeek/Qwen releases) are earlier than this one‑week window, and current search results do not show a fresh model‑class announcement in the la

2. Recent OpenAI frontier & open‑weight releases (contextual, but older than 1 week)

2026-06-01|model|perplexity

OpenAI's major frontier releases (GPT-5.x, o-series reasoning, open-weight gpt-oss models) all fall **earlier than the last 7 days**, according to OpenAI's own release notes timeline.[1][2][3] Still, since they shape the current frontier landscape:
- **GPT-5.3 / 5.4 series** (Instant, Thinking, Pro, mini): new flagship work/learning models emphasizing faster web-integrated reasoning and multi-step workflows.[1][2][3]
- **o-series reasoning models (o1, o3, 4.5 rese

1. Claude Opus 4.8 — Anthropic

2026-06-01|model|perplexity

Anthropic's Claude Opus 4.8 is a frontier large language model release advancing capability, safety, and instruction-following over prior Claude versions.

Task-Focused Memorization for Multimodal Agents (huggingface.co)

2026-06-01|model|huggingface

Introduces a memory mechanism that selectively retains and retrieves task-relevant information for multimodal agents operating across long interaction sequences.
