The AI Wire

5155 articles — page 18 of 172

CollectionLoRA: Collecting 50 Effects in 1 LoRA via Multi-Teacher On-Policy Distillation (huggingface.co)

2026-05-29|model|huggingface

CollectionLoRA distills 50 distinct visual effects into a single LoRA adapter using multi-teacher on-policy distillation, avoiding the need for separate adapters per effect.

Colored Noise Diffusion Sampling (huggingface.co)

2026-05-29|model|huggingface

Introduces a diffusion sampling method using colored (correlated) noise instead of white noise to improve sample quality or diversity in generative models.

OpenAI’s Frontier Governance Framework (openai.com)

2026-05-29|news|blog/OpenAI Blog

OpenAI released a policy framework defining governance principles, safety criteria, and deployment boundaries for its frontier AI models.

Resolution Diagnostics for Paired LLM Evaluation (arxiv.org)

2026-05-29|paper|arxiv

Provides diagnostic tools to detect when paired LLM evaluation frameworks lack sufficient resolution to reliably distinguish model quality differences.

Gram: Assessing sabotage propensities via automated alignment auditing (arxiv.org)

2026-05-29|paper|arxiv

Automatically audits AI systems for sabotage-oriented behaviors by probing for misaligned propensities using structured behavioral assessment pipelines.

@claudeai: Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own... (x.com)

2026-05-29|news|twitter-bookmarks

Anthropic released Claude Opus 4.8, incrementally improving on Opus 4.7 with better reasoning judgment and enhanced honesty in self-reporting limitations.

langgenius/dify (143044 stars): Production-ready platform for agentic workflow development. (github.com)

2026-05-29|tool|github

Dify offers a production-grade platform for designing, deploying, and managing agentic LLM workflows with built-in orchestration tooling.

CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists (huggingface.co)

2026-05-29|model|huggingface

CausaLab is a scalable interactive environment where AI agents can propose interventions and run experiments to autonomously discover causal structure in simulated systems.

Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning (huggingface.co)

2026-05-29|model|huggingface

Trains RL agents to simultaneously internalize reusable skills and deploy them, improving generalization to out-of-distribution tasks without relearning from scratch.

MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection (arxiv.org)

2026-05-29|paper|arxiv

MIRA anchors data selection during mid-training using rubric-based scoring that is aware of data source provenance to improve training data quality.

Demystifying Data Organization for Enhanced LLM Training (arxiv.org)

2026-05-29|paper|arxiv

Systematically analyzes how training data organization—ordering, grouping, and formatting—affects LLM training efficiency and final model quality.

GPIC: A Giant Permissive Image Corpus for Visual Generation (arxiv.org)

2026-05-29|paper|arxiv

GPIC releases a large-scale permissively licensed image corpus specifically curated to train and evaluate visual generation models.

Various LLM Smells (shvbsle.in)

2026-05-29|news|hackernews

Catalogs recurring anti-patterns and problematic behaviors observed across large language models, analogous to code smells in software engineering.

ollama/ollama (172558 stars): Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemm (github.com)

2026-05-29|tool|github

Ollama enables local execution of models including Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, and Gemma with minimal setup.

WorldMemArena: Evaluating Multimodal Agent Memory Through Action-World Interaction (huggingface.co)

2026-05-29|model|huggingface

WorldMemArena benchmarks multimodal agent memory by measuring how well agents retain and leverage information from prior action-environment interactions over extended horizons.

PhoneWorld: Scaling Phone-Use Agent Environments (huggingface.co)

2026-05-29|model|huggingface

PhoneWorld scales mobile-device-use environments for training and evaluating agents that navigate and interact with real smartphone interfaces across diverse apps and tasks.

Token-Level Generalization in LoRA Adapter Backdoors: Attack Characterization and Behavioral Detection (huggingface.co)

2026-05-29|model|huggingface

Characterizes how backdoor attacks on LoRA adapters generalize at the token level and proposes detection methods based on behavioral signatures.

Is Position Bias in Dense Retrievers Built In or Learned from Data? (huggingface.co)

2026-05-29|model|huggingface

Analyzes whether position bias in dense retrieval models originates from architectural inductive biases or is acquired from the training data distribution and supervision signals.

Towards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation (huggingface.co)

2026-05-29|model|huggingface

Builds a multi-agent system where specialized agents collaboratively produce interleaved text-and-image research reports with verifiable, grounded factual claims.

Discovering Cooperative Pipelines: Autoresearch for Sequential Social Dilemmas (huggingface.co)

2026-05-29|model|huggingface

Applies automated research methods to discover cooperative agent pipeline strategies that resolve sequential social dilemmas requiring coordination between multiple agents.

Claude Opus 4.8: "a modest but tangible improvement" (simonwillison.net)

2026-05-29|news|blog/Simon Willison

Anthropic releases Claude Opus 4.8, described as delivering incremental performance gains over its predecessor.

MedCase-Structured: A Text-to-FHIR Dataset for Benchmarking Diagnostic Reasoning in Clinically Realistic EHR Settings (arxiv.org)

2026-05-29|paper|arxiv

Introduces a dataset of clinical case texts paired with structured FHIR representations to benchmark LLM diagnostic reasoning in realistic EHR formats.

Archon: A Unified Multimodal Model for Holistic Digital Human Generation (arxiv.org)

2026-05-29|paper|arxiv

Presents a single multimodal model that jointly generates appearance, motion, voice, and other attributes of realistic digital humans in a unified framework.

Show HN: Ktx – Open-source executable context layer for data agents (github.com)

2026-05-29|news|hackernews

Open-source tool providing a structured executable context layer that standardizes how data agents access, interpret, and act on contextual information.

The mysterious Hy3 LLM is topping OpenRouter Model Rankings by a large margin (minimaxir.com)

2026-05-29|news|hackernews

An unidentified model called Hy3 is achieving top rankings on OpenRouter's usage or performance charts by a significant margin over known models.

langchain-ai/langchain (137926 stars): The agent engineering platform. (github.com)

2026-05-29|tool|github

LangChain provides a framework for composing LLM-powered agents and chains, enabling developers to build and orchestrate multi-step AI workflows.

open-webui/open-webui (139109 stars): User-friendly AI Interface (Supports Ollama, OpenAI API, ...) (github.com)

2026-05-29|tool|github

Open WebUI delivers a self-hosted browser interface for interacting with local models via Ollama and remote models via the OpenAI API.

Significant-Gravitas/AutoGPT (184630 stars): AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our (github.com)

2026-05-29|tool|github

AutoGPT provides an open-source platform enabling users to deploy and build autonomous AI agents that chain LLM calls to complete multi-step tasks without continuous human input.

Anthropic's run-rate revenue hits $47 billion (simonwillison.net)

2026-05-29|news|blog/Simon Willison

Reports that Anthropic's annualized revenue run-rate has reached $47 billion, reflecting rapid commercial growth.

Statistical Embeddings for Similarity, Retrieval, and Interpretable Alignment of Numeric Tabular Datasets (arxiv.org)

2026-05-29|paper|arxiv

Encodes numeric tabular datasets as statistical feature vectors enabling similarity search, retrieval, and interpretable alignment between datasets without deep learning.
