The AI Wire

5155 articles — page 17 of 172

6. Qwen3.5 Family – Gated Delta Networks & advanced decoding (open‑source)

2026-05-30|model|perplexity

Qwen3.5 is an open-source model family that replaces standard attention with Gated Delta Network architectures and adds advanced decoding strategies for improved efficiency and performance.

PRISM: A Multi-Dimensional Benchmark for Evaluating LLM Peer Reviewers (huggingface.co)

2026-05-30|model|huggingface

PRISM evaluates LLMs acting as academic peer reviewers across multiple quality dimensions, measuring review accuracy, consistency, and alignment with human expert judgments.

How Braintrust turns customer requests into code with Codex (openai.com)

2026-05-30|news|blog/OpenAI Blog

Braintrust uses Codex to turn natural-language customer feature requests into executable code, accelerating software delivery.

Boston Children’s uses AI to unlock new diagnoses (openai.com)

2026-05-30|news|blog/OpenAI Blog

Boston Children's Hospital deploys AI on clinical data to identify diagnoses in pediatric patients that were previously missed or hard to reach.

A shared playbook for trustworthy third party evaluations (openai.com)

2026-05-30|news|blog/OpenAI Blog

Proposes standardized guidelines for conducting trustworthy third-party AI evaluations, covering methodology, transparency, and conflict-of-interest management.

Hugging Face / announcement

2026-05-30|model|perplexity

- vLLM release notes mention Qwen3.5 support as a major new architecture.[2]
- The underlying Qwen3.5 models are typically published on Hugging Face under the Qwen org; vLLM’s notes are a reliable pointer to the family’s capabilities.[2]

5. OpenAI – gpt‑oss‑120b & gpt‑oss‑20b (open‑weight reasoning models)

2026-05-30|model|perplexity

OpenAI releases two open-weight reasoning-capable models at 120B and 20B parameter scales, making competitive reasoning model weights publicly accessible.

f/prompts.chat (163046 stars): f.k.a. Awesome ChatGPT Prompts. Share, discover, and collect prompts from the co (github.com)

2026-05-30|tool|github

A community-curated repository for sharing, discovering, and collecting reusable prompt templates for ChatGPT and other LLM interfaces.

Strengthening societal resilience with Rosalind Biodefense (openai.com)

2026-05-30|news|blog/OpenAI Blog

Rosalind Biodefense applies AI to biological threat detection and response, strengthening public health infrastructure against pandemic and bioterrorism risks.

4. OpenAI – GPT‑5.3‑Codex and GPT‑5.1‑Codex‑Max (frontier coding/agentic models)

2026-05-30|model|perplexity

OpenAI releases two frontier coding and agentic models, GPT-5.3-Codex and GPT-5.1-Codex-Max, optimized for software generation and autonomous task execution.

3. Anthropic – Claude Opus 4.8 (frontier Claude upgrade)

2026-05-30|model|perplexity

Anthropic upgrades the Claude Opus line to version 4.8, advancing frontier-level capability over prior Opus releases, likely in reasoning, instruction following, or safety alignment.

2. OpenAI – GPT‑5.2 (enterprise‑focused frontier series)

2026-05-30|model|perplexity

OpenAI targets enterprise deployment with GPT-5.2, a frontier model series tuned for reliability, compliance, and performance in business-critical applications.

1. OpenAI – GPT‑5.5 (frontier model, cyber‑focused rollout)

2026-05-30|model|perplexity

OpenAI rolls out GPT-5.5, a frontier model whose initial deployment focuses on cybersecurity tasks such as threat analysis.

Anthropic raises $65B in Series H funding at $965B post-money valuation (anthropic.com)

2026-05-29|news|hackernews

Anthropic secured $65 billion in Series H funding, pushing its post-money valuation to $965 billion.

Self-Trained Verification for Training- and Test-Time Self-Improvement (arxiv.org)

2026-05-29|paper|arxiv

Trains a model to verify its own outputs, then uses those verification signals to improve both fine-tuning and inference-time reasoning.

Reasoning with Sampling: Cutting at Decision Points (arxiv.org)

2026-05-29|paper|arxiv

Improves LLM reasoning efficiency by identifying and sampling only at critical decision branch points rather than uniformly across generation steps.

When, why, and how do diffusion posterior samplers fail? A finite-sample lens (arxiv.org)

2026-05-29|paper|arxiv

Provides a finite-sample theoretical analysis of when, why, and how diffusion-based posterior samplers break down, precisely characterizing the conditions under which they fail.

Unlocking the Working Memory of Large Language Models for Latent Reasoning (arxiv.org)

2026-05-29|paper|arxiv

Expands LLMs' effective working memory by enabling latent-space reasoning steps that are maintained across context without being decoded into tokens.

VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion (arxiv.org)

2026-05-29|paper|arxiv

Reduces the memory cost of minute-scale autoregressive video diffusion by compressing key-value caches with a low-rank latent factorization.

huggingface/transformers (161035 stars): 🤗 Transformers: the model-definition framework for state-of-the-art machine lear (github.com)

2026-05-29|tool|github

Hugging Face Transformers provides standardized model definitions, weights, and APIs for loading and running state-of-the-art pretrained language and vision models.

Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention (huggingface.co)

2026-05-29|model|huggingface

Larger models generalize better on rare tasks because greater capacity reduces inter-task interference and preserves low-frequency training signals that smaller models overwrite.

Verifiable Rewards Beyond Math and Code: Lightweight Corpus-Grounded Process Supervision for Factual Question Answering (huggingface.co)

2026-05-29|model|huggingface

Extends verifiable rewards for reinforcement learning beyond math and code, using lightweight corpus-grounded process supervision to train models on factual question answering.

In-Context Reward Adaptation for Robust Preference Modeling (arxiv.org)

2026-05-29|paper|arxiv

Adapts reward models at inference time using in-context examples of preferences, making preference modeling more robust to distribution shift without retraining.

Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Component LLM Agents (arxiv.org)

2026-05-29|paper|arxiv

Derives bounds showing multi-component LLM agent pipelines can produce locally coherent outputs that are globally compositionally inconsistent, quantifying this incoherence gap.

Efficient Test-Time Finetuning of LLMs via Convex Reconstruction and Gradient Caching (arxiv.org)

2026-05-29|paper|arxiv

Reduces test-time LLM fine-tuning cost by reconstructing weight updates via convex combinations and caching gradients to avoid redundant recomputation.

LLMSurgeon: Diagnosing Data Mixture of Large Language Models (arxiv.org)

2026-05-29|paper|arxiv

Diagnoses and estimates the composition of training data mixtures in large language models by analyzing model weights or outputs without direct data access.

Claude Opus 4.8 (anthropic.com)

2026-05-29|news|hackernews

Anthropic released Claude Opus 4.8, the latest iteration of its flagship Opus model line.

@ClaudeDevs: New in Claude Code (research preview): dynamic workflows... (x.com)

2026-05-29|news|twitter-bookmarks

Claude Code gained a research-preview feature called dynamic workflows, enabling adaptive, condition-driven multi-step agentic task execution.

@_catwu: Excited to share our most powerful new Claude Code feature: dynamic workflows!... (x.com)

2026-05-29|news|twitter-bookmarks

Dynamic workflows in Claude Code allow the agent to adaptively plan and modify its execution steps at runtime based on intermediate results.

YoCausal: How Far is Video Generation from World Model? A Causality Perspective (huggingface.co)

2026-05-29|model|huggingface

A causal framework exposes gaps between current video generation models and true world models by testing whether generated videos respect cause-and-effect dependencies.
