The AI Wire



6. Qwen3.5 Family – Gated Delta Networks & advanced decoding (open‑source)

2026-05-30|model|perplexity

Qwen3.5 introduces Gated Delta Network architectures in place of standard attention, together with advanced decoding strategies, across an open-source model family for improved efficiency and performance.

5. OpenAI – gpt‑oss‑120b & gpt‑oss‑20b (open‑weight reasoning models)

2026-05-30|model|perplexity

OpenAI releases two open-weight reasoning-capable models at 120B and 20B parameter scales, making competitive reasoning model weights publicly accessible.

4. OpenAI – GPT‑5.3‑Codex and GPT‑5.1‑Codex‑Max (frontier coding/agentic models)

2026-05-30|model|perplexity

OpenAI releases two frontier-grade coding and agentic models, GPT-5.3-Codex and GPT-5.1-Codex-Max, optimized for software generation and autonomous task execution.

3. Anthropic – Claude Opus 4.8 (frontier Claude upgrade)

2026-05-30|model|perplexity

Anthropic upgrades the Claude Opus line to version 4.8, advancing frontier-level capability, likely in reasoning, instruction following, or safety alignment over prior Opus releases.

2. OpenAI – GPT‑5.2 (enterprise‑focused frontier series)

2026-05-30|model|perplexity

OpenAI targets enterprise deployment with GPT-5.2, a frontier model series tuned for reliability, compliance, and performance in business-critical applications.

1. OpenAI – GPT‑5.5 (frontier model, cyber‑focused rollout)

2026-05-30|model|perplexity

OpenAI rolls out GPT-5.5 as a frontier model with a cyber-focused initial deployment, targeting cybersecurity-related tasks or threat analysis use cases.

PRISM: A Multi-Dimensional Benchmark for Evaluating LLM Peer Reviewers (huggingface.co)

2026-05-30|model|huggingface

PRISM evaluates LLMs acting as academic peer reviewers across multiple quality dimensions, measuring review accuracy, consistency, and alignment with human expert judgments.

Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation (huggingface.co)

2026-05-30|model|huggingface

Reformulating uniform diffusion models with a leave-one-out denoiser and absorbing-state perspective yields a theoretically cleaner, more stable training and inference framework for discrete generative models.

EarlyTom: Early Token Compression Completes Fast Video Understanding (huggingface.co)

2026-05-30|model|huggingface

EarlyTom compresses video token representations at early transformer layers, dramatically reducing computation while preserving sufficient temporal information for accurate video understanding.

CoHyDE: Iterative Co-Training of LLM Rewriter & Dense Encoder for Tool Retrieval (huggingface.co)

2026-05-30|model|huggingface

CoHyDE jointly and iteratively trains an LLM query rewriter alongside a dense encoder so both components mutually improve retrieval of the correct external tools for a given query.

Xetrieval: Mechanistically Explaining Dense Retrieval (huggingface.co)

2026-05-30|model|huggingface

Xetrieval applies mechanistic interpretability methods to dense retrieval models, identifying which internal circuits and representations drive document-query similarity scoring.

REPOT: Recoverable Program-of-Thought via Checkpoint Repair (huggingface.co)

2026-05-30|model|huggingface

Introduces checkpoint-based repair mechanisms that recover failed Program-of-Thought reasoning chains mid-execution rather than restarting from scratch.

Multi-view Consistent 3D Gaussian Head Avatars 'without' Multi-view Generation (huggingface.co)

2026-05-30|model|huggingface

Builds consistent 3D Gaussian head avatars from single-view inputs by enforcing multi-view consistency without requiring multi-view generative models.

Reducing Political Manipulation with Consistency Training (huggingface.co)

2026-05-30|model|huggingface

Applies consistency training to penalize contradictory political outputs, reducing a model's susceptibility to manipulation through framing or loaded prompts.

Tiny but Trusted: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection (huggingface.co)

2026-05-30|model|huggingface

Deploys a compact vision-language model for time-series anomaly detection, achieving trustworthy reasoning under tight computational constraints.

DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation (huggingface.co)

2026-05-30|model|huggingface

Fuses visual, language, and dynamics modalities with tri-modal guidance to learn robot perception representations robust to dynamic scene changes.

Convex Low-resource Accent-Robust Language Detection in Speech Recognition (huggingface.co)

2026-05-30|model|huggingface

Formulates accent-robust language identification as a convex optimization problem to improve speech recognition accuracy under low-resource accent conditions.

PANDO: Efficient Multimodal AI Agents via Online Skill Distillation (huggingface.co)

2026-05-30|model|huggingface

Distills agent skills online into a multimodal AI agent framework, reducing computation while preserving task performance across diverse modalities.

CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM (huggingface.co)

2026-05-30|model|huggingface

Evicts KV cache entries based on attention confidence scores and stores retained entries at mixed precision, reducing memory use in long-context LLM inference.

Why Far Looks Up: Probing Spatial Representation in Vision-Language Models (huggingface.co)

2026-05-30|model|huggingface

Probes vision-language models to diagnose why distant objects are systematically represented as higher in spatial encoding, revealing a geometric bias.

Reflective Prompt Tuning through Language Model Function-Calling (huggingface.co)

2026-05-30|model|huggingface

Uses language model function-calling as a reflective feedback loop to iteratively refine soft prompt tuning based on self-generated evaluative signals.

Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler (huggingface.co)

2026-05-30|news|blog/Hugging Face Blog

Provides a beginner-oriented tutorial on using torch.profiler to identify performance bottlenecks in PyTorch training and inference workflows.

A shared playbook for trustworthy third party evaluations (openai.com)

2026-05-30|news|blog/OpenAI Blog

Proposes standardized guidelines for conducting trustworthy third-party AI evaluations, covering methodology, transparency, and conflict-of-interest management.

Strengthening societal resilience with Rosalind Biodefense (openai.com)

2026-05-30|news|blog/OpenAI Blog

Rosalind Biodefense applies AI to biological threat detection and response, strengthening public health infrastructure against pandemic and bioterrorism risks.

How Braintrust turns customer requests into code with Codex (openai.com)

2026-05-30|news|blog/OpenAI Blog

Braintrust uses Codex to automatically translate natural-language customer feature requests directly into executable code, accelerating software delivery.

Boston Children’s uses AI to unlock new diagnoses (openai.com)

2026-05-30|news|blog/OpenAI Blog

Boston Children's Hospital deploys AI to identify previously missed or difficult-to-reach diagnoses in pediatric patients from clinical data.

Claude Opus 4.8 (anthropic.com)

2026-05-29|news|hackernews

Anthropic released Claude Opus 4.8, a new iteration of their flagship model in the Opus line with updated capabilities.

Anthropic raises $65B in Series H funding at $965B post-money valuation (anthropic.com)

2026-05-29|news|hackernews

Anthropic secured $65 billion in Series H funding, pushing its post-money valuation to $965 billion.

Show HN: Continue? Y/N: A 60-second game about AI agent permission fatigue (llmgame.scalex.dev)

2026-05-29|news|hackernews

A 60-second interactive game simulates the repetitive approval prompts AI agents generate, highlighting user fatigue from constant permission requests.

Various LLM Smells (shvbsle.in)

2026-05-29|news|hackernews

Catalogs recurring anti-patterns and problematic behaviors observed across large language models, analogous to code smells in software engineering.
