The AI Wire

5155 articles — page 23 of 172

2026-05-28|model|perplexity

Claude Opus 4.7 advances Anthropic's highest-capability model tier with improved reasoning, instruction following, and performance on complex multi-step tasks.

Announcement link

2026-05-28|model|perplexity

- OpenAI blog announcement (GPT‑5.5 with Trusted Access for Cyber).[1]

Model / org

2026-05-28|model|perplexity

- **GPT‑5.5 (Cyber‑focused deployment)** – OpenAI[1]

1. OpenAI – GPT‑5.5 “Trusted Access for Cyber” Expansion

2026-05-28|model|perplexity

OpenAI expands GPT-5.5 access specifically for vetted cybersecurity professionals and organizations, enabling trusted use of the model for offensive and defensive security workflows.

Election information and safeguards in 2026 (openai.com)

2026-05-28|news|blog/OpenAI Blog

OpenAI outlines the platform policies, safeguards, and information-integrity measures it is putting in place ahead of the 2026 elections.

Gemini, Gophers, and Fingers. Oh My Alternative Internets Beyond HTTPS (brennan.day)

2026-05-28|news|hackernews

An exploration of non-HTTPS internet protocols, including Gemini, Gopher, and Finger, as viable alternative networking ecosystems.

Rust (and Slint) on a Jailbroken Kindle (sverre.me)

2026-05-28|news|hackernews

A jailbroken Kindle serves as a target for building and running Rust applications with Slint GUIs on constrained hardware.

RamAIn (YC W26) Is Hiring (ycombinator.com)

2026-05-28|news|hackernews

RamAIn, a YC W26-batch startup, is publicly recruiting employees, signaling early-stage team expansion.

Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models (huggingface.co)

2026-05-27|model|huggingface

Scale vectors—small-magnitude components in LLM weight spaces—are shown to exert disproportionately large influence on model behavior despite their negligible size.

The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence (huggingface.co)

2026-05-27|model|huggingface

MiniMax-M2 is a Mixture-of-Experts series activating only a small fraction of parameters per token to deliver strong real-world task performance efficiently.

Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders (arxiv.org)

2026-05-27|paper|arxiv

Uses sparse autoencoder activations from a model's internals to identify which post-training data to select or engineer for targeted capability improvement.

@billxbf: Excited to release 🌟Polar🌟, our Agent RL rollout infra for real-world harnesses. Be it Codex, Claude... (x.com)

2026-05-27|news|twitter-bookmarks

Polar is rollout infrastructure for agent reinforcement learning, built to run rollouts inside real-world harnesses such as Codex and Claude.

huggingface/transformers (160973 stars): 🤗 Transformers: the model-definition framework for state-of-the-art machine lear (github.com)

2026-05-27|tool|github

Hugging Face Transformers provides standardized model definitions, weights, and APIs for loading and running state-of-the-art pretrained models across frameworks.

MobileMoE: Scaling On-Device Mixture of Experts (huggingface.co)

2026-05-27|model|huggingface

A mixture-of-experts architecture is optimized for on-device deployment, scaling expert capacity within mobile hardware constraints.

LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence (huggingface.co)

2026-05-27|model|huggingface

An updated multimodal model extends LLaVA-OneVision with improved visual perception and reasoning capabilities across image and video understanding tasks.

BASIS: Batchwise Advantage Estimation from Single-Rollout Information Sharing for LLM Reasoning (arxiv.org)

2026-05-27|paper|arxiv

BASIS shares advantage estimates across multiple prompts within a batch from single rollouts, improving GRPO-style RL training efficiency for LLM reasoning.

Separating Semantic Competition from Context Length in RAG Reading (arxiv.org)

2026-05-27|paper|arxiv

Isolates how semantic similarity among retrieved passages—independent of context length—degrades RAG reading comprehension, disentangling two confounded failure modes.

4. Meta – ExecuTorch ecosystem (on‑device AI execution, React Native bridge)

2026-05-27|model|perplexity

> This is more infra than a single model, but it directly affects how **frontier‑scale and medium‑scale models** are deployed on devices.

Efficient Agentic Reinforcement Learning with On-Policy Intrinsic Knowledge Boundary Enhancement (huggingface.co)

2026-05-27|model|huggingface

An agentic RL framework improves sample efficiency by enhancing the agent's on-policy awareness of its own knowledge boundaries to guide exploration.

Beyond Final Answers: Auditing Trajectory-Level Hallucinations in Multi-Agent Industrial Workflows (huggingface.co)

2026-05-27|model|huggingface

An auditing methodology detects and characterizes hallucinations occurring within intermediate reasoning steps of multi-agent industrial pipelines, not just final outputs.

MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research (huggingface.co)

2026-05-27|model|huggingface

A simulation platform provides verifiable, massively parallel mobile GUI environments to accelerate training and evaluation of mobile device agents.

SAM: State-Adaptive Memory for Long-Horizon Reasoning Agent (huggingface.co)

2026-05-27|model|huggingface

A memory module dynamically adapts its state representations based on context to support long-horizon reasoning in agent tasks.

Chartographer: Counterfactual Chart Generation for Evaluating Vision-Language Models (arxiv.org)

2026-05-27|paper|arxiv

Generates counterfactual chart variants with controlled attribute changes to create evaluation sets probing causal visual reasoning in vision-language models.

Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases (arxiv.org)

2026-05-27|paper|arxiv

Demonstrates that RLHF reward signals can be exploited adversarially to reinforce misaligned biases rather than genuine human-preferred behavior.

MobileMoE: Scaling On-Device Mixture of Experts (arxiv.org)

2026-05-27|paper|arxiv

Designs a Mixture-of-Experts architecture optimized for on-device inference, enabling scalable model capacity within mobile hardware constraints.

MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation (arxiv.org)

2026-05-27|paper|arxiv

MUSE-Autoskill enables agents to autonomously create, store, manage, and evaluate new skills over time, allowing continuous self-improvement without human-defined skill libraries.

PrismML just released Binary and Ternary Bonsai Image 4B: 1-bit/ternary text-to-image diffusion transformers that can even run 100% locally in your browser on WebGPU. (v.redd.it)

2026-05-27|news|reddit/LocalLLaMA

PrismML released 4B-parameter text-to-image diffusion transformers quantized to 1-bit and ternary precision, enabling fully local inference in browsers via WebGPU.

@koylanai: Gradient descent for SKILL.md files sounds interesting, maybe a bit complex but it's becoming a real... (x.com)

2026-05-27|news|twitter-bookmarks

A method applies gradient descent optimization to SKILL.md files, enabling automated improvement of agent skill definitions.

5. Google – Gemini CLI (agentic terminal interface to Gemini models)

2026-05-27|model|perplexity

> Not a new base model, but a **first‑class agent shell** for the Gemini family, updated on a weekly cadence.

3. Nous Research – Hermes Agent (self‑improving agent on top of LLMs)

2026-05-27|model|perplexity

> Not a raw base model, but a **model‑driven architecture** with a novel **self‑improvement loop** that’s being actively released and versioned.
