The AI Wire

5155 articles — page 22 of 172

Principled Algorithms for Optimizing Generalized Metrics in Multi-Label Learning (arxiv.org)

2026-05-28|paper|arxiv

Principled optimization algorithms are derived for generalized multi-label metrics, providing theoretically grounded solutions beyond surrogate loss approximations.

Can LLMs Use Linguistic Uncertainty Markers to Reliably Reflect Intrinsic Confidence? (arxiv.org)

2026-05-28|paper|arxiv

An investigation tests whether LLMs can accurately deploy hedging words and uncertainty markers to express calibrated confidence aligned with their internal prediction certainty.

Do Agents Need Semantic Metadata? A Comparative Study in Agentic Data Retrieval (arxiv.org)

2026-05-28|paper|arxiv

A comparative study measures whether providing semantic metadata improves agent performance on data retrieval tasks versus relying on structural or syntactic information alone.

OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration (arxiv.org)

2026-05-28|paper|arxiv

Introduces a multimodal verifier that explicitly recalibrates verification confidence across modalities using structured intermediate reasoning steps.

Nothing is real anymore. We are reaching the point where crowd scenes can be entirely generated by AI. (v.redd.it)

2026-05-28|news|reddit/artificial

A Reddit post argues that AI-generated crowd scenes are approaching the quality needed to replace filmed extras in video production.

Investigating how prompt politeness affects LLM accuracy (2025) (arxiv.org)

2026-05-28|news|hackernews

Empirical study measuring whether polite versus rude prompt phrasing causes statistically meaningful differences in LLM answer correctness.

ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation (huggingface.co)

2026-05-28|model|huggingface

Trains a proactive recommendation agent via reinforcement learning with a rectified policy gradient to correct overestimated returns and improve anticipatory item suggestion.

Multi-Mixer Models: Flexible Sequence Modeling with Shared Representations (arxiv.org)

2026-05-28|paper|arxiv

A sequence modeling architecture combines multiple mixer-style modules with shared representations to flexibly handle diverse sequential tasks.

Personal Visual Memory from Explicit and Implicit Evidence (arxiv.org)

2026-05-28|paper|arxiv

Builds a personalized visual memory system by combining explicitly stated and implicitly observed user experiences to represent individual visual histories.

Affective Music Recommendation: A Rollout-Based World Model for Offline Preference Optimization (arxiv.org)

2026-05-28|paper|arxiv

Uses a rollout-based world model with offline preference optimization to recommend music that matches a listener's evolving affective state.

sqlite AGENTS.md (simonwillison.net)

2026-05-28|news|blog/Simon Willison

A Simon Willison post on the SQLite project's AGENTS.md, a file convention for giving instructions to AI coding agents that work in a repository.

Reachy Mini goes fully local (huggingface.co)

2026-05-28|news|blog/Hugging Face Blog

News about the Reachy Mini robot gaining fully on-device, local AI inference capabilities without relying on cloud connectivity.

How to use Codex for everyday work (openai.com)

2026-05-28|news|blog/OpenAI Blog

Practical guide covering workflows and techniques for integrating OpenAI Codex into routine daily work tasks.

Stance Detection in Prediction Markets: Addressing Imbalanced Trader Commentary via Counterfactual Augmentation and Market Context (arxiv.org)

2026-05-28|paper|arxiv

A system detects bullish/bearish stances in prediction-market trader comments by using counterfactual data augmentation and market-context features to correct severe class imbalance.

Niantic Spatial and Spexi Partner on Drone Imagery for AI (auganix.org)

2026-05-28|news|reddit/artificial

Niantic Spatial and Spexi are collaborating to supply high-resolution drone-captured imagery as training and mapping data for AI spatial computing applications.

SWE-rebench Leaderboard (March, April and May 2026): GPT-5.5, Opus 4.7, Cursor (Composer 2.5), Kimi K2.6 and More (swe-rebench.com)

2026-05-28|news|reddit/LocalLLaMA

The SWE-rebench leaderboard for March, April and May 2026 ranks AI coding models and agents, including GPT-5.5, Opus 4.7, Cursor Composer 2.5 and Kimi K2.6, on software engineering tasks.

Behold! Probably the most ghetto local AI server: (i.redd.it)

2026-05-28|news|reddit/LocalLLaMA

A makeshift, low-cost local AI inference server built from unconventional or repurposed consumer hardware.

I think Anthropic and OpenAI have found product-market fit (simonwillison.net)

2026-05-28|news|hackernews

Analysis argues Anthropic and OpenAI have achieved sustainable, large-scale commercial adoption with their AI products.

Warp’s big bet on building open source with GPT-5.5 (openai.com)

2026-05-28|news|blog/OpenAI Blog

News about Warp terminal's strategic commitment to building open-source tooling powered by GPT-5.5.

YouTube to automatically label AI-generated videos (blog.youtube)

2026-05-28|news|hackernews

YouTube is implementing automatic detection and labeling to disclose when video content has been AI-generated.

Nous Research – Hermes Agent v0.14.0 (Self-Improving Agent Stack)

2026-05-28|model|perplexity

Nous Research's Hermes Agent v0.14.0 introduces a self-improving agent stack where the agent iteratively refines its own prompts, tools, or weights during operation.

Anthropic – Claude Sonnet 4.6 (1M-token Context)

2026-05-28|model|perplexity

Claude Sonnet 4.6 extends Anthropic's mid-tier model to a one-million-token context window, enabling processing of entire codebases or book-length documents in a single pass.

f/prompts.chat (162942 stars): f.k.a. Awesome ChatGPT Prompts. Share, discover, and collect prompts from the co (github.com)

2026-05-28|tool|github

A community-curated repository for sharing and discovering reusable system and user prompts for ChatGPT and other LLM interfaces.

Quoting Kyle Ferrana (simonwillison.net)

2026-05-28|news|blog/Simon Willison

A Simon Willison post quoting Kyle Ferrana on an AI-related topic.

Can Large Language Models Handle Discourse Particles? A Case Study of Colloquial Malay (arxiv.org)

2026-05-28|paper|arxiv

An evaluation examines whether LLMs correctly interpret and generate discourse particles in colloquial Malay, probing low-resource pragmatic language understanding.

KOSPI Surges 100% in 2026 as AI Chip Stocks Trigger Korea’s Biggest Rally in Decades (blocknow.com)

2026-05-28|news|reddit/artificial

News report covering a 100% KOSPI index gain in 2026 driven by surging valuations of Korean AI chip companies.

Gemma-4-Harmonia-31B-Uncensored-Heretic Is Out Now, a Merge of Multiple gemma-4-31B-it Finetunes Designed for a Targeted Approach to Deep Neural Consolidation, Minimizing Regression While Amplifying Unique Capability Boundaries. With KLD 0.0047 and 9/100 Refusals! (huggingface.co)

2026-05-28|news|reddit/LocalLLaMA

A merge of multiple Gemma-4-31B-it fine-tunes, which its author describes as a targeted "deep neural consolidation", reports a KL divergence of 0.0047 and 9 refusals out of 100 test prompts.

Stress disrupts hippocampal integration of overlapping events, memory inference (science.org)

2026-05-28|news|hackernews

Stress impairs the hippocampus's ability to link overlapping experiences, disrupting inferential memory across related events.

DuckDuckGo search saw 28% more visits after Google said people love AI mode (pcgamer.com)

2026-05-28|news|hackernews

DuckDuckGo recorded 28% more visits after Google claimed that users love its AI Mode search feature.

Hugging Face / repo link

2026-05-28|model|perplexity

- GitHub repository: `nousresearch/hermes-agent`.[3]
