The AI Wire

High Signal (4-5)

3149 articles — page 4 of 105

AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks? (huggingface.co)

2026-06-04|model|huggingface

Benchmarks frontier language models on long-horizon automated research and engineering workflows to assess end-to-end autonomous problem-solving capability.

ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning (huggingface.co)

2026-06-04|model|huggingface

Compresses lengthy chain-of-thought reasoning traces into more compact representations through introspective preference learning over model-generated rationales.

Access Sets Matter: Budgeting Expert Reads for Scalable Weight-Space Model Merging (huggingface.co)

2026-06-04|model|huggingface

Introduces a budget-aware model merging method that selectively limits which expert weight subsets each model can read, improving scalability.

STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations (huggingface.co)

2026-06-04|model|huggingface

Attributes training data influence by applying sparse recovery techniques to model output changes induced by systematic perturbations of training subsets.

WebRISE: Requirement-Induced State Evaluation for MLLM-Generated Web Artifacts (huggingface.co)

2026-06-04|model|huggingface

WebRISE evaluates multimodal LLM-generated web artifacts by checking whether outputs satisfy explicit functional and structural requirements, not just visual similarity.

OpenSTBench: Beyond Semantic Evaluation for Speech Translation (huggingface.co)

2026-06-04|model|huggingface

OpenSTBench introduces evaluation metrics for speech translation that go beyond semantic accuracy, capturing structural, prosodic, or pragmatic translation quality.

Unlocking Feature Learning in Gated Delta Networks at Scale (huggingface.co)

2026-06-04|model|huggingface

Scaling modifications to Gated Delta Networks enable effective feature learning, addressing a previously identified limitation of this recurrent architecture at larger scales.

Filter, Then Reweight: Rethinking Optimization Granularity in On-Policy Distillation (huggingface.co)

2026-06-04|model|huggingface

A two-stage on-policy distillation method first filters low-quality training samples, then applies per-sample reweighting to improve fine-grained optimization of student models.
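A minimal sketch of the general filter-then-reweight pattern, assuming per-sample distillation losses and per-sample quality scores are already computed. The function name `filter_then_reweight`, the hard threshold, and the softmax weighting are illustrative assumptions, not the paper's actual method.

```python
import math

def filter_then_reweight(losses, scores, threshold=0.5, temperature=1.0):
    """Hypothetical two-stage aggregation of per-sample distillation losses."""
    # Stage 1 (filter): drop samples whose quality score is below the threshold
    kept = [(loss, s) for loss, s in zip(losses, scores) if s >= threshold]
    if not kept:
        return 0.0  # nothing survives the filter, so there is no training signal
    # Stage 2 (reweight): softmax over surviving scores gives per-sample weights;
    # subtracting the max score keeps exp() numerically stable
    top = max(s for _, s in kept)
    exps = [math.exp((s - top) / temperature) for _, s in kept]
    total = sum(exps)
    # Weighted mean of the kept losses
    return sum(loss * e / total for (loss, _), e in zip(kept, exps))
```

Lower `temperature` values concentrate weight on the highest-scoring samples; a very high value approaches a plain mean over the samples that pass the filter.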

OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs (huggingface.co)

2026-06-04|model|huggingface

OVO-S-Bench hierarchically benchmarks multimodal LLMs on streaming video spatial reasoning, testing capabilities like depth, layout, and object-relation understanding over temporal sequences.

Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems (huggingface.co)

2026-06-04|model|huggingface

RAMP provides a runtime evaluation framework for assessing agentic AI models in live production environments, capturing failure modes invisible to static offline benchmarks.

Uber Caps Usage of AI Tools Like Claude Code to Manage Costs (simonwillison.net)

2026-06-04|news|blog/Simon Willison

Uber imposed usage caps on AI coding tools including Claude Code as a direct cost-control measure following higher-than-expected enterprise spending.

Adding MCP Tools to Reachy Mini (huggingface.co)

2026-06-04|news|blog/Hugging Face Blog

MCP tool integration was added to the Reachy Mini robot, enabling it to invoke external AI-powered tools via the Model Context Protocol.
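For reference, MCP messages use JSON-RPC 2.0, and a client invokes a tool with a `tools/call` request that names the tool and passes its arguments. The sketch below only builds that message; the tool name `move_head` and its `yaw_degrees` argument are hypothetical and not taken from the Reachy Mini integration.

```python
import json

def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,          # lets the client match the server's response
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)

# Hypothetical tool and arguments, for illustration only
print(make_tool_call(1, "move_head", {"yaw_degrees": 15}))
```

Transport (stdio, HTTP) and the preceding `tools/list` discovery step are omitted here.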

Direct Preference Optimization Beyond Chatbots (huggingface.co)

2026-06-04|news|blog/Hugging Face Blog

Direct Preference Optimization techniques are applied outside conversational chatbot settings to align generative models in other domains such as code, images, or structured outputs.
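The standard DPO loss needs only sequence log-probabilities from the policy being trained and from a frozen reference model, which is why it carries over to non-chat outputs. A minimal per-pair sketch, assuming the four log-probabilities are already computed as plain floats; `beta=0.1` is a common but arbitrary choice.

```python
import math

def log_sigmoid(z: float) -> float:
    # Numerically stable log(sigmoid(z)) for both signs of z
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair of outputs."""
    # How much more the policy (relative to the reference) favors the chosen
    # output over the rejected one
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    return -log_sigmoid(beta * margin)
```

When the policy still matches the reference, the margin is zero and the loss equals ln 2; it falls as the policy shifts probability toward the chosen output.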

OpenAI public policy agenda (openai.com)

2026-06-04|news|blog/OpenAI Blog

OpenAI published its formal public policy agenda outlining its positions on AI regulation, safety standards, and government engagement priorities.

A blueprint for democratic governance of frontier AI (openai.com)

2026-06-04|news|blog/OpenAI Blog

A governance blueprint proposes democratic oversight mechanisms—such as public participation and accountability structures—for decisions made about frontier AI development and deployment.

How Wasmer used Codex to build a Node.js runtime for the edge (openai.com)

2026-06-04|news|blog/OpenAI Blog

Wasmer engineers used OpenAI Codex to accelerate building a Node.js-compatible JavaScript runtime optimized for edge computing environments.

Introducing new capabilities to GPT-Rosalind (openai.com)

2026-06-04|news|blog/OpenAI Blog

GPT-Rosalind receives new capabilities, likely expanding its functionality for biology or genomics-related AI tasks.

Jun 3, 2026 | Policy | What we learned mapping a year’s worth of AI-enabled cyber threats (anthropic.com)

2026-06-04|news|blog/Anthropic News

A year-long empirical mapping of AI-enabled cyber threats yields policy-relevant findings about attack patterns and defensive implications.

Jun 3, 2026 | Announcements | Introducing the Services Track and Partner Hub of the Claude Partner Network (anthropic.com)

2026-06-04|news|blog/Anthropic News

Anthropic launches a Services Track and Partner Hub to formalize and expand the Claude Partner Network ecosystem.

When does fragmentation occur in the CUDA caching allocator? (docs.pytorch.org)

2026-06-04|news|hackernews

An analysis identifies the conditions and memory allocation patterns that trigger fragmentation in PyTorch's CUDA caching allocator.
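Fragmentation in a caching allocator means cached free memory exists, but no single free block is large enough for a new request. The toy model below (a hypothetical `ToySegment` class: one segment, best-fit search, block splitting and coalescing) illustrates only that effect; PyTorch's real allocator is considerably more involved. On a real GPU, a large gap between `torch.cuda.memory_reserved()` and `torch.cuda.memory_allocated()` is one rough hint of cached but unusable memory.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Block:
    offset: int
    size: int
    free: bool

class ToySegment:
    """Hypothetical single-segment caching allocator (illustration only)."""

    def __init__(self, capacity: int):
        self.blocks: List[Block] = [Block(0, capacity, True)]

    def malloc(self, size: int) -> Optional[int]:
        # Best fit: pick the smallest cached free block that is large enough
        fits = [i for i, b in enumerate(self.blocks) if b.free and b.size >= size]
        if not fits:
            return None  # a real caching allocator would typically request more device memory here
        i = min(fits, key=lambda j: self.blocks[j].size)
        block = self.blocks[i]
        if block.size > size:
            # Split: the unused tail stays cached as a smaller free block
            self.blocks.insert(i + 1, Block(block.offset + size, block.size - size, True))
            block.size = size
        block.free = False
        return block.offset

    def release(self, offset: int) -> None:
        i = next(j for j, b in enumerate(self.blocks) if b.offset == offset and not b.free)
        self.blocks[i].free = True
        # Coalesce with the next neighbour, then with the previous one
        if i + 1 < len(self.blocks) and self.blocks[i + 1].free:
            self.blocks[i].size += self.blocks[i + 1].size
            del self.blocks[i + 1]
        if i > 0 and self.blocks[i - 1].free:
            self.blocks[i - 1].size += self.blocks[i].size
            del self.blocks[i]

    def total_free(self) -> int:
        return sum(b.size for b in self.blocks if b.free)

    def largest_free(self) -> int:
        return max((b.size for b in self.blocks if b.free), default=0)

if __name__ == "__main__":
    demo = ToySegment(100)
    offsets = [demo.malloc(10) for _ in range(10)]
    for off in offsets[::2]:  # free every other block
        demo.release(off)
    # Half the segment is free, yet a 20-unit request cannot be served
    print(demo.total_free(), demo.largest_free(), demo.malloc(20))
```

Interleaved lifetimes are the key ingredient: freeing every other block leaves 50 units free in 10-unit holes, so the 20-unit request fails until a neighbouring block is released and the holes coalesce.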

Gmail thinks I'm stupid, so I left (moddedbear.com)

2026-06-03|news|hackernews

A user explains switching away from Gmail out of frustration with AI-driven smart features that oversimplify interactions and patronize users.

MAI-Code-1-Flash (microsoft.ai)

2026-06-03|news|hackernews

Microsoft releases MAI-Code-1-Flash, a fast, efficient AI model optimized for code generation tasks.

Trump signs downsized AI order after weeks of reversals (politico.com)

2026-06-03|news|hackernews

Trump signs a reduced-scope executive order on AI policy after earlier, broader versions were revised multiple times.

AI outperforms law professors in Stanford Law study (law.stanford.edu)

2026-06-03|news|hackernews

A Stanford Law study finds that AI systems outperformed law professors on the legal reasoning and analysis tasks it tested.

Open Repair Data Standard – Open Repair Alliance (openrepair.org)

2026-06-03|news|hackernews

Open Repair Alliance publishes a standardized open data schema for logging and sharing consumer product repair records across organizations.

How we index images for RAG (kapa.ai)

2026-06-03|news|hackernews

An engineering team describes their pipeline for embedding and indexing images to enable retrieval-augmented generation over visual content.
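A minimal sketch of the retrieval step only, assuming each image has already been converted to an embedding vector by some image or caption embedding model (not shown) and stored with a text description that can be passed to the LLM. `ImageIndex` and the toy 3-dimensional vectors are hypothetical; this is not kapa.ai's pipeline.

```python
import math

def cosine(a, b):
    # Cosine similarity; returns 0.0 if either vector has zero length
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class ImageIndex:
    """Hypothetical in-memory index of image embeddings plus descriptions."""

    def __init__(self):
        self.entries = []  # (image_id, description, vector)

    def add(self, image_id, description, vector):
        self.entries.append((image_id, description, vector))

    def search(self, query_vector, k=3):
        # Brute-force scoring; a production system would use an ANN index
        scored = [(cosine(query_vector, vec), image_id, desc)
                  for image_id, desc, vec in self.entries]
        scored.sort(key=lambda t: t[0], reverse=True)
        return scored[:k]

index = ImageIndex()
index.add("arch.png", "System architecture diagram", [0.9, 0.1, 0.0])
index.add("chart.png", "Latency chart", [0.1, 0.9, 0.2])
index.add("photo.jpg", "Office photo", [0.2, 0.1, 0.9])
# The query vector stands in for an embedded user question
for score, image_id, desc in index.search([1.0, 0.0, 0.0], k=2):
    print(f"{score:.3f} {image_id}: {desc}")
```

The retrieved descriptions (or the images themselves, for a multimodal model) are then added to the prompt as context.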

langchain-ai/langchain (138369 stars): The agent engineering platform. (github.com)

2026-06-03|tool|github

LangChain provides a framework and toolset for building, orchestrating, and deploying AI agents and multi-step LLM pipelines.

open-webui/open-webui (139758 stars): User-friendly AI Interface (Supports Ollama, OpenAI API, ...) (github.com)

2026-06-03|tool|github

Open WebUI delivers a self-hostable browser interface for interacting with local and remote LLMs via Ollama and OpenAI-compatible APIs.

langgenius/dify (143627 stars): Production-ready platform for agentic workflow development. (github.com)

2026-06-03|tool|github

Dify offers a production-grade platform for visually designing, deploying, and managing agentic LLM workflows and applications.

huggingface/transformers (161223 stars): 🤗 Transformers: the model-definition framework for state-of-the-art machine lear (github.com)

2026-06-03|tool|github

Hugging Face Transformers provides a unified Python library for defining, loading, fine-tuning, and running state-of-the-art pretrained models.
