The AI Wire

5101 articles — page 4 of 171

Uber's $1,500/month AI limit is a useful signal for AI tool pricing (simonwillison.net)

2026-06-04|news|hackernews

Uber's $1,500/month per-employee AI spending cap provides a concrete reference point for companies evaluating and setting AI tool budget limits.

Mathematicians issue warning as AI rapidly gains ground (science.org)

2026-06-04|news|hackernews

Mathematicians are publicly cautioning that AI systems are advancing into mathematical reasoning rapidly enough to warrant concern from the professional community.

Failing grades soar with AI usage, dwindling math skills in Berkeley CS classes (dailycal.org)

2026-06-04|news|hackernews

Berkeley CS courses are seeing rising failure rates and measurable declines in foundational math ability correlated with increased student AI tool usage.

I built a vulnerable app and spent $1,500 seeing if LLMs could hack it (kasra.blog)

2026-06-04|news|hackernews

The author built an intentionally vulnerable application and spent $1,500 prompting multiple LLMs to exploit it, empirically testing their offensive security capabilities.

langchain-ai/langchain (138460 stars): The agent engineering platform. (github.com)

2026-06-04|tool|github

LangChain provides a framework and tooling for composing LLM-powered agents, chains, and tool-use pipelines for production and experimental applications.

open-webui/open-webui (139930 stars): User-friendly AI Interface (Supports Ollama, OpenAI API, ...) (github.com)

2026-06-04|tool|github

Open WebUI delivers a self-hostable browser interface for interacting with local models via Ollama and remote models via OpenAI-compatible APIs.

langgenius/dify (143771 stars): Production-ready platform for agentic workflow development. (github.com)

2026-06-04|tool|github

Dify is a production-grade platform for building, deploying, and managing agentic LLM workflows with visual tooling and backend infrastructure included.

huggingface/transformers (161257 stars): 🤗 Transformers: the model-definition framework for state-of-the-art machine lear (github.com)

2026-06-04|tool|github

Hugging Face Transformers provides standardized model definitions, loading, and fine-tuning interfaces for thousands of pretrained state-of-the-art models.

f/prompts.chat (163278 stars): f.k.a. Awesome ChatGPT Prompts. Share, discover, and collect prompts from the co (github.com)

2026-06-04|tool|github

Prompts.chat is a community-driven repository for sharing, discovering, and collecting reusable ChatGPT and LLM prompts across diverse use cases.

ollama/ollama (173112 stars): Get up and running with Kimi-K2.6, GLM-5.1, MiniMax, DeepSeek, gpt-oss, Qwen, Ge (github.com)

2026-06-04|tool|github

Ollama enables local downloading and execution of large language models including Kimi-K2.6, DeepSeek, Qwen, and others with minimal setup.

Significant-Gravitas/AutoGPT (184735 stars): AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our (github.com)

2026-06-04|tool|github

AutoGPT is an open platform for building and running autonomous AI agents, targeting accessible deployment of goal-directed LLM-based automation for general users.

The ways we contain Claude across products (anthropic.com)

2026-06-04|news|hackernews

The post describes Anthropic's technical and policy mechanisms for scoping and restricting Claude's capabilities and behaviors across different product deployments.

4. Open‑weight reasoning models gpt‑oss‑120b & gpt‑oss‑20b – OpenAI

2026-06-04|model|perplexity

OpenAI's release notes list **two open-weight models**, gpt-oss-120b and gpt-oss-20b. The entry does not explicitly place them within the last 7 days.

Model(s) & org

2026-06-04|model|perplexity

**OpenAI frontier models** (GPT-5 generation and related) are now available via **Amazon Bedrock**, along with **Codex** (OpenAI's software-engineering agent) as an AWS-native service[4][5]. Organization: **OpenAI**, in partnership with **AWS**[4].

3. OpenAI frontier models on AWS (incl. Codex) – OpenAI & AWS

2026-06-04|model|perplexity

This is not a *new* model family, but a **major new deployment channel** for multiple frontier models and the Codex agent.

2. Claude Opus 4.8 – Anthropic

2026-06-04|model|perplexity

Anthropic releases Claude Opus 4.8, an updated version of its Opus-tier language model with improved capabilities.

1. GPT‑5.5 Instant update – OpenAI

2026-06-04|model|perplexity

OpenAI updates GPT-5.5 Instant, a faster or more efficient variant of its GPT-5.5 model optimized for lower-latency inference.

MeshWeaver: Sparse-Voxel-Guided Surface Weaving for Autoregressive Mesh Generation (huggingface.co)

2026-06-04|model|huggingface

Generates 3D meshes autoregressively by using sparse voxel structures to guide a surface-weaving process that produces clean mesh topology.

AUDITFLOW: Executable Symbolic Environments for Structured Financial Reporting Verification (huggingface.co)

2026-06-04|model|huggingface

Creates executable symbolic environments that run financial reporting logic to verify structured audit claims against computable rules.

MapAgent: An Industrial-Grade Agentic Framework for City-scale Lane-level Map Generation (huggingface.co)

2026-06-04|model|huggingface

Provides an industrial-strength agentic pipeline that generates lane-level map data at city scale for autonomous driving applications.

M^3Eval: Multi-Modal Memory Evaluation through Cognitively-Grounded Video Tasks (huggingface.co)

2026-06-04|model|huggingface

Evaluates multi-modal memory systems using video-based tasks grounded in cognitive science frameworks for assessing memory retention and retrieval.

AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation (huggingface.co)

2026-06-04|model|huggingface

Distills a multi-step video generation model into a one-step autoregressive model using an asymmetric adversarial training objective.

AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks? (huggingface.co)

2026-06-04|model|huggingface

Benchmarks frontier language models on long-horizon automated research and engineering workflows to assess end-to-end autonomous problem-solving capability.

ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning (huggingface.co)

2026-06-04|model|huggingface

Compresses lengthy chain-of-thought reasoning traces into more compact representations through introspective preference learning over model-generated rationales.

Access Sets Matter: Budgeting Expert Reads for Scalable Weight-Space Model Merging (huggingface.co)

2026-06-04|model|huggingface

Introduces a budget-aware model merging method that selectively limits which expert weight subsets each model can read, improving scalability.

STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations (huggingface.co)

2026-06-04|model|huggingface

Attributes training data influence by applying sparse recovery techniques to model output changes induced by systematic perturbations of training subsets.

WebRISE: Requirement-Induced State Evaluation for MLLM-Generated Web Artifacts (huggingface.co)

2026-06-04|model|huggingface

WebRISE evaluates multimodal LLM-generated web artifacts by checking whether outputs satisfy explicit functional and structural requirements, not just visual similarity.

OpenSTBench: Beyond Semantic Evaluation for Speech Translation (huggingface.co)

2026-06-04|model|huggingface

OpenSTBench introduces evaluation metrics for speech translation that go beyond semantic accuracy, capturing structural, prosodic, or pragmatic translation quality.

Unlocking Feature Learning in Gated Delta Networks at Scale (huggingface.co)

2026-06-04|model|huggingface

Scaling modifications to Gated Delta Networks enable effective feature learning, addressing a previously identified limitation of this recurrent architecture at larger scales.

Filter, Then Reweight: Rethinking Optimization Granularity in On-Policy Distillation (huggingface.co)

2026-06-04|model|huggingface

A two-stage on-policy distillation method first filters low-quality training samples, then applies per-sample reweighting to improve fine-grained optimization of student models.
