Endava restructures its software development lifecycle by delegating discrete engineering tasks to autonomous AI agents, reducing human bottlenecks in delivery pipelines.
CERN's Castor system provides a large-scale hierarchical storage management solution for archiving and retrieving the massive data volumes produced by particle physics experiments.
The Pentagon operates AI-generated influence operations producing Spanish-language propaganda targeting Latin American populations to shape geopolitical narratives.
A post-training method internalizes multi-agent debate into a single model's latent space, enabling self-refinement without requiring multiple separate model instances at inference.
Describes and analyzes specific AI experiments conducted in the game of Go, clarifying the design choices and outcomes behind notable research milestones.
Magenta RealTime 2 releases open, locally runnable generative music models capable of producing and responding to live musical input in real time.
Fine-tunes a large language model on vintage technical writing corpora to reproduce the terse, structured documentation style characteristic of 1990s software manuals.
Gemma 4 12B processes both text and images within a single encoder-free architecture, unifying multimodal understanding without separate vision encoders.
Uber's $1,500/month per-employee AI spending cap provides a concrete reference point for companies evaluating and setting AI tool budget limits.
Mathematicians are publicly cautioning that AI systems are advancing into mathematical reasoning rapidly enough to warrant concern from the professional community.
Berkeley CS courses are seeing rising failure rates and measurable declines in foundational math ability correlated with increased student AI tool usage.
Builds an intentionally vulnerable application and spends $1,500 prompting multiple LLMs to attempt to exploit it, empirically testing their offensive security capabilities.
LangChain provides a framework and tooling for composing LLM-powered agents, chains, and tool-use pipelines for production and experimental applications.
Open WebUI delivers a self-hostable browser interface for interacting with local models via Ollama and remote models via OpenAI-compatible APIs.
Dify is a production-grade platform for building, deploying, and managing agentic LLM workflows with visual tooling and backend infrastructure included.
Hugging Face Transformers provides standardized model definitions, loading, and fine-tuning interfaces for thousands of pretrained state-of-the-art models.
Prompts.chat is a community-driven repository for sharing, discovering, and collecting reusable ChatGPT and LLM prompts across diverse use cases.
Ollama enables local downloading and execution of large language models including Kimi-K2.6, DeepSeek, Qwen, and others with minimal setup.
AutoGPT is an open platform for building and running autonomous AI agents, aiming to make goal-directed LLM-based automation accessible to general users.
Describes Anthropic's technical and policy mechanisms for scoping and restricting Claude's capabilities and behaviors across different product deployments.
These do not explicitly fall within the search result snippet’s “new in the last 7 days” window, but the same OpenAI release‑notes entry shows **two open‑weight models** that match the “significant open‑source‑like model” criteria.
- **OpenAI frontier models** (GPT‑5 generation and related) are now available via **Amazon Bedrock**, plus **Codex** (OpenAI’s software‑engineering agent) as an AWS‑native service[4][5].
- Organization: **OpenAI**, in partnership with **AWS**[4].
This is not a *new* model family, but a **major new deployment channel** for multiple frontier models and the Codex agent.
Releases Claude Opus 4.8, an updated version of Anthropic's Opus-tier language model with improved capabilities.
Releases GPT-5.5 Instant, a faster or more efficient variant of OpenAI's GPT-5.5 model optimized for lower-latency inference.
Generates 3D meshes autoregressively by using sparse voxel structures to guide a surface-weaving process that produces clean mesh topology.
Creates executable symbolic environments that run financial reporting logic to verify structured audit claims against computable rules.
Provides an industrial-strength agentic pipeline that generates lane-level map data at city scale for autonomous driving applications.
Evaluates multi-modal memory systems using video-based tasks grounded in cognitive science frameworks for assessing memory retention and retrieval.
Distills a multi-step video generation model into a one-step autoregressive model using an asymmetric adversarial training objective.