Automatically generates reusable AI agent skills by distilling knowledge from human experts, reducing manual skill engineering for complex task pipelines.
Provides an automated auditing framework that evaluates the open skill ecosystem available to LLM-based agents and surfaces gaps, redundancies, or failures within it.
An autoencoder architecture that takes full input, produces residual outputs, and uses a projection pursuit encoder to learn compact, disentangled latent representations.
A zero-shot speech synthesis system that generates expressive, long-form audio for both monologue and multi-speaker dialogue without speaker-specific training data.
Generates synchronized, spatially positioned multichannel audio in real time using a streaming autoregressive diffusion transformer.
Systematically evaluates long-form speech generation systems across diverse scenarios including different speaking styles, domains, and acoustic conditions to expose failure modes.
Uses frequency-domain decomposition and sub-frequency manifold traversal to guide a diffusion model for generating temporally coherent and smooth action sequences.
Analyzes when Markov boundary feature selection helps, hurts, or produces mixed results for tabular prediction tasks, clarifying its practical reliability.
Scales human motion generation by conditioning on any combination of input modalities using masked modeling, enabling flexible multimodal control over generated motions.
A general-purpose counting model that estimates how many objects of arbitrary categories appear in an image, given open-vocabulary or user-specified targets.
Uses on-policy data generated during RLHF training to improve reward model accuracy through self-supervision, addressing the reward model degradation caused by policy distribution shift.
Trains agents on open-ended tasks through self-play in which multiple policies co-evolve, generating increasingly challenging and diverse training signals without human-designed curricula.
Evaluates whether vision-language models reliably abstain from spatial questions when they lack the visual information needed to answer correctly, and diagnoses their failure modes.
Introduces a benchmark and synthetic trajectory generation method for training GUI agents to recover from their own policy-induced errors during task execution.
An investigation into issues or behavior observed in the pydantic-monty library, likely examining bugs, unexpected functionality, or security concerns.
A personal account arguing that cancelling an AI subscription was the right practical or financial decision, weighing real utility against cost.
Release notes for version 1.0a32 of Datasette, the open-source tool for exploring and publishing SQLite databases, detailing new features or fixes.
The May 2026 edition of a monthly newsletter summarizing recent developments, projects, or curated content relevant to the author's focus area.
Explains that a running process's memory is exposed as a file through virtual (procfs) interfaces like /proc/pid/mem rather than as a file on disk, illustrating Unix's everything-is-a-file design.
NVIDIA releases Cosmos 3, an open multimodal model designed to support physical AI systems by integrating reasoning and action planning across modalities.
Traces the origin and cultural journey of "Prisencolinensinainciusol," Adriano Celentano's 1972 song whose nonsense lyrics were deliberately written to sound like American English without carrying any meaning.
The Vera C. Rubin Observatory has detected very large near-Earth asteroids as well as candidate failed supernovae (stars that collapse without a visible explosion).
Argues that criteria used to attribute human-like properties to LLMs are so broad they would equally apply to Age of Empires II, exposing the criteria as flawed.
Identifies a mechanistic link between large activation outliers in sparse autoencoders and dead features, that is, learned features that permanently stop firing during training.
A hybrid CNN and CodeBERT model classifies source code tokens as real secrets, placeholder credentials, or non-credentials in order to reduce false-positive credential-leak alerts.
Extends speech-focused audio tokenizers with general audio perception capabilities so a single tokenizer handles diverse audio types without losing speech semantic quality.
Integrates and harmonizes transcriptomic response data from multiple small-molecule perturbation experiments into a unified, consistently formatted compendium for downstream analysis.
Establishes that reinforcement learning value functions satisfying supermartingale conditions serve as formal safety and stability certificates for stochastic dynamical systems.
Uses grammar-constrained symbolic regression to infer dissipation potential functions from data automatically, with the inferred functions guaranteed to satisfy thermodynamic admissibility constraints.
Optimizes visual features extracted from input images to adaptively guide 3D scene reconstruction, improving quality under varying scene conditions.