DiffUNet² combines bidirectional prediction, probabilistic generation, and collaborative discovery into a unified diffusion-UNet framework for analyzing scientific imaging data.
Extends Hinton's Forward-Forward algorithm beyond classification by adapting its layer-wise local learning objective to handle continuous-valued regression targets.
Evicts KV cache entries in reasoning models by weighting eviction decisions according to each token's estimated contribution to the final answer value.
Demonstrates that quadratic integrate-and-fire neurons produce smoother loss landscapes than leaky integrate-and-fire neurons, yielding higher accuracy under spike-based backpropagation.
Uses diffusion posterior sampling conditioned on sparse observations to correct the spectral bias of neural operators that over-smooth high-frequency solution components.
Replaces entropy-based token selection in visual RL with a vision-anchored selection strategy that ties sampled tokens to grounded visual features, improving reasoning performance.
Introduces low-level programming primitives that organize pretraining into hyper-epochs, enabling structured curriculum control and efficient reuse of data across large-scale training runs.
Addresses catastrophic forgetting of older temporal knowledge in federated continual learning by replaying or anchoring to earlier temporal distributions across clients.
Embeds sensor metadata directly into an autoencoder architecture with a single transcoding step, reducing reconstruction overhead for sensor data.
Accelerates ML-based data filters by using lightweight metadata to skip irrelevant data blocks before invoking the learned filter, reducing inference cost.
Presents a compact, offline-capable simultaneous speech translation model designed for low-resource deployment, submitted as the CUNI system to IWSLT 2026.
Deploys a vision-language agent that monitors human activities in real time and flags safety-critical behaviors for embodied or surveillance applications.
Trains ASR models using synthetically generated conversational speech that was never recorded, reducing dependence on real conversational audio corpora.
Reward uncertainty estimates are used to diversify agent policies in RL, encouraging exploration of distinct behavioral modes rather than converging to a single solution.
A UAV navigation system uses agentic RL in which the agent iteratively refines its own policy from visual observations, without requiring human-labeled correction data.
A steering mechanism controls chain-of-thought length and reasoning paths in LLMs at inference time, trading off computational cost against answer quality.
The attention-based AlignAtt method for simultaneous speech translation is adapted to decoder-only LLM architectures for the IWSLT 2026 shared task.
A framework jointly designs evaluation queries and scoring rubrics to generate reward signals for RL in tasks where ground-truth verifiable rewards are unavailable.
Metrics are introduced to measure how well the confidence expressed by large reasoning models matches their actual correctness on reasoning tasks.
A formal computational or mathematical framework is proposed to precisely define the binding problem: how distinct features are combined into unified object representations.
A mechanism is proposed for language models to periodically consolidate and restructure acquired knowledge into long-term memory, analogous to sleep-based memory consolidation.
A unified reward model maps heterogeneous evaluation criteria onto a shared agent-skill representation space, enabling consistent scoring across diverse task types.
Behavioral analysis reveals that language models resolve quantity comparisons by applying separate heuristics tied to specific number formats and unit types rather than by grounded numerical reasoning.
Scaling both data and architecture to GPT-style model capacity enables a humanoid robot controller to track diverse motions zero-shot.
Perception tokens that encode imagined spatial views are injected into multimodal language models to improve their reasoning about 3D spatial relationships.
Larger neural networks develop neuron subpopulations with increasingly specialized and divergent feature selectivity compared to smaller models, revealing scale-dependent representational heterogeneity.
Stanford's CS336 course teaches students to build language models from scratch, covering architecture, training, and implementation fundamentals.
Guidelines governing how students may use AI agents when completing assignments in Stanford's CS336 language modeling course.
Financial analysis examining whether public equity markets have sufficient capacity to absorb IPOs of Anthropic, SpaceX, and OpenAI at their valuations.
OpenAI's frontier models and Codex coding API are now accessible to developers through Amazon Web Services infrastructure.