A scalable self-supervised framework synthesizes novel views from real-world video by leveraging ray-based representations without requiring posed multi-view supervision.
Trains a model on preoperative CT scans to automatically predict which patients will develop pancreatic fistula after surgery, enabling earlier risk stratification.
Learns personalized evaluation rubrics that align automated scoring with individual human evaluator preferences rather than a single universal standard.
Quantifies and characterizes the gap between dynamic trajectory accuracy and probabilistic distributional consistency in surrogate models trained on chaotic systems.
A protocol restores corrupted or incomplete semantic triplets (subject-relation-object) to train LLMs for hierarchical table structure understanding.
Vision-language models systematically reduce female representation in generated or retrieved outputs when input prompts are gender-ambiguous.
Categorizes attention heads as positional or symbolic and analyzes how RoPE geometry shapes their learning dynamics and ability to generalize to longer sequences.
Attention is reformulated to produce functional correspondences between continuous representations rather than discrete similarity scores between token pairs.
Question-answering tasks are used as probes to reveal what information is encoded or absent in hidden states of language models.
Masked gene expression pretraining learns biologically meaningful representations by forcing models to reconstruct randomly masked expression values.
Classification labels and rationale-based explanations in hate speech detection are shown to frequently disagree, challenging standard explainability evaluation practices.
Trajectory analysis of discrete diffusion models reveals which graph elements (nodes, edges, labels) get unmasked earliest during graph-to-text generation.
SPECTRA generates synthetic information retrieval test collections with ground-truth relevance oracles and controlled distractor documents for diagnostic evaluation of IR systems.
A multimodal Joint Embedding Predictive Architecture encodes heterogeneous sensor time-series into semantically meaningful shared embeddings without requiring labeled data.
A system selects which argumentative perspective to activate based on conversational context in order to construct more effective arguments.
LLMs are fine-tuned for long-context reasoning using search agent trajectories as training data, with rubric-based rewards guiding reinforcement learning.
Probes whether language models encode construction-grammar meaning (e.g., paired-focus constructions like 'not to mention') beyond surface syntax, evaluating grammatical versus semantic understanding.
Steers a pretrained Diffusion Transformer at inference time using progressive guidance signals to generate videos containing multiple distinct events without any additional training.
Presents a stateful runtime monitor that detects adversarial attacks on distributed multi-agent systems by tracking agent state sequences rather than inspecting individual messages in isolation.
Derives tight convergence bounds for error-feedback gradient compression algorithms in distributed optimization, closing gaps between existing upper and lower complexity analyses.
Detects localized distribution shifts in inverse-problem measurements by comparing KL-divergence between observed data and diffusion-model-learned priors over spatial regions.
Unifies video understanding and generation in a single model by bridging low- and high-frequency temporal signals within a shared homogeneous latent space for efficiency.
Argues that deep domain knowledge remains a defensible competitive advantage over generic AI capabilities in specialized industries.
OpenRouter, the unified LLM API routing platform, secured $113 million in Series B funding to scale its infrastructure.
ByteDance's open-source DeerFlow agent autonomously conducts multi-step research, writes and executes code, and generates long-form content.
Daytona provides sandboxed, scalable infrastructure for safely executing AI-generated code in isolated, production-grade environments.
A structured checklist covering modern front-end development best practices, usable by both human developers and AI coding agents.
claude-mem adds persistent memory to Claude-based agents by storing and retrieving context across separate conversation sessions.
RAGFlow is an open-source RAG engine enabling document ingestion, retrieval, and generation pipelines for building knowledge-grounded AI applications.
Google's open-source CLI agent integrates Gemini models directly into the terminal for interactive coding, file manipulation, and task automation.