A training method uses contrastive comparison between correct and incorrect reasoning traces to rapidly steer models toward better multi-step reasoning without extensive data.
Gradient signals from concept-decomposed representations are used as probes to identify dataset biases without requiring any bias labels.
browser-use is a library that lets AI agents control and automate interactions with websites in a real browser.
Firecrawl provides an API and toolset to search, scrape, and structure web content into clean data for AI agents.
Dify provides a visual development platform for building, deploying, and monitoring LLM-powered agentic workflows in production environments.
Applies block-level diffusion within a vision-language model for autonomous driving to achieve faster inference while maintaining high-quality scene understanding and planning.
Introduces Word Coverage Score (WCS) to measure how many lexical tokens an LLM can actually generate under sampling, revealing vocabulary blind spots.
Extrapolating beyond averaged weight checkpoints in code-generation RL training exposes Pareto frontiers trading off solution correctness against computational efficiency.
An open-vocabulary 3D generative model lets users specify arbitrary part-level semantic controls to synthesize and manipulate distinct components of 3D objects.
A policy optimization method trains multimodal agents through exploratory interaction, improving agentic reasoning across visual and textual decision-making tasks.
An automated pipeline identifies weaknesses in small computer-use agents and generates domain-specific training data to improve performance in those failure areas.
A self-distillation method conditions LLM training on identified skill categories, selectively reinforcing reasoning capabilities where the model shows specific weaknesses.
Applies causal state space models to perform low-latency, continuous EEG signal decoding directly from streaming brain activity without future context.
Trains separate explanation models per annotator by treating disagreements among human labelers as meaningful signal via cross-annotator preference optimization.
Proposes a method to tune how conservatively an AI system behaves under uncertainty, enabling safer scalable oversight without sacrificing performance.
Ollama enables local download, quantization management, and inference serving of large language models including Qwen, DeepSeek, and Gemma via a CLI and API.
Incorporates generative model supervision signals to improve embodied agent learning, enabling better scene understanding and action planning in physical environments.
Extends large multimodal models with creative physical reasoning capabilities, enabling generation and understanding of physically plausible, imaginative real-world scenarios.
Exact closed-form formulas are derived for preference-weighted expected hypervolume improvement and R2 improvement scalarizations, and their monotonicity properties in multi-objective Bayesian optimization are proven.
A framework routes tasks to specialized AI agents in a decentralized network using incentive-aligned mechanisms to match skills to appropriate subtasks.
Memory in neural systems is reframed as dynamically evolving connectivity patterns rather than static storage, enabling continuous adaptation of stored associations.
A study identifies and analyzes the performance gap between vision-language models and humans on causal reasoning tasks requiring abstract rather than perceptual understanding.
Extracts and aggregates fine-grained visual attributes from CLIP features to prevent catastrophic forgetting when learning new classes incrementally.
Demonstrates that adding a visual modality to LLMs does not consistently improve alignment with human reading behavior measured during natural text comprehension.
LangChain provides a framework for composing LLMs, tools, and memory into chains and agents for building AI-powered applications.
Open WebUI delivers a self-hosted browser interface for interacting with local and remote LLMs served through Ollama and OpenAI-compatible APIs.
AutoGPT provides an open-source autonomous agent platform that chains GPT model calls with tool use to complete long-horizon tasks with minimal human input.
Provides a coordination and policy substrate that manages communication, task allocation, and decision-making protocols across multiple collaborating AI agents.
Presents a system that automatically discovers and iteratively refines reusable conversational skills to improve emotional support dialogue agents.
Codex-powered tax agents iteratively improve their own code and reasoning to handle increasingly complex tax filing and computation tasks autonomously.