Firecrawl provides web search, scraping, and content-cleaning capabilities purpose-built for feeding structured data to AI agents.
Dify is a production-ready platform for designing, deploying, and managing agentic workflows that combine LLMs with tools and data sources.
MUSE-Autoskill enables agents to autonomously create, store, manage, and evaluate reusable skills, allowing continuous self-improvement without human intervention.
LocateAnything accelerates vision-language grounding by decoding bounding boxes in parallel rather than sequentially, improving both speed and localization quality.
A method taps the latent capacity of multimodal LLMs to generate images conditioned on specific subjects, without requiring a dedicated subject-driven generation architecture.
A transformer architecture uses masked region modeling to generate and edit images as separate compositable layers at scale.
Incorporates surface normal vectors as guidance signals into attention mechanisms to improve geometry-aware feature learning in vision models.
Shows that vision-language models judge concreteness and imagery properties less accurately on real photographs than on synthetic or abstract stimuli.
Conditions diffusion-based image generation on learned internal representations to enable fine-grained, controllable synthesis of image content.
China is restricting international travel for AI researchers employed at Alibaba and DeepSeek, tightening controls on the movement of AI talent abroad.
An inside account describes the review and approval process Alibaba's Qwen team follows before publicly releasing open-source model weights.
Introduces a system or framework for automatically evaluating the quality and novelty of scientific research ideas.
Ollama provides a local runtime to download and run large language models including Kimi-K2.5, GLM-5, MiniMax, DeepSeek, and Qwen on personal hardware.
Mono-Anchored Advantage Normalization addresses whether additional visual inputs genuinely improve reasoning by normalizing advantage estimates to a single-source anchor.
A security report details how Microsoft Copilot's Cowork feature can be exploited to exfiltrate files from user environments.
Provides a standalone framework that quantifies, simulates, and visualizes technical debt and stochastic cost accumulation specific to multi-agent AI systems.
Defines a framework for governing AI agent runtime behavior through formally executable cognitive policies that constrain and direct agent actions.
Introduces an inline safety harness that monitors and enforces lifecycle-level constraints on LLM-based finance agents during execution.
Derives Gibbs-correction terms from learned score functions to accelerate sampling in uniform-rate discrete diffusion models without retraining.
Achieves fast, high-quality visual grounding by decoding bounding boxes in parallel rather than sequentially, accelerating vision-language object localization.
An uncensored Qwen3.5 35B A3B variant with all 785 Multi-Token Prediction heads preserved is released in Safetensors, GGUF, NVFP4, and GPTQ-Int4 formats.
A technique converts standard local AI agents into ones that iteratively improve their own behavior through self-optimization feedback loops.
LangChain offers a framework and tooling for building, orchestrating, and deploying LLM-powered agents and multi-step reasoning pipelines.
Open WebUI delivers a self-hosted, user-friendly chat interface compatible with locally run Ollama models and remote OpenAI-compatible APIs.
AutoGPT is an open-source platform enabling non-technical users to create and deploy autonomous AI agents without writing code.
Soap2Soap remakes long cinematic videos by coordinating multiple specialized agents that collaboratively handle narrative, style, and temporal consistency across extended sequences.
A unified evaluation framework benchmarks minute-scale audio-visual generation across text-to-AV, image-to-AV, and video-to-AV tasks.
Evaluates LLMs' knowledge of culturally specific aesthetic and stylistic conventions across multiple cultures to quantify cross-cultural awareness gaps.
Designs user incentive schemes that trade off inference accuracy and latency to shift AI workloads toward low-carbon energy availability windows.
Applies ratio-monotone transforms to probabilistically smooth non-convex objective landscapes, enabling more reliable convergence in global optimization problems.