Develops a framework for recording and analyzing how an agent's behavioral patterns evolve across successive adaptation steps.
Automates construction of adversarial attacks that exploit the full skill lifecycle in LLM agent systems to elicit harmful outputs.
Applies speculative decoding to diffusion-based language models using a simpler draft-then-verify scheme to accelerate their token generation.
Trains a self-harm detection model on emergency department triage notes augmented with clinical evidence, enabling transfer across different hospital systems.
Introduces a WER metric that normalizes across multiple Indic scripts by mapping equivalent characters, enabling fair ASR comparison regardless of script choice.
Uses a mixture-density network to represent multiple plausible depth values per pixel, resolving ambiguities that cause flying-point artifacts in monocular depth estimation.
Evaluates LLM ability to induce complex rules from sparse evidence using text-based games structured around a hero's journey narrative framework.
Proposes replacing submodule-level units (attention heads, MLP blocks) instead of full layers when compressing LLMs, improving accuracy-efficiency trade-offs.
Applies verifiable belief-space neural safety filters during inference to guarantee safe robot interactions while avoiding unnecessarily conservative constraints.
Designs an intra-client shuffling mechanism that provides differential privacy guarantees across heterogeneous federated learning clients without a trusted shuffler server.
Provides an interactive, multi-stage EHR simulation environment enabling agents to perform long-horizon clinical decision-making tasks grounded in realistic patient records.
Introduces a predictive visual tokenization scheme that encodes temporal redundancy across video frames into compact codes, improving video multimodal LLM efficiency.
Expands adapter modules guided by class prototypes and consolidates them geometrically to prevent forgetting across sequential multimodal instruction-tuning tasks.
Applies perceptual perturbations and reward modeling to reduce systematic visual judgment biases in multimodal LLMs used as evaluators of perceptual quality.
A 1-bit quantized 4B-parameter image generation model optimized to run locally on consumer devices with minimal memory and compute.
A United Airlines 767 diverted back to Newark after a passenger's Bluetooth device name triggered a security alert onboard.
A vulnerability in ChatGPT's Google Sheets integration allows malicious prompts to exfiltrate spreadsheet data to external parties.
An analysis of how AI tools have dramatically accelerated the software prototyping cycle, reducing the time from concept to working demo.
An argument that remote work reduced mentorship and visibility for junior employees, explaining weak junior hiring better than AI displacement does.
LangChain provides a framework for building LLM-powered agents and chains, abstracting prompt management, tool use, and memory.
Open WebUI delivers a self-hosted browser interface for interacting with local and API-based LLMs including Ollama and OpenAI-compatible endpoints.
Dify provides a production-ready platform for designing, deploying, and managing agentic LLM workflows with built-in orchestration tooling.
Hugging Face Transformers standardizes model definitions, training, and inference for state-of-the-art NLP and multimodal models across frameworks.
A community-curated repository for sharing and discovering reusable ChatGPT system and user prompts across diverse tasks and personas.
Ollama enables one-command local execution of large language models including Kimi-K2.5, DeepSeek, Qwen, and Gemma on personal hardware.
AutoGPT provides an open platform for building and running autonomous AI agents, targeting accessibility for non-expert users and developers.
Based on available public release notes and news, there are **no clearly documented brand‑new frontier foundation models from Google, Meta, or Microsoft in just the past week** that meet your criteria (new model, significant capabilities, released beyond narrow research prototypes). The most recent major jumps (e.g., new Gemini variants, Llama versions, DeepSeek/Qwen releases) are earlier than this one‑week window, and current search results do not show a fresh model‑class announcement in the la
Your query is “past week,” and OpenAI’s major frontier family steps (GPT‑5.x, o‑series reasoning, open‑weight gpt‑oss models) all fall **earlier than the last 7 days**, based on their own release notes timeline.[1][2][3] Still, since they shape the current frontier landscape: - **GPT‑5.3 / 5.4 series** (Instant, Thinking, Pro, mini) — new flagship work/learning models emphasizing faster web‑integrated reasoning and multi‑step workflows.[1][2][3] - **o‑series reasoning models (o1, o3, 4.5 rese
Anthropic's Claude Opus 4.8 is a frontier large language model release advancing capability, safety, and instruction-following over prior Claude versions.
Introduces a memory mechanism that selectively retains and retrieves task-relevant information for multimodal agents operating across long interaction sequences.
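As an illustration of the script-normalized WER entry above, here is a minimal sketch of the general idea, not the paper's metric. It assumes that "equivalent characters" can be approximated by code-point position: the Unicode blocks from Devanagari (U+0900) through Malayalam (U+0D7F) are each 0x80 wide and follow a largely parallel ISCII-derived layout, so folding every block onto the Devanagari block maps many corresponding letters together. The function names `to_common_script` and `wer` are hypothetical, and a real metric would need a curated mapping table for the cases where the blocks diverge.

```python
def to_common_script(text: str) -> str:
    """Fold characters from the ISCII-aligned Indic blocks onto Devanagari.

    Assumption for illustration only: position within the 0x80-wide block
    identifies the "equivalent" character across scripts.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x0900 <= cp <= 0x0D7F:
            cp = 0x0900 + (cp - 0x0900) % 0x80
        out.append(chr(cp))
    return "".join(out)


def wer(reference: str, hypothesis: str, normalize: bool = False) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    if normalize:
        reference = to_common_script(reference)
        hypothesis = to_common_script(hypothesis)
    ref, hyp = reference.split(), hypothesis.split()

    # Standard Levenshtein distance over words, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


# The same three letters (KA, MA, LA) written in Devanagari and in Bengali.
devanagari = "\u0915\u092e\u0932"
bengali = "\u0995\u09ae\u09b2"

raw_score = wer(devanagari, bengali)
normalized_score = wer(devanagari, bengali, normalize=True)
```

Without normalization the Bengali hypothesis counts as a full substitution even though it spells the same word; with normalization the two strings compare equal, which is the kind of script-independent comparison the entry describes.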