The AI Wire

A Real-World Evaluation of LLM Medication Safety Reviews in NHS Primary Care [TOP LAB](arxiv.org)

2025-12-27|paper|arXiv

Large language models (LLMs) often match or exceed clinician-level performance on medical benchmarks, yet very few are evaluated on real clinical data or examined beyond headline metrics. We present, ...

cs-AI

LongVideoAgent: Multi-Agent Reasoning with Long Videos (arxiv.org)

2025-12-24|paper|arXiv

Recent advances in multimodal LLMs and systems that use tools for long-video QA point to the promise of reasoning over hour-long episodes. However, many methods still compress content into lossy summa...

cs-AI cs-CV cs-LG

Benchmarking LLMs for Predictive Applications in the Intensive Care Units [TOP LAB](arxiv.org)

2025-12-24|paper|arXiv

With the advent of LLMs, various tasks across the natural language processing domain have been transformed. However, their application in predictive tasks remains less researched. This study compares ...

cs-AI

Graph-Symbolic Policy Enforcement and Control (G-SPEC): A Neuro-Symbolic Framework for Safe Agentic AI in 5G Autonomous Networks [TOP LAB](arxiv.org)

2025-12-24|paper|arXiv

As networks evolve toward 5G Standalone and 6G, operators face orchestration challenges that exceed the limits of static automation and Deep Reinforcement Learning. Although Large Language Model (LLM)...

cs-AI cs-NI

Beyond CLIP: Knowledge-Enhanced Multimodal Transformers for Cross-Modal Alignment in Diabetic Retinopathy Diagnosis [TOP LAB](arxiv.org)

2025-12-23|paper|arXiv

Diabetic retinopathy (DR) is a leading cause of preventable blindness worldwide, demanding accurate automated diagnostic systems. While general-domain vision-language models like Contrastive Language-...

cs-CV cs-AI

Scalably Enhancing the Clinical Validity of a Task Benchmark with Physician Oversight (arxiv.org)

2025-12-23|paper|arXiv

Automating the calculation of clinical risk scores offers a significant opportunity to reduce physician administrative burden and enhance patient care. The current standard for evaluating this capabil...

cs-AI stat-AP

airflow (github.com)

2025-12-22|tool|GitHub

Apache Airflow is an open-source platform for programmatically authoring, scheduling, and monitoring workflows as code, particularly suited for data pipelines and ETL processes.[6][5]

airflow apache apache-airflow python
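
"Workflows as code" means a pipeline is declared as a directed acyclic graph of tasks plus their dependencies, and a scheduler runs each task only after its upstream tasks finish. The sketch below illustrates that idea with Python's standard `graphlib` module. It is not Airflow's API: the task names in `PIPELINE` and the `run_order`/`run_pipeline` helpers are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical ETL pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract": set(),
    "validate": {"extract"},
    "transform": {"validate"},
    "load": {"transform"},
    "report": {"load", "validate"},
}


def run_order(dag):
    """Return one valid execution order for the task graph."""
    try:
        return list(TopologicalSorter(dag).static_order())
    except CycleError as exc:
        # A workflow graph must be acyclic, otherwise no task order exists.
        raise ValueError("pipeline has a dependency cycle") from exc


def run_pipeline(dag, tasks):
    """Call each task's function after all of its upstream tasks; return completion order."""
    completed = []
    for name in run_order(dag):
        tasks[name]()
        completed.append(name)
    return completed


if __name__ == "__main__":
    print(run_order(PIPELINE))
    # → ['extract', 'validate', 'transform', 'load', 'report']
```

A real scheduler adds retries, timing, and parallel execution of independent tasks; the dependency ordering shown here is the core constraint it has to respect.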

MGRegBench: A Novel Benchmark Dataset with Anatomical Landmarks for Mammography Image Registration [TOP LAB](arxiv.org)

2025-12-22|paper|arXiv

Robust mammography registration is essential for clinical applications like tracking disease progression and monitoring longitudinal changes in breast tissue. However, progress has been limited by the...

cs-CV cs-AI

mlflow (github.com)

2025-12-21|tool|GitHub

MLflow is an open-source platform for managing the end-to-end machine learning lifecycle: experiment tracking, reproducible project packaging, model packaging and deployment, and model governance/ver...

machine-learning ai ml mlflow
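
Experiment tracking, listed first above, boils down to recording each run's configuration and metric history so runs can be compared afterwards. The stdlib sketch below illustrates that idea only; it is not MLflow's API (MLflow's own client exposes calls such as `mlflow.log_param` and `mlflow.log_metric`), and the `Run` class and `best_run` helper are hypothetical.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class Run:
    """Hypothetical record of one tracked training run."""
    experiment: str
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    params: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)  # metric name -> list of (step, value)

    def log_param(self, key, value):
        # Parameters are write-once so a run's configuration cannot silently change.
        if key in self.params and self.params[key] != value:
            raise ValueError(f"param {key!r} already logged as {self.params[key]!r}")
        self.params[key] = value

    def log_metric(self, key, value, step=0):
        # Keep the full history so curves (e.g. loss per epoch) can be compared later.
        self.metrics.setdefault(key, []).append((step, float(value)))

    def last(self, key):
        # Value of the metric at the highest logged step.
        return max(self.metrics[key])[1]

    def to_json(self):
        return json.dumps(asdict(self))


def best_run(runs, metric, lower_is_better=True):
    """Pick the run whose latest value of `metric` is best."""
    pick = min if lower_is_better else max
    return pick(runs, key=lambda r: r.last(metric))
```

Serializing each run with `to_json` is what makes comparisons reproducible across sessions; a full platform would persist these records in a shared backend store.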

GenEval 2: Addressing Benchmark Drift in Text-to-Image Evaluation [TOP LAB](arxiv.org)

2025-12-21|paper|arXiv

Automating Text-to-Image (T2I) model evaluation is challenging; a judge model must be used to score correctness, and test prompts must be selected to be challenging for current T2I models but not the ...

cs-CV cs-AI

Grammar-Forced Translation of Natural Language to Temporal Logic using LLMs [TOP LAB](arxiv.org)

2025-12-21|paper|arXiv

Translating natural language (NL) into a formal language such as temporal logic (TL) is integral for human communication with robots and autonomous systems. State-of-the-art approaches decompose the t...

cs-CL cs-AI
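
The abstract is truncated here, so the paper's actual grammar and decoding procedure are not shown. As a hypothetical illustration of what forcing output into a temporal-logic grammar can mean, the sketch below is a recursive-descent checker for a small, assumed LTL fragment (lowercase atoms, `!`, `&`, `|`, `->`, and the temporal operators `G`, `F`, `X`). A grammar-constrained decoder would only keep candidate outputs that such a parser can accept.

```python
import re

# Assumed token set: parentheses, !, &, |, ->, G (globally), F (eventually),
# X (next), and lowercase atomic propositions.
TOKEN = re.compile(r"\s*(->|[()!&|]|[GFX]|[a-z_][a-z0-9_]*)")
ATOM = re.compile(r"[a-z_][a-z0-9_]*")


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"cannot tokenize input near position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class LTLParser:
    """Recursive descent; precedence from loosest to tightest: ->, |, &, unary."""

    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise ValueError(f"expected {tok!r}, got {self.peek()!r}")
        self.i += 1

    def formula(self):
        left = self.disjunction()
        if self.peek() == "->":
            self.i += 1
            return ("->", left, self.formula())  # implication is right-associative
        return left

    def disjunction(self):
        node = self.conjunction()
        while self.peek() == "|":
            self.i += 1
            node = ("|", node, self.conjunction())
        return node

    def conjunction(self):
        node = self.unary()
        while self.peek() == "&":
            self.i += 1
            node = ("&", node, self.unary())
        return node

    def unary(self):
        tok = self.peek()
        if tok in ("!", "G", "F", "X"):
            self.i += 1
            return (tok, self.unary())
        if tok == "(":
            self.i += 1
            node = self.formula()
            self.expect(")")
            return node
        if tok is not None and ATOM.fullmatch(tok):
            self.i += 1
            return tok
        raise ValueError(f"unexpected token {tok!r}")


def parse_ltl(text):
    """Return a nested-tuple syntax tree, or raise ValueError if text is not well-formed."""
    parser = LTLParser(tokenize(text))
    tree = parser.formula()
    if parser.peek() is not None:
        raise ValueError(f"unexpected trailing token {parser.peek()!r}")
    return tree
```

For example, `parse_ltl("G (request -> F grant)")` returns `("G", ("->", "request", ("F", "grant")))`, while a truncated string such as `"G (request ->"` raises `ValueError`.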

pytorch-forecasting (github.com)

2025-12-20|tool|GitHub

PyTorch-Forecasting is a Python library built on PyTorch for scalable probabilistic time series forecasting using deep learning models.[3]

pytorch forecasting gpu uncertainty
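
Probabilistic forecasters commonly predict several quantiles instead of a single value, and a standard training objective for that is the quantile (pinball) loss. The plain-Python sketch below shows the formula and a simple interval-coverage check; it is an illustration, not the library's implementation, and the function names are hypothetical.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss of predictions y_pred at quantile level q."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        diff = y - y_hat
        # Under-prediction (diff > 0) costs q per unit; over-prediction costs (1 - q).
        total += max(q * diff, (q - 1.0) * diff)
    return total / len(y_true)


def interval_coverage(y_true, lower, upper):
    """Fraction of observations inside [lower, upper], e.g. a band from q=0.1 and q=0.9 forecasts."""
    hits = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return hits / len(y_true)
```

With q = 0.9, under-predicting a true value of 10 by 2 costs 1.8 while over-predicting by 2 costs only 0.2, which pushes the prediction up toward the 90th percentile. For a well-calibrated 10%/90% band, `interval_coverage` should come out near 0.8.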

pixeltable (github.com)

2025-12-19|tool|GitHub

Pixeltable is an AI data infrastructure platform that provides a declarative, incremental table-based interface for managing and processing multimodal data in AI/ML workflows, eliminating the need f...

llm genai computer-vision ai

mlflow (github.com)

2025-12-19|tool|GitHub

MLflow is an open-source platform designed to manage the complete machine learning (ML) lifecycle, including experimentation, reproducibility, deployment, and monitoring, while integrating with popu...

machine-learning ai ml mlflow

mlflow (github.com)

2025-12-16|tool|GitHub

MLflow is an open-source platform designed to manage the complete machine learning (ML) lifecycle, including experiment tracking, model packaging, deployment, and productionization for traditional M...

machine-learning ai ml mlflow

DiffusionBrowser: Interactive Diffusion Previews via Multi-Branch Decoders (arxiv.org)

2025-12-16|paper|arXiv

cs-CV cs-AI cs-GR

Causal Inference in Energy Demand Prediction [TOP LAB](arxiv.org)

2025-12-15|paper|arXiv

cs-AI

Atomic Action Slicing: Planner-Aligned Options for Generalist VLA Agents [TOP LAB](arxiv.org)

2025-12-15|paper|arXiv

cs-LG cs-AI cs-RO

moabb (github.com)

2025-12-14|tool|GitHub

MOABB (Mother of All BCI Benchmarks) is an open-source Python toolbox for benchmarking machine learning algorithms on brain-computer interface (BCI) data, particularly electroencephalography (...

brain-computer-interface machine-learning eeg neuroscience

Any4D: Unified Feed-Forward Metric 4D Reconstruction [TOP LAB](arxiv.org)

2025-12-14|paper|arXiv

cs-CV cs-AI cs-LG

Textual Data Bias Detection and Mitigation - An Extensible Pipeline with Experimental Evaluation [TOP LAB](arxiv.org)

2025-12-14|paper|arXiv

cs-CL cs-AI
