The AI Wire

180 articles tagged "cv" — page 4 of 6

Keypoint Counting Classifiers: Turning Vision Transformers into Self-Explainable Models Without Training [TOP LAB](arxiv.org)

2025-12-22|paper|arXiv

Current approaches for designing self-explainable models (SEMs) require complicated training procedures and specific architectures which makes them impractical. With the advance of general purpose fou...

MGRegBench: A Novel Benchmark Dataset with Anatomical Landmarks for Mammography Image Registration [TOP LAB](arxiv.org)

2025-12-22|paper|arXiv

Robust mammography registration is essential for clinical applications like tracking disease progression and monitoring longitudinal changes in breast tissue. However, progress has been limited by the...

Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing (arxiv.org)

2025-12-22|paper|arXiv

Modern Latent Diffusion Models (LDMs) typically operate in low-level Variational Autoencoder (VAE) latent spaces that are primarily optimized for pixel-level reconstruction. To unify vision generation...

GenEval 2: Addressing Benchmark Drift in Text-to-Image Evaluation [TOP LAB](arxiv.org)

2025-12-21|paper|arXiv

Automating Text-to-Image (T2I) model evaluation is challenging; a judge model must be used to score correctness, and test prompts must be selected to be challenging for current T2I models but not the ...

Generative Refocusing: Flexible Defocus Control from a Single Image (arxiv.org)

2025-12-21|paper|arXiv

Depth-of-field control is essential in photography, but getting the perfect focus often takes several tries or special equipment. Single-image refocusing is still difficult. It involves recovering sha...

GenEval 2: Addressing Benchmark Drift in Text-to-Image Evaluation [TOP LAB](arxiv.org)

2025-12-20|paper|arXiv

Automating Text-to-Image (T2I) model evaluation is challenging; a judge model must be used to score correctness, and test prompts must be selected to be challenging for current T2I models but not the ...

Generative Refocusing: Flexible Defocus Control from a Single Image (arxiv.org)

2025-12-20|paper|arXiv

Depth-of-field control is essential in photography, but getting the perfect focus often takes several tries or special equipment. Single-image refocusing is still difficult. It involves recovering sha...

GenEval 2: Addressing Benchmark Drift in Text-to-Image Evaluation [TOP LAB](arxiv.org)

2025-12-19|paper|arXiv

Generative Refocusing: Flexible Defocus Control from a Single Image (arxiv.org)

2025-12-19|paper|arXiv

MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives (arxiv.org)

2025-12-17|paper|arXiv

Enhancing Visual Sentiment Analysis via Semiotic Isotopy-Guided Dataset Construction [TOP LAB](arxiv.org)

2025-12-17|paper|arXiv

FoodLogAthl-218: Constructing a Real-World Food Image Dataset Using Dietary Management Applications [TOP LAB](arxiv.org)

2025-12-17|paper|arXiv

DBT-DINO: Towards Foundation model based analysis of Digital Breast Tomosynthesis [TOP LAB](arxiv.org)

2025-12-16|paper|arXiv

DiffusionBrowser: Interactive Diffusion Previews via Multi-Branch Decoders (arxiv.org)

2025-12-16|paper|arXiv

cs-CV cs-AI cs-GR

Stochastics of shapes and Kunita flows [TOP LAB](arxiv.org)

2025-12-15|paper|arXiv

On Geometric Understanding and Learned Data Priors in VGGT [TOP LAB](arxiv.org)

2025-12-15|paper|arXiv

Moment-Based 3D Gaussian Splatting: Resolving Volumetric Occlusion with Order-Independent Transmittance (arxiv.org)

2025-12-15|paper|arXiv

WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World (arxiv.org)

2025-12-14|paper|arXiv

Any4D: Unified Feed-Forward Metric 4D Reconstruction [TOP LAB](arxiv.org)

2025-12-14|paper|arXiv

cs-CV cs-AI cs-LG

StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space (arxiv.org)

2025-12-14|paper|arXiv

WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World (arxiv.org)

2025-12-13|paper|arXiv

Any4D: Unified Feed-Forward Metric 4D Reconstruction [TOP LAB](arxiv.org)

2025-12-13|paper|arXiv

cs-CV cs-AI cs-LG

StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space (arxiv.org)

2025-12-13|paper|arXiv

Any4D: Unified Feed-Forward Metric 4D Reconstruction [TOP LAB](arxiv.org)

2025-12-12|paper|arXiv

cs-CV cs-AI cs-LG

StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space (arxiv.org)

2025-12-12|paper|arXiv

WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World (arxiv.org)

2025-12-12|paper|arXiv

ReViSE: Towards Reason-Informed Video Editing in Unified Models with Self-Reflective Learning (arxiv.org)

2025-12-11|paper|arXiv

Hands-on Evaluation of Visual Transformers for Object Recognition and Detection [TOP LAB](arxiv.org)

2025-12-11|paper|arXiv

Astra: General Interactive World Model with Autoregressive Denoising (arxiv.org)

2025-12-10|paper|arXiv

cs-CV cs-AI cs-LG

Tri-Bench: Stress-Testing VLM Reliability on Spatial Reasoning under Camera Tilt and Object Interference [TOP LAB](arxiv.org)

2025-12-10|paper|arXiv
