Qwen2-VL-72B (huggingface.co)
Model | HuggingFace
A vision-language model with strong image understanding and competitive benchmark performance. It accepts multiple images in a single input and can process high-resolution images.