rgb_language_vqa

Spatially aware vision-language model trained on COCO for visual question answering.

  • Type: AI Model
  • Key Features: Produces answers describing spatial relationships between objects in an image.
  • Technical Categories: Computer Vision, Natural Language Processing, Vision-Language Models
  • Sectors: Robotics, Surveillance, Spatial Understanding Applications
  • Research areas: Visual Question Answering, Spatial Reasoning
  • Type of License: Apache-2.0
