Vision and Language Models

Focuses on AI models that combine vision and language understanding for extended reality (XR) applications.

  • Type: Repository
  • Key Features: Development of models for tasks like image captioning.
  • Technical Categories: Computer Vision, Natural Language Processing, Vision-Language Models, Multimodal Learning
  • Sectors: Entertainment, Education, Industrial Applications (XR)
  • Research Areas: Visual Scene Understanding, Image Captioning, Spatial Reasoning
  • Type of License: Apache-2.0
