Focuses on AI models that combine vision and language understanding for extended reality (XR).
- Type: Repository
- Key Features: Develops models for tasks such as image captioning.
- Technical Categories: Computer Vision, Natural Language Processing, Vision-Language Models, Multimodal Learning
- Sectors: Entertainment, Education, Industrial Applications (XR)
- Research Areas: Visual Scene Understanding, Image Captioning, Spatial Reasoning
- Type of License: Apache-2.0