VOXReality’s goal is to conduct research and develop new AI models to drive future XR interactive experiences, and to deliver these models to the wider European market. These new models will address human-to-human interaction in unidirectional (theatre) and bidirectional (conference) settings, as well as human-to-machine interaction by building the next generation of personal assistants.
VOXReality will develop large-scale self-supervised models that can be fine-tuned to specific downstream tasks with minimal re-training. At the same time, we will rely on modern training approaches to design models containing sub-networks that share a common representation capacity but are targeted at specific deployment architectures. By applying the once-for-all concept both at training time (large-scale self-supervision) and at deployment time (jointly learned sub-networks), we will be able to provide a catalogue of highly generic models with high representation capacity that can be efficiently specialized for downstream tasks.
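The pretrain-then-specialize workflow described above can be sketched in miniature. The example below is a hypothetical toy setup (not the project's actual models): an encoder is "pretrained" on unlabeled data via a self-supervised reconstruction objective (solved here in closed form), then frozen, so that only a small task-specific head is fitted downstream — illustrating why fine-tuning needs minimal re-training.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Stage 1: self-supervised pretraining (toy, hypothetical) ---
# Learn a linear encoder from unlabeled data by optimal reconstruction,
# which for a linear model reduces to a truncated SVD (PCA-style).
X_unlabeled = rng.normal(size=(500, 16))   # unlabeled pretraining data
_, _, Vt = np.linalg.svd(X_unlabeled, full_matrices=False)
W_encoder = Vt[:4].T                       # 16 -> 4 encoder, then frozen

# --- Stage 2: downstream fine-tuning with minimal re-training ---
# The encoder stays frozen; only a small task head is fitted on a
# small labeled set.
X_task = rng.normal(size=(50, 16))         # small labeled dataset
y_task = X_task @ rng.normal(size=16)      # synthetic regression target
Z = X_task @ W_encoder                     # features from frozen encoder
head, *_ = np.linalg.lstsq(Z, y_task, rcond=None)

trainable = head.size                      # parameters updated downstream
frozen = W_encoder.size                    # pretrained parameters reused as-is
print(trainable, frozen)                   # prints: 4 64
```

Here only 4 of the 68 parameters are touched during specialization; the same frozen encoder could serve many such heads, which is the essence of the one-model-for-many-tasks catalogue.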
In the VOXReality project, publicly available datasets will be used to develop, explore and train self-supervised AI models. For speech translation, bilingual sentence pairs and multilingual speech-translation corpora will be used extensively to build robust end-to-end systems. Additional datasets will be collected for each pilot and used to fine-tune the self-supervised models.
Overall, the ambition of VOXReality is to extend vision-language models to capture spatial and semantic relationships simultaneously — a critical requirement for next-generation language-driven XR, a technology that inherently demands spatial understanding.
Our consortium has been carefully assembled to include the scientific and technological skills needed to address the challenges of the project and achieve its ambitious goals. It comprises 10 organizations from 5 EU countries (Greece, Germany, Italy, Ireland and the Netherlands).
VOXReality partners have been strategically selected to balance research, innovation and dissemination potential. Coordinated by MAG, the team consists of one European conference organizer (VRDAYS), an acknowledged cultural theatre (AF), four innovative SMEs (SYN, HOLO, ADAPT, F6S), two research partners combining language and vision expertise (CERTH, UM), and social XR experts (NWO-I). The user partners of the consortium are companies providing solutions for virtual conferencing, the cultural sector and industrial applications.
VOXReality’s Open Call is supported by F6S, which has a wide network in business innovation, support and community building. VOXReality will support third-party contributions from both a technical (SYN) and a networking (F6S) perspective. Finally, acknowledging the importance of a user-centric design process — specifically for collecting user requirements and validating the developed models — VOXReality includes a dedicated partner with expertise in this domain (NWO-I).