
Honorable mention in IEEEVR2025 Workshop (VR-HSA) Paper by CWI

Award: Honorable mention for the paper “User-Centric Requirements for Enhancing XR Use Cases with Machine Learning Capabilities” in the “Best Presentation” award category at the International Workshop on Virtual Reality for Human and Spatial Augmentation (VR-HSA), held in conjunction with IEEE VR 2025.

We are glad to share that our team from CWI (Centrum Wiskunde & Informatica) presented their work at the International Workshop on Virtual Reality for Human and Spatial Augmentation (VR-HSA), held in conjunction with IEEE VR 2025 in the beautiful coastal city of Saint-Malo, France, on March 9, 2025. At the workshop, we presented our paper, “User-Centric Requirements for Enhancing XR Use Cases with Machine Learning Capabilities,” authored by Sueyoon Lee, Moonisa Ahsan, Irene Viola, and Pablo Cesar. The paper is based on two use cases: (a) Virtual Conference, which mimics a real-life conference in a VR environment (VRDays Foundation), and (b) Augmented Theatre, which showcases a Greek play in an AR environment (Athens Festival). It describes our user-centric approach of conducting two focus groups to gather user requirements for these use cases and to identify where ML technologies could be implemented using VOXReality technology modules. We also showcased an overview of the full data collection, processing, and evaluation pipeline with a poster presentation in a parallel session. We are happy to share that our presentation received an honorable mention in the Best Presentation Award category.

Poster for the paper presented in the IEEEVR2025 Workshop (VR-HSA)

We are excited to see our work contributing to the growing field of ML-enhanced XR user experiences. We extend our thanks to the use case owners (VRDays Foundation, Athens Festival AEF) and to everyone who was part of the process and contributed to enabling this work, as well as to the VR-HSA organizers and the broader XR community for supporting the discussions. This recognition motivates us to continue working towards more user-centric immersive experiences.

Photo Credits: Moonisa Ahsan at the VR-HSA Workshop at IEEE VR 2025

Abstract: The combination of Extended Reality (XR) and Machine Learning (ML) will enable a new set of applications. This requires adopting a user-centric approach to address the evolving user needs. This paper addresses this gap by presenting findings from two independent focus groups specifically designed to gather user requirements for two use cases: (1) a VR Conference with an AI-enabled support agent and real-time translations, and (2) an AR Theatre featuring ML generated translation capabilities and voice-activated VFX. Both focus groups were designed using context-mapping principles. We engaged 6 experts in each of the focus groups. Participants took part in a combination of independent and group activities aimed at mapping their interaction timelines, identifying positive experiences, and highlighting pain points for each scenario. These activities were followed by open discussions in semi-structured interviews to share their experiences. The inputs were analysed using Thematic Analysis and resulted in a set of user-centric requirements for both applications on Virtual Conference and Augmented Theatre respectively. Subtitles and Translations were the most interesting and common findings in both cases. The results led to the design and development of both applications. By documenting user-centric requirements, these results contribute significantly to the evolving landscape of immersive technologies.  

Keywords: Virtual Reality, VR conference, Augmented Reality, AR theatre, Focus groups, User requirements, Use cases, Human-centric design. 

Reference 

  1. S. Lee, M. Ahsan, I. Viola, and P. Cesar, “User-centric requirements for enhancing XR use cases with machine learning capabilities,” in Proceedings of the VR-HSA Workshop (IEEE VR 2025), March 2025.
Picture of Moonisa Ahsan

Moonisa Ahsan

Moonisa Ahsan is a post-doc in the DIS (Distributed & Interactive Systems) Group of CWI. In VOXReality, she contributes to understanding next-generation applications within Extended Reality (XR), and to better understanding user needs and leveraging that knowledge to develop innovative solutions that enhance the user experience in all three use cases. She is a Marie Curie Alumna and her scientific and research interests are Human-Computer Interaction (HCI), User-Centric Design (UCD), Extended Reality (XR) and Cultural Heritage (CH).


Master’s Thesis titled “Enhancing the Spectator Experience by Integrating Subtitle Display in eXtended Reality Theatres” defended last December in Amsterdam

Master’s Student: Atanas Yonkov
Thesis Advisors (CWI): Moonisa Ahsan, Irene Viola and Pablo Cesar

Abstract: The rapid growth of virtual and augmented reality technologies, encapsulated by the term eXtended Reality (XR), has revolutionized the interaction with digital content, bringing new opportunities for entertainment and communication. Subtitles and closed captions are crucial in improving language learning, vocabulary acquisition, and accessibility, such as in understanding audiovisual content. However, little is known about integrating subtitle displays in extended reality theatre environments and their influence on the user experience. This study addresses this gap by examining subtitle placement and design attributes specific to XR settings. Building on previous research on subtitle placement, mainly in television and 360-degree videos, this project focuses on the differences between static and dynamic subtitle variants. The study uses a comprehensive literature review, a Virtual Reality (VR) theatre experiment, and analytics to investigate these aspects of subtitle integration in the specific case of a VR theatrical Greek play with subtitles. The results show that the difference between the two variants is not significant, and both implementations produce high scores. However, thematic analysis suggests the preference for the static over the dynamic variant depends heavily on the specific context and the number of speakers in the scene. Since this study focuses on a monologue theatrical play, the next step in future work would be to explore a “multi-speaker” play.

The partners from the DIS (Distributed and Interactive Systems) group of Centrum Wiskunde & Informatica (CWI) hosted and supervised a Master’s thesis[1] titled “Enhancing the Spectator Experience by Integrating Subtitle Display in eXtended Reality Theatres” by Atanas Yonkov at the University of Amsterdam (UvA). The advisors from CWI were Moonisa Ahsan, Irene Viola and Pablo Cesar, and the university advisors were Prof. dr. Frank Nack and Prof. dr. Hamed Seiied Alavi. The thesis focuses on XR theatres, investigating subtitle integration in virtual reality (VR) theatre environments designed within the VOXReality project. The user study in the thesis was based on an extended VR version of the AR Theatre use case application of the VOXReality project, showcasing the Greek theatrical play Hippolytus by Euripides. The goal was to bridge the existing research gap by exploring optimal subtitle positioning in VR theatre, focusing on two key approaches: static and dynamic subtitles. In the study, static subtitles (see Fig. 1a) are fixed relative to the user’s gaze, ensuring they remain within the viewer’s field of vision regardless of scene movement. Dynamic subtitles (see Fig. 1b) are anchored to objects—in this case, actors—moving naturally with them within the virtual environment.

Figure 1 (a) Static and (b) Dynamic subtitles in a theatrical play scene from a participant’s VR headset perspective

The study was conducted from May 13 to May 22, 2024, at the DIS Immersive Media Lab of Centrum Wiskunde & Informatica (CWI) in Amsterdam, The Netherlands. It examined how subtitle placement affects the user experience in a VR theatrical adaptation of a Greek play. Results indicated no significant difference in user experience between static and dynamic subtitle implementations, with both approaches receiving high usability scores. However, a thematic analysis revealed that user preference for static or dynamic subtitles was highly context-dependent. In particular, the number of speakers in a scene influenced subtitle readability and ease of comprehension: a) in monologue settings, static subtitles were often preferred for their stability and ease of reading; b) in potential future scenarios with multiple speakers, dynamic subtitles could enhance spatial awareness and dialogue attribution. Each session lasted approximately 60 minutes, with individual durations varying between 50 and 120 minutes, depending on participant familiarity and adaptability with VR headsets and controllers. Our findings, which will be detailed in future blog posts, contribute to the growing body of research on subtitle placement in immersive environments. This work builds upon previous studies on subtitle integration for television and 360-degree videos, extending the analysis to VR theatre settings, and it also informs several design and user experience decisions for the AR Theatre use case within the project. As future work, given that this study focused on a monologue performance, further research should extend the analysis to multi-speaker theatrical plays to explore subtitle effectiveness in complex dialogue scenarios.

Image Courtesy of Atanas Yonkov: Master’s Graduation Ceremony at University of Amsterdam (UvA) (2024)

[1] Atanas Yonkov, “Enhancing the Spectator Experience: Integrating Subtitle Display in eXtended Reality Theatres” (Master’s thesis), Universiteit van Amsterdam, 2024. Available at https://scripties.uba.uva.nl/search?id=record_55113

Picture of Moonisa Ahsan

Moonisa Ahsan

Moonisa Ahsan is a post-doc in the DIS (Distributed & Interactive Systems) Group of CWI. She was also the external supervisor for the aforementioned thesis work. In VOXReality, she contributes to understanding next-generation applications within Extended Reality (XR), and to better understanding user needs and leveraging that knowledge to develop innovative solutions that enhance the user experience in all three use cases. She is a Marie Curie Alumna and her scientific and research interests are Human-Computer Interaction (HCI), User-Centric Design (UCD), Extended Reality (XR) and Cultural Heritage (CH).


Luxembourg’s Immersive Days 2025

The Immersive Days 2025, held on March 4 and 5 in Luxembourg City, explored immersive technologies and their intersection with art, culture, and society. This two-day conference, organized by Film Fund Luxembourg in collaboration with the Luxembourg City Film Festival and PHI Montreal, brought together international experts, professionals, and artists active in the XR industry to discuss the latest developments and challenges in the field and underscore Luxembourg’s growing prominence in the immersive arts and virtual reality (VR) sectors.

This year’s conference again delivered a programme open to the general public and provided an opportunity to engage directly with the creators behind the immersive works featured in the Immersive Pavilion 2025.

Lectures and round tables started on March 4 at the Cercle Cité, gathering professionals and the general public and mainly featuring creators whose works were exhibited at this year’s Immersive Pavilion. Discussions centred on their unique creative processes, reflecting on how their fictional and personal stories were translated into immersive content, and on the challenges encountered during the ideation, production and distribution process.

The second day, held at Neumünster, was reserved for industry professionals and delved into more technical and forward-looking topics within the XR industry. This day fostered international exchanges and promoted peer networking among professionals. Discussions covered the current challenges in the preservation of digital content, funding opportunities for XR projects (showcasing the German regional funding system), and, last but not least, the impact of AI technology on immersive experiences, discussed in a panel titled “AI/XR: The Future of AI for Immersive and Virtual Arts”.

Among the guests of this year’s programme were Stéphane Hueber-Blies and Nicolas Blies, directors of “Ceci est mon cœur“; François Vautier, director of “Champ de Bataille”, awarded “Best Immersive Experience” at this year’s Immersive Pavilion; and Octavian Mot, director of “AI & Me: The Confessional and AI Ego“, a thought-provoking installation that explores the intricate relationship between humans and artificial intelligence.

Mot’s work invites participants into an intimate dialogue with an AI entity, challenging them to reflect on themes of identity, consciousness, and the evolving dynamics between human and machine. The installation was a highlight of the Immersive Pavilion 2025, showcasing the potential of AI to create deeply personal and immersive art experiences.

Image 1: "AI & Me" installation. User Analysis in process.

While “AI & Me” offers an introspective, artistic exploration of human-AI interaction, it was impossible not to draw parallels to VOXReality’s AI Agent, which represents a more utilitarian and non-intrusive application of AI in immersive environments. VOXReality’s AI Agent is designed to enhance user experiences in virtual spaces by providing responsive and adaptive interactions, serving roles ranging from virtual assistant to dynamic character within virtual narratives. “AI & Me”, by contrast, creates its narrative by “perceiving” users while remaining completely unbound by the industry-standard “rules of engagement” with humans, resulting in answers and interactions that can come across as cunning and raw, and leaving the user to deal with its sense of humour. Brilliant!

Image 2: "AI & Me" installation. User AI representation.

The juxtaposition of “AI & Me” and VOX Reality’s AI Agent underscores the diverse applications of AI agents in today’s technological landscape. On one hand, AI is leveraged as a medium for artistic expression, prompting users to engage in self-reflection and philosophical inquiry. On the other hand, AI serves practical functions, improving user engagement and functionality within virtual environments. This duality highlights the versatility of AI agents and their growing significance across various domains.

In conclusion, Immersive Days 2025 in Luxembourg City successfully bridged the gap between art and technology, providing a platform for meaningful discussions and showcasing pioneering works in the field of immersive experiences. The event not only highlighted the current state of immersive art and technology but also set the stage for future innovations, emphasising the importance of interdisciplinary collaboration and the country’s significant contribution to the international immersive production landscape, a success largely attributed to the strategic initiatives of the Film Fund Luxembourg.

Picture of Manuel Toledo - Head of Production at VRDays Foundation

Manuel Toledo - Head of Production at VRDays Foundation

Manuel Toledo is a driven producer and designer with over a decade of experience in the arts and creative industries. Through various collaborative projects, he merges his creative interests with business research experience and entrepreneurial skills. His multidisciplinary approach and passion for intercultural interaction have allowed him to work effectively with diverse teams and clients across cultural, corporate, and academic sectors.

Starting in 2015, Manuel co-founded and produced the UK’s first architecture and film festival in London. Since early 2022, he has led the production team for Immersive Tech Week at VRDays Foundation in Rotterdam and serves as the primary producer for the XR Programme at De Doelen in Rotterdam. He is also a founding member of ArqFilmfest, Latin America’s first architecture and film festival, which debuted in Santiago de Chile in 2011. In 2020, Manuel earned a Master’s degree from Rotterdam Business School, with a thesis focused on innovative business models for media enterprises. He leads the VRDays Foundation team’s contributions to the VOXReality project.


Developing NLP models in the age of the AI race

The AI race intensifies

During the last 10-15 years, Natural Language Processing (NLP) has undergone a profound transformation, driven by advancements in deep learning, the use of massive datasets and increased computational power. These innovations led to early breakthroughs such as word embeddings (Word2Vec [1], GloVe [2]) and paved the way for advanced architectures like sequence-to-sequence models and attention mechanisms, all based on neural architectures. In 2018, the introduction of transformer-based models, and especially BERT [3] (released as an open-source model), enabled the contextualized understanding of language. Performance in NLP tasks like machine translation, sentiment analysis or speech recognition has been significantly boosted, making AI-driven language technologies more accurate and scalable than ever before.

The “AI race” has intensified with the rise of large language models (LLMs) like OpenAI’s ChatGPT [4] and DeepSeek-R1 [5], which use huge architectures with billions of parameters and massive multilingual datasets to push the boundaries of NLP. These models dominate fields like conversational AI and can perform a wide range of tasks by achieving human-like fluency and context awareness. Companies and research institutions worldwide are competing to build more powerful, efficient, and aligned AI systems, leading to a rapid cycle of innovation. However, this race also raises challenges related to interpretability, ethical AI deployment and the accessibility of high-performing models beyond large tech firms.

But what did DeepSeek achieve? In early 2025, DeepSeek released its R1 model, which has been noted to outperform many state-of-the-art LLMs at a lower cost, causing a disruption in the AI sector. DeepSeek made its R1 model available on platforms like Azure, allowing users to take advantage of its technology. DeepSeek introduced many technical innovations that allowed the model to thrive (such as architectural innovations: a hybrid transformer design, the use of mixture-of-experts models, and auxiliary-loss-free load balancing), but its main contribution was the reduction of reliance on traditional labeled datasets. This innovation stems from the integration of pure reinforcement learning (RL) techniques, enabling the model to learn complex reasoning tasks without the need for extensive labeled data. This approach not only reduces the dependency on large labeled datasets but also streamlines the training process, lowering the resource requirements and costs associated with developing advanced AI models.

Figure: DeepSeek architecture (taken from https://newsletter.languagemodels.co/p/the-illustrated-deepseek-r1)

The enduring relevance of the models used in VOXReality

At VOXReality we take a fundamentally different approach and believe in the significant value brought by “traditional” AI models (especially for ASR and MT), particularly in specialized domain applications. We prioritize real open-source AI by ensuring transparency, reproducibility and accessibility [6]. Unlike proprietary or restricted “open weight” models, our work is built upon truly open architectures that allow full modification and deployment without any limitations. This is the reason that our open call winners [7] are allowed to build on top of the VOXReality ecosystem. Moreover, our approaches often require less computational power and data, making them suitable for scenarios with limited resources or where deploying large-scale AI models is impractical. Our models can be tailored to specific industries or fields, incorporating domain-specific expertise without extensive or expensive retraining. Implementing the models locally (if chosen) can also offer enhanced control over data and compliance with privacy regulations, which can be a significant consideration in sensitive domains.
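As a minimal sketch of what this openness means in practice, the snippet below loads a publicly available open-source translation model through the Hugging Face transformers pipeline; the VOXReality checkpoints published at https://huggingface.co/voxreality [6] could be swapped in the same way, but the model identifier shown here is a generic stand-in, not a project model.

```python
from transformers import pipeline

# Generic open-source MT model used as a stand-in; a VOXReality checkpoint from
# https://huggingface.co/voxreality could be substituted in the same pipeline call.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")

result = translator("The performance starts in five minutes.", max_length=128)
print(result[0]["translation_text"])
```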

VOXReality’s Strategic Integration

At VOXReality, we strategically integrate traditional ASR and MT approaches to complement advanced AI models, ensuring a comprehensive and adaptable solution that leverages the strengths of state-of-the-art AI. This focus on real open-source innovation and data-driven performance differentiates VOXReality from the rapidly evolving landscape of AI mega-models.

Picture of Jerry Spanakis

Jerry Spanakis

Assistant Professor in Data Mining & Machine Learning at Maastricht University

References

[1] Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space.

[2] Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar. Association for Computational Linguistics.

[3] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics

[4] https://openai.com/chatgpt/overview/

[5] https://github.com/deepseek-ai/DeepSeek-R1

[6] https://huggingface.co/voxreality

[7] https://voxreality.eu/open-call-winners/


VOXReality’s Open Learning Revolution

Industrial training has traditionally followed rigid, stepwise instruction, ensuring compliance and accuracy but often at the cost of creativity and adaptability. However, with the rapid advancements in Extended Reality (XR) and Artificial Intelligence (AI), training methodologies are shifting toward more dynamic and flexible models.

At the heart of this transformation is VOXReality, an XR-powered training system that departs from traditional step-by-step assembly guides. Instead, it embraces a freemode open-learning approach, allowing workers to logically define and customize their own assembly sequences with complete freedom. This method enhances problem-solving skills, engagement, and real-world adaptability.

Unlike conventional training, which dictates a specific order of operations, VOXReality’s open-ended model empowers users to experiment, explore, and determine their optimal workflow. This approach offers several key benefits: workers are more engaged when they can approach tasks in a way that feels natural to them; trainees develop a deeper understanding of assembly processes through problem-solving rather than rote memorization; the system adapts to different skill levels, allowing experienced workers to optimize workflows while providing guidance to beginners; and, since real-world assembly is rarely linear, this method better prepares workers for unexpected challenges on the factory floor.

VOXReality integrates an AI-driven dialogue agent to ensure trainees never feel lost in this open-ended system. This virtual assistant provides real-time feedback, allowing users to receive instant insights into their choices and refine their approach. It also enhances engagement and interactive learning by enabling workers to ask questions and receive contextual guidance rather than following static instructions. Additionally, the AI helps prevent errors by highlighting potential missteps, ensuring that creativity does not come at the cost of safety or quality.

Development Progress:

Below, we outline the development status with some corresponding screenshots that showcase the system’s core functionalities and user interactions.

The interface features two text panels displaying the conversation between the user and the dialogue agent. When the user speaks, an automatic speech recognition tool (created by our partners at Maastricht University) converts their speech into text and sends it to the dialogue agent (created by our partners at Synelixis); the transcribed input is shown in the top panel (input panel). The dialogue agent then processes the input, provides contextual responses, and uses a text-to-speech tool to read them aloud. These responses are displayed in the lower panel (output panel). Additionally, the system can trigger audio and video cues based on user requests. The entire scene is color-coded to enhance user feedback and improve interaction clarity.
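To make the flow above concrete, here is a minimal, purely illustrative Python sketch of the loop; the function names and canned responses are placeholders standing in for the project’s internal ASR, dialogue-agent and text-to-speech components, which are not public APIs.

```python
# Illustrative sketch of the interaction loop; all components below are placeholder
# stubs standing in for the real ASR, dialogue-agent and TTS services.

def transcribe(audio_path: str) -> str:
    """Stand-in for the ASR module: audio in, transcript out (shown on the input panel)."""
    return "Can you show me a video of the next assembly step?"

def dialogue_agent(utterance: str) -> dict:
    """Stand-in for the dialogue agent: returns a contextual reply plus an optional cue."""
    return {"reply": "Sure, here is the video for the next step.", "action": "play_video", "step": 3}

def speak(text: str) -> None:
    """Stand-in for the text-to-speech component (reply also shown on the output panel)."""
    print(f"[TTS] {text}")

turn = dialogue_agent(transcribe("user_question.wav"))
speak(turn["reply"])
if turn.get("action") == "play_video":
    print(f"[UI] playing tutorial video for step {turn['step']}")
```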

The screenshots below capture the dialogue between a naive user and the dialogue agent. The user enters the scene and asks for help, and the Dialogue Agent guides them through the next steps.

The screenshot below captures the user’s curious question regarding the model to be assembled. The Dialogue Agent provides contextual answers to the user.

The user asks the Dialogue Agent to show a video about one of the steps. The Dialogue Agent triggers the function in the application to show the corresponding video on the output panel.

The user grabs an object and asks the Dialogue Agent for a hint about the step they want to perform. The Dialogue Agent triggers the function in the application to give a useful hint.

The implementation of freemode XR training is just the beginning. As AI and XR technologies continue to evolve, the potential for fully immersive, adaptive, and intelligent industrial training systems grows exponentially. The success of this approach will be measured by increased worker efficiency, reduced onboarding time, and higher retention of complex technical skills.

VOXReality’s commitment to redefining industrial learning aligns with the broader movement toward smart manufacturing and Industry 5.0. By blending technology with human intuition and adaptability, we are not just training workers—we are empowering the future of industry. We look forward to testing the solution with unbiased users and receiving feedback for improvements.

Picture of Leesa Joyce

Leesa Joyce

Head of Research @ Hololight

&

Picture of Gabriele Princiotta

Gabriele Princiotta

Unity XR Developer @ Hololight


Celebrating Women in Extended Reality: Insights and Inspiration from the Women in XR Webinar 

VOXReality, in collaboration with SERMAS, XR4ED, TRANSMIXR, HECOF, MASTER, and CORTEX2 projects, had the privilege of hosting the “Women in XR Webinar – Celebrating Women in Extended Reality.” This online event brought together leading female experts from EU-funded XR projects for an inspiring discussion on the role of women in the rapidly evolving field of Extended Reality. We were honored to have a panel featuring Regina Van Tongeren, Grace Dinan, Leesa Joyce, Moonisa Ahsan, Megha Quamara, Georgia Papaioannou, Maria Madarieta, and Marievi Xezonaki. From seasoned trailblazers with 20 years of experience to emerging voices, these panelists shared their journeys, challenges, and invaluable insights. This webinar aimed to highlight the importance of gender diversity in XR and provide practical advice for aspiring women in tech.

Navigating the Digital Divide: Realities of Women in XR

The panelists openly discussed the challenges women face in the XR industry. While the field offers immense creative potential, it is inherently challenging. Participants highlighted several key issues. Women often find fewer opportunities compared to their male counterparts. The persistent pay gap remains a significant barrier. Women’s contributions can be overlooked, hindering career advancement. Some women still experience difficulties in accessing advocacy and support from the broader XR community. These challenges underscore the need for systemic changes to ensure equal opportunities and recognition for women in XR.

Unlocking Immersive Potential: The Boundless Opportunities for Women in XR

Despite the challenges, the webinar emphasized the vast opportunities available for women in XR. The panelists pointed to the expanding applications of XR across various sectors. XR has the potential to modernize medical training, patient care, and therapy in healthcare. Immersive learning experiences enhance engagement and knowledge retention in education. Innovative applications for virtual try-ons, digital fashion, and immersive design processes are emerging in fashion and design. The XR field is not limited to technical roles; it requires a wide range of skills, including legal expertise, artistic talent, and scientific knowledge. These emerging opportunities present a unique chance for women to lead and shape the future of XR.

Empowering the Future: Actionable Insights and Key Takeaways for Women in XR

The panelists shared a wealth of practical advice for women looking to thrive in the XR industry. They emphasized the importance of building a strong network and finding a supportive community within XR. Organizations like Women in Immersive Tech Europe [1] provide valuable resources, mentorship, and networking opportunities. Seeking out inspiring role models, such as Parul Wadhwa [2], or even figures like Marie Curie, and learning from their experiences was also strongly encouraged.

Furthermore, the panelists stressed the importance of being assertive and comfortable making suggestions. Staying updated with the latest developments in the rapidly evolving XR field is crucial, as is a commitment to continuous learning. They advised against trying to conform to a pre-existing mold, urging women to bring their unique perspectives to the table and contribute to creating inclusive XR experiences. Building a strong online brand, including a professional portfolio, active social media channels, and a personal website, was highlighted as essential for visibility.

For XR teams, the message was clear: diversity must be a core value, integrated into the DNA of the team and its products, not an afterthought. Diversifying hiring teams to include a wide range of skill sets is essential. For those considering starting their own businesses or working freelance, platforms like Immersive Insiders [3] and TalentLabXR [4] were recommended, along with exploring relevant courses from institutions such as the University of London and the University of Michigan.

The webinar left us with a powerful call to action, inspiring us to work together towards a more inclusive and equitable XR future. We encourage you to follow all the panelists, especially our team members Leesa Joyce and Moonisa Ahsan, and be inspired by their ongoing leadership!

Missed the Live Session? Catch the Recording! If you were unable to join us live, don’t worry! The full event recording is available on the F6S Innovation YouTube channel.

Picture of Ana Rita Alves

Ana Rita Alves

Ana Rita Alves is a Communication Manager at F6S, where she specializes in managing communication and dissemination strategies for EU-funded projects. She holds an Integrated Master’s Degree in Community and Organizational Psychology from the University of Minho, which has provided her with strong skills in communication, project management, and stakeholder engagement. Her professional background includes experience in proposal writing, event management, and digital content creation.

Photo by James Bellorini, https://www.citymatters.london/london-short-film-festival-smart-caption-glasses/

Choosing the right AR solution for Theatrical Performances

Choosing suitable AR devices for the VOXReality AR Theatre has been a challenging endeavour. The selection criteria are based on the user and system requirements of the VOXReality use case, which have been extracted through a rigorous user-centric design process and iterative system architecture design. The four critical selection criteria dictate that:

  1. The AR device should have a comfortable and discreet glass-like form factor for improved user experience and technological acceptance in theatres.
  2. The AR device should support affordability, durability and long-term maintenance for feasible implementation at audience-wide scale.
  3. The AR device should support personalization, so that each audience member can customize the fit to their needs, and allow strict sanitization protocols, so that device distribution can adhere to high level public health standards.
  4. The AR device should support application development with open standards instead of proprietary SDKs for widespread adoption of the solution.

Given the above criteria, the selection process presents a clear challenge because no readily available AR solution offers a perfect fit. To address this need, the VOXReality team performed an extensive investigation of the range of available options with a view to the past, present, and future, and is presenting the results below.

The past

A quick look at the past shows that popular AR options were clearly unsuitable given the selection criteria. Specifically, AR has a long, proven track record as affordable and user-friendly camera-based AR, deployed on consumer smartphones and distributed through appropriate platforms as standalone applications. The restrictions on the user experience, though, such as holding a phone up while seated to watch the play through a small screen, make this a clearly prohibitive option. Previous innovative designs of highly sophisticated -and costly- augmented reality devices, sometimes also referred to as holographic devices or mixed reality devices, support neither the scalability nor the discreet presence required by the use case, and are similarly rejected. Finally, a range of industry-oriented AR designs available as early as 2011 focused on monocular, non-invasive AR displays with limited resolution and display capabilities, at a prohibitive cost due to (among others) extensive durability and safety certifications. Therefore, with a view to the past, one can posit that the AR theatre use case had not gained popularity due to pragmatic constraints.

The present

In recent years and as early as 2020, hardware developments picked up pace. Apart from evolutions of previously available designs and/or newcomers in similar design concepts, more diverse design concepts have been introduced in a persistent trend of lowering procurement costs and offering more support for open-source frameworks. Nowadays, this trend has culminated in a wide range of “smart glasses”, i.e. wearables with glass-like form factor supporting some level of audiovisual augmentation, which rely on external devices for computations (such as smartphones).

This design concept finds its origins in the previously mentioned industry-oriented AR designs, as well as their business-oriented counterparts. This time though, the AR glass concept is entering the consumer space with options that are durable for daily, street wear and tear while also remaining affordable for personal use. Some designs are provided directly bundled with proprietary software in a closed system approach (like AI-driven features or social media-oriented integrations), but the majority offers user-friendly, plug-n-play, tethered or wireless capabilities, directly supporting most personal smartphones or even laptops.

The VOXReality design

This landscape enables new, alternative system designs for AR Theatre use cases: instead of theatre-owned hardware/software solutions, one can envision systems with a combination of consumer- and theatre-owned hardware/software elements. By investigating how various stakeholders respond to this potential, we can pinpoint best practices and future recommendations.

Examining implemented AR theatre use cases, one can validate that the past landscape is dominated by a design approach based on theatre-owned integrated (hardware/software) solutions. Excellent examples where the theatre provides hardware and software to the audience are the National Theatre’s smart caption glasses [1], developed by Accenture and the National Theatre, as well as the Greek-based SmartSubs [2] project.

One new alternative that presents itself is for the audience to use their own custom hardware/software solutions dedicated to live subtitling and translations during performances. In this case, each user can choose their own AR device, pre-bundled with general-purpose translation software of their preference.

As eloquently described in a recent article [3], however, general-purpose AI captioning and translation frequently makes mistakes and fails to capture nuances, which, especially in artistic performances, can break immersion and negatively impact the audience experience. Therefore, in VOXReality we design for a transition from the past to the future: developing custom software dedicated to theatrical needs, optimized for generating real-time subtitles and translations of literary text, which can also be easily deployed on theatre-owned AR devices and/or on consumer-owned devices with minimal adaptations. This is enabled by a rigorous user-centric design approach, which can verify the features and requirements per deployment option, as well as contemporary technical development practices using open standards such as OpenXR.

The future

The future looks bright with community driven initiatives showing how accessible AR and AI technology can be, as in the example of open-source smart glasses you can build on your own [4], and in the continuous improvements on automatic speech recognition and neural machine translation allowing models to run performantly on ever less resources. VOXReality aims to leave a long-standing contribution to the domain of AR theatre with the objective of establishing reliable, immersive and performant technological solutions as the mainstream in making cultural heritage content accessible to all.

Picture of Spyros Polychronopoulos

Spyros Polychronopoulos

Research Manager at ADAPTIT and Assistant Professor at the Department of Music Technology and Acoustics of the HMU

&

Picture of Olga Chatzifoti

Olga Chatzifoti

Olga Chatzifoti is an Extended Reality applications developer working with Gruppo Maggioli on the design and development of the Augmented Reality use case of the VOXReality HORIZON research project. She is also a researcher in the Department of Informatics and Telecommunications of the University of Athens.


Seeing into the Black-Box: Providing textual explanations when Machine Learning models fail.

Machine learning is a scientific practice that is heavily tied to the notions of “error” and “approximation”. Sciences like mathematics and physics are associated with error induced by the need to model how things work. Moreover, the abilities of humans in intelligence tasks are also tied to error, since some actions associated with these abilities may be the result of failure, while others may be deemed truly successful. There have been myriad times when our thinking, our categorization ability, or our human decisions have failed. Machine learning models, which try to mimic and compete with human intelligence in certain tasks, are likewise tied to successful operations or erroneous ones.

But how can a machine learning model, a deterministic model with the ability to empirically compute the confidence it has in a particular action, diagnose by itself that it makes an error when processing a particular input? Even for a machine learning engineer, trying to understand this intuitively, without studying a particular method, seems difficult.

In this article, we discuss a recent algorithm for this problem that convincingly explains how; in particular, we describe the Language Based Error Explainability (LBEE) method by Csurka et al. [1]. Here, we recreate an explanation of how this method leverages the convenience of generating embeddings via the CLIP model contributed by OpenAI, which allows one to translate text extracts and images into high-dimensional vectors that reside in a common vector space. By projecting texts or images into this common high-dimensional space, we can compute the dot product between two embeddings (a well-known operation that measures the similarity between two vectors) and thereby quantitatively measure how similar the two original text/image objects are.
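To illustrate the kind of operation the method relies on, the following minimal sketch embeds an image and a few candidate sentences with the publicly available CLIP checkpoint on Hugging Face and compares them with cosine similarity; the image path and the sentences are placeholders, and this is only a generic CLIP usage example, not the authors’ code.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # placeholder image of a single object
sentences = ["the object is very small", "the image is blurry", "the lighting is dark"]

inputs = processor(text=sentences, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# L2-normalise the projected embeddings so the dot product equals cosine similarity
img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
similarities = (img @ txt.T).squeeze(0)  # one score per candidate sentence
print(dict(zip(sentences, similarities.tolist())))
```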

The designers of LBEE have developed a method that can report a textual description of a model failure in cases where the underlying model asserts an empirically low confidence score when taking the action it was designed for. Part of the difficulty in grasping how such a method fundamentally works is wondering how the textual descriptions explaining the model failure are generated from scratch as a function of an input datum. In our brains, we often do not put much effort into explaining why a failure happens and instantly arrive at clues to describe it, unless the cause drifts apart from our fundamental understanding of the inner workings of the object involved in the failure. To keep things interesting, we can provide an answer to this wondering: instead of assembling these descriptions anew for each input, we can generate them a priori following a recipe and then reuse them in the LBEE task by computationally reasoning about the relevance of a candidate set of explanations in relation to a given model input. In the remainder of this article, we will see how.

Suppose that we have a classification model that was trained to classify the type of a single object depicted in a small color image. We could, for example, take photographs of objects against a white background with our phone camera and pass these images to the model in order for it to classify the object names. The classification model can yield a confidence score ω between 0 and 1, representing the normalized confidence that the model has when assigning the image to a particular class, relative to all the object types recognizable by the model. It is usually observed that when a model does poorly in generating a prediction, the resulting confidence score is quite low. But what is a good empirical threshold T that allows us to distinguish a poor prediction from a confident one? To empirically estimate two such thresholds, one for identifying easy predictions and one for identifying hard predictions, we can take a large dataset of images (e.g., the ImageNet dataset) and pass each image to the classifier. For the images which were classified correctly, we can plot the confidence scores generated by the model as a normalized histogram. By doing so, we may expect to see two large lobes in the histogram: one concentrating lower empirical prediction scores, and a second concentrating scores which are relatively high, possibly with some spread of frequency mass around the two lobes. Then, we can set an empirical threshold that separates the two lobes.
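As a hedged illustration of this thresholding step (not the exact procedure from the paper), the sketch below histograms the confidence scores of correctly classified validation images and picks the least-populated bin between the two lobes as a cut-off; the synthetic scores are placeholders for real model outputs.

```python
import numpy as np

# Placeholder confidence scores for correctly classified validation images;
# in practice these come from the classifier itself.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.beta(2, 5, 2000), rng.beta(9, 2, 8000)])

hist, edges = np.histogram(scores, bins=50, density=True)

# Pick the emptiest bin in the middle of the range as an empirical cut-off T
# separating the low-confidence ("hard") lobe from the high-confidence ("easy") one.
valley = np.argmin(hist[5:45]) + 5
T = 0.5 * (edges[valley] + edges[valley + 1])

hard_images = scores < T
easy_images = scores >= T
print(f"T = {T:.2f}, hard = {hard_images.sum()}, easy = {easy_images.sum()}")
```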

Csurka and collaborators separate images into “easy” and “hard” sets based on the confidence score of a classification machine learning model and its relation to the cut-off threshold (see Figure 1). Having distinguished these two image sets, the authors compute, for each image in each set, an embedding: an ordered sequence of numbers (for convenience, we will use the term vector for this sequence) which describes the semantic information of the image. To do this, they employ the CLIP model contributed by OpenAI, the company famous for delivering the ChatGPT chatbot, which excels at producing embeddings for images and text in a joint high-dimensional vector space. The computed embeddings can be used to measure the similarity between an image and a very small text extract, or the similarity between a pair of text extracts or images.

As a later step, the authors identify groups of image embeddings that share similarities. To do this, they use a clustering algorithm which takes in the embedding vectors and identifies clusters among them; the number of clusters that fits a particular dataset is non-trivial to define. All in all, we come up with two types of clusters: clusters of CLIP embeddings for “easy” images, and clusters of CLIP embeddings for “hard” images. Then, each hard cluster center is picked and the closest easy cluster center is found for it, giving us a pair of embedding vectors originating from the clustering algorithm. The two cluster types, “easy” and “hard”, are visually denoted at the top-right sector of Figure 1 by green and red dotted enclosures.
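The following sketch shows one plausible way to perform this step with k-means; the embedding arrays and the number of clusters are placeholders, and the original work may use a different clustering algorithm and settings.

```python
import numpy as np
from sklearn.cluster import KMeans

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Placeholder CLIP image embeddings for the two groups (in practice, computed as above)
rng = np.random.default_rng(0)
easy_emb, hard_emb = rng.normal(size=(300, 512)), rng.normal(size=(120, 512))

easy_centers = KMeans(n_clusters=8, n_init=10, random_state=0).fit(easy_emb).cluster_centers_
hard_centers = KMeans(n_clusters=8, n_init=10, random_state=0).fit(hard_emb).cluster_centers_

# For each hard cluster centre, find the closest easy cluster centre by cosine similarity
sims = l2_normalize(hard_centers) @ l2_normalize(easy_centers).T
closest_easy = sims.argmax(axis=1)
print(closest_easy)
```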

The LBEE algorithm then generates a set S of sentences that describe the aforementioned images, and for each short sentence in this set the CLIP embedding is computed. As mentioned earlier, this text embedding can be directly compared to the embedding of any image by calculating the dot product (or inner product) of the two embedding vectors; the dot product measures a quantity that in the signal processing community is called linear correlation. The authors apply this operation directly: they compute the relevance of each textual error description via the so-called cosine similarity between the text embedding and a cluster embedding, ultimately obtaining two relevance score vectors of dimensionality k < N, where each dimension is tied to a given textual description. These two score vectors are then passed to a sentence selection algorithm (covered in the next paragraph). This selection is carried out for each hard cluster, and the union of the resulting sentence sets is output to the user in return for the image that was supplied as input.

The authors define four sentence selection algorithms, named SetDiff, PDiff, FPDiff and TopS. SetDiff computes the sentence sets corresponding to a hard cluster and to its closest easy cluster; it then removes from the hard-cluster sentence set the sentences that also appear in the easy-cluster sentence set, and reports the resulting set to the user. PDiff takes two similarity score vectors i and j of dimensionality k (where k denotes the number of top-k relevant text descriptions), one from the hard set and one from the easy set; it then computes the difference between these two vectors and retains the sentences corresponding to the top k values. TopS simply reports as an answer all the sentences that correspond to the vector of top-k similarities. Figure 3 presents examples of textual failure modes generated for a computer vision model, each using one of the TopS, SetDiff, PDiff and FPDiff methods. To enable evaluation of the LBEE model and methodology, the authors also had to introduce an auxiliary set of metrics, adapted to the specificities of the technique. To deepen your understanding of this innovative and very useful work, we recommend reading [1].
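The snippet below is a rough, assumption-laden sketch of how the SetDiff, PDiff and TopS strategies described above could be coded over precomputed sentence-to-cluster similarity scores; FPDiff is omitted, and the exact definitions in the paper may differ in detail.

```python
import numpy as np

def top_k(scores, sentences, k=5):
    """Return the k sentences with the highest similarity scores."""
    idx = np.argsort(scores)[::-1][:k]
    return [sentences[i] for i in idx]

def top_s(hard_scores, sentences, k=5):
    # TopS: simply report the top-k sentences for the hard cluster
    return top_k(hard_scores, sentences, k)

def set_diff(hard_scores, easy_scores, sentences, k=5):
    # SetDiff: top-k sentences of the hard cluster minus those of the closest easy cluster
    hard_set = top_k(hard_scores, sentences, k)
    easy_set = set(top_k(easy_scores, sentences, k))
    return [s for s in hard_set if s not in easy_set]

def p_diff(hard_scores, easy_scores, sentences, k=5):
    # PDiff: rank sentences by the difference between hard and easy similarity scores
    return top_k(hard_scores - easy_scores, sentences, k)

sentences = ["object is blurry", "poor lighting", "unusual viewpoint", "object partially occluded"]
hard = np.array([0.31, 0.28, 0.22, 0.27])
easy = np.array([0.30, 0.12, 0.20, 0.11])
print(set_diff(hard, easy, sentences, k=2), p_diff(hard, easy, sentences, k=2))
```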

References

[1] G. Csurka et al., “What could go wrong? Discovering and describing failure modes in computer vision,” in Proceedings of ECCV 2024.

Picture of Sotiris Karavarsamis

Sotiris Karavarsamis

Research Assistant at Visual Computing Lab (VCL)@CERTH/ITI


The rise of immersive technologies in theatre

Transforming the Audience Experience with VR and AR

Many aspects of our society have been profoundly impacted by the development of technology, especially the entertainment industry, which includes theatre. Virtual Reality (VR) and Augmented Reality (AR) are technologies that have massively changed how people think about entertainment and performance. These technologies are extremely versatile and can be used both to enhance the spectator’s experience without altering the essence of theatrical representation and to completely transform performances compared to classical theatre. The concept of augmented reality was first introduced in 1992 by Caudell and Mizell[1], and subsequently expounded upon by Ronald Azuma, who outlined its potential applications in the entertainment sector, among others[2]. However, the real breakthrough came between 2010 and 2015, with the advent of VR headsets. In 2015, Microsoft’s HoloLens introduced the ability to overlay virtual objects onto the real world, fostering new experimentation. In the same year, the platform The Void was launched, becoming popular thanks to hyper-reality experiences that combined virtual reality with interactive physical environments. Due to its popularity, the platform was able to collaborate with major companies like Disney and work on internationally renowned projects such as Star Wars: Secrets of the Empire. The COVID-19 pandemic provided a strong push for the adoption of immersive technology, forcing theatres worldwide to experiment with digital formats and virtual experiences[3].

VR and AR in the entertainment market

The immersive technology market is expanding, driven by sectors such as entertainment, education, healthcare, and business, which are increasingly adopting VR and AR technologies. In 2023, the value of the immersive technology market was $29.13 billion, and future projections indicate it will reach $134 billion by 2030, with an annual growth rate of over 25%[4]. With over fifty percent of the market in 2023, the video game industry continues to be the leading industry for VR and AR in entertainment[5]. However, these technologies are increasingly being used in live events and theatre as well. Artificial Intelligence (AI) is being integrated into VR and AR experiences to enhance interactions and make them more accessible and natural[6]. Furthermore, as smart glasses and headsets have become more powerful and lighter, their latest developments have made adoption easier for a wider range of users. Thanks to government-sponsored research initiatives like Horizon Europe, growing investments in digital innovation, and the growing use of XR technologies in industries like entertainment, healthcare, and education, the immersive technology market in the EU is predicted to reach $108 billion by 2030[7].

Enhancing theatre accessibility and audience engagement

Immersive technologies present plenty of possibilities to improve theatrical productions, enabling creativity in both the performance and its inclusivity. By employing virtual reality headsets, real-time subtitles and scene-specific context, it is possible to improve audience immersion and promote inclusivity among people with hearing or language impairments. This will increase the number of people who attend plays, particularly in tourist cities where accessibility is severely limited by language barriers. Moreover, the use of these technologies increases the potential audience for theatrical plays because it also overcomes geographic restrictions by enabling viewers to enjoy live performances from a distance in fully immersive virtual theatres. It will allow people who are unable to travel because of age-related problems or disabilities to attend performances not in two dimensions, as is now the case when watching a theatre show on television, but as a highly immersive experience. Finally, visual effects can be added to performances using VR and AR technologies, bridging the gap between traditional performing arts and modern production techniques.

Applications of VR/AR in Theatrical Performances

The incorporation of VR and AR into theatre has completely transformed how audiences interact with performances; these technologies have introduced new means to boost storytelling, accessibility, and interaction. The potential of these technologies in live performances has been shown by a variety of projects:

  • National Theatre’s Immersive Storytelling Studio: To increase audience engagement, the National Theatre in the UK has adopted immersive technologies. Its Immersive Storytelling Studio investigates the potential of VR and AR to produce more immersive and engaging experiences[8].
  • White Dwarf (Lefkos Nanos) by Polyplanity Productions: this experimental project creates a novel theatrical experience by fusing augmented reality with live performance through the interaction of digital materials with performers on stage[9].
  • Smart Subs by the Demokritos Institute: this project makes theatre performances more accessible to international and hearing-impaired audiences by using AR-powered smart captions that provide live subtitles[10].
  • XRAI Glass: AI technology, in this case combined with AR smart glasses, can provide real-time transcriptions and translations, enabling people with hearing impairments to follow along or comprehend plays in multiple languages[11].
  • National Theatre Smart Caption Glasses (UK): the National Theatre, in collaboration with Accenture and a team of speech and language experts led by Professor Andrew Lambourne, developed its “smart caption glasses” solution as part of the accessibility programme for its performances. The smart caption glasses have been in use since 2018 and were also demonstrated at the 2020 London Short Film Festival for cinematic screenings.

These applications show how VR and AR are improving visual effects while also increasing accessibility and inclusivity in theatre. Theatre companies can reach a wider audience, overcome language hurdles, and produce captivating, interactive shows that push the limits of conventional theatre by incorporating immersive technologies.

Conclusion

As technology advances, VR and AR will become increasingly used in theatrical performances, both to create a more immersive experience and to make theatre more accessible, attracting new audiences and expanding the reach of the performing arts. In an increasingly digital environment, these technologies will guarantee that live performances continue to be both revolutionary and relevant in the cultural context. Additionally, the creation of AI-powered VR and AR tools will make it possible to modify and customize shows according to audience preferences, resulting in more profound emotional experiences and unprecedented accessibility to theatre.

References

Azuma, Ronald T. “A survey of augmented reality.” Presence: Teleoperators & Virtual Environments 6.4 (1997): 355-385.

Iudova-Romanova, Kateryna, et al. “Virtual reality in contemporary theatre.” ACM Journal on Computing and Cultural Heritage 15.4 (2023): 1-11.

Jernigan, Daniel, et al. “Digitally augmented reality characters in live theatre performances.” International Journal of Performance Arts and Digital Media 5.1 (2009): 35-49.

Pike, Shane. “‘Make it so’: Communal augmented reality and the future of theatre and performance.” Fusion Journal 15 (2019): 108-118.

Pike, Shane. “Virtually relevant: AR/VR and the theatre.” Fusion Journal 17 (2020): 120-128.

Srinivasan, Saikrishna. Envisioning VR theatre: Virtual reality as an assistive technology in theatre performance. Diss. The University of Waikato, 2024.

[1] Caudell, Thomas & Mizell, David. (1992). Augmented reality: An application of heads-up display technology to manual manufacturing processes. Proceedings of the Twenty-Fifth Hawaii International Conference on System Sciences. 2. 659 – 669 vol.2. 10.1109/HICSS.1992.183317.

[2] Azuma, Ronald T. “A survey of augmented reality.” Presence: teleoperators & virtual environments 6.4 (1997): 355-385

[3] Signiant. VR & AR: How COVID-19 Accelerated Adoption, According to Experts. 2024

[4] Verified Market Reports. Immersive Technologies Market Report. 2024

[5] Verified Market Reports. Immersive Technologies Market Report. 2024.

[6] Reuters. VR, AR headsets demand set to surge as AI lowers costs, IDC says. 2024.

[7] Mordor Intelligence. Europe Immersive Entertainment Market Report. 2024.

[8] National Theatre. Immersive Storytelling Studio. 2024.

[9] Polyplanity Productions. White Dwarf (Lefkos Nanos). 2024.

[10] Demokritos Institute. Smart Subs Project. 2024.

[11] XRAI Glass. Smart Glasses for Real-Time Subtitles. 2024.

Picture of Greta Ioli

Greta Ioli

Greta Ioli is an EU Project Manager in the R&D department of Maggioli Group, one of Italy's leading companies providing software and digital services for Public Administrations. After earning a degree in International Relations – European Affairs from the University of Bologna, she specialized in European projects. Greta is mainly involved in drafting project proposals and managing dissemination, communication, and exploitation activities.


Partner Interview #8 with F6S

The VOXReality project is driving innovation in Extended Reality (XR) by bridging this technology with real-world applications. At the heart of this initiative is F6S, a key partner ensuring the seamless execution of open calls and supporting third-party projects (TPs) from selection to implementation. In this interview, we sit down with Mateusz Kowacki from F6S to discuss their role in the consortium, the impact of mentorship, and how the project is shaping the future of AI and XR technologies.

Can you provide an overview of your organization's involvement in the VOXReality project and your specific role within the consortium?

F6S played a crucial operational role in the VOXReality project by managing the preparation and execution of the open calls. This thorough approach involved designing the application process (determining eligibility criteria, application requirements and evaluation metrics), developing and disseminating the call, and managing the selection and implementation of the third-party (TP) projects.

Essentially, F6S acted as the facilitator ensuring a smooth and efficient process of preparing and implementing open calls.

How do you ensure that both mentors and the projects they guide benefit from the mentorship process, and what does that look like in practice?

There are a lot of important factors that made the mentoring process within the VOXReality project a success, but one of the key elements is communication. That involves clearly outlining the roles and responsibilities of both the mentor and the project team, including setting expectations for communication frequency, meeting schedules, and deliverables. What is more, we regularly check in with both mentors and projects to assess progress, identify any challenges, and provide support, and we gather feedback on the mentorship experience to continuously improve the programme. Those are for sure the core and basic elements of successful implementation. What we also developed in sprint 2, based on lessons learnt from sprint 1, is a clear calendar of upcoming activities that involve TPs and mentors. That helps us with better execution and a better understanding of our tasks.

Regular meetings, check-ups and openness to discussion have also played a crucial role. F6S helped all partners to better execute and navigate the implementation of the open call.

How does the VOXReality team ensure that the XR applications being developed are both innovative and practical for real-world use?

The VOXReality team employs a multi-faceted approach to ensure that the XR applications being developed are both innovative and practical for real-world use. By funding projects through open calls, VOXReality fosters innovation and encourages a diverse range of ideas and approaches. This collaborative approach ensures that the development of XR applications benefits from the expertise of a wider community, leading to more creative and practical solutions. So basically, the whole selection process has been designed to attract technologies that are as innovative as possible. We have been lucky to attract a lot of applications, so our selection of 5 TPs was not an easy task, as many projects showed good value in terms of innovation and real-world use. Nevertheless, we believe that the five selected entities represent the best potential for future development, and we are sure that their pursuit of innovation will end in success.

The language translation system for VOXReality prioritizes cultural sensitivity and artistic integrity by relying on these literary translations, which capture the cultural nuances and emotional subtleties of the original text. To ensure that these aspects are preserved throughout the development, we conduct thorough evaluations of the translation outputs through internal checks. This evaluation is crucial for verifying that the translations maintain the intended cultural and artistic elements, thereby respecting the integrity of the original performance.

How do you think the VOXReality Open Call and the coaching process will shape the success and growth of innovative projects in the XR and AI fields?

I believe that the idea of cascade funding is crucial for discovering potential in small teams of creative professionals, and projects like VOXReality certainly help to elevate their activities to a higher level and a bigger audience. The role of a coach is to ensure the successful implementation of the TPs’ projects within VOXReality, but also to see the bigger picture of possibilities within the sector of publicly funded projects.

What excites you most about the Third-Party Projects joining VOXReality, and how do you believe AI and XR technologies will reshape the industries they are targeting?

The cooperation with them. It is for sure very interesting to see how they work and how they interact: the dynamism and agility, while at the same time keeping to the deadlines and meeting expectations. It is something that can certainly inspire, not only them but also bigger entities, to sometimes think outside the box and leave the comfort zone. For some of those entities the VOXReality project is a game changer in their entrepreneurial history, and we are very happy to be part of it. XR technologies have very big potential for changing and shaping our everyday life, but we always need to see the real, social value in what we are doing within XR technologies. That’s one of the mottos we have in VOXReality: to bring real value to society.

Picture of Mateusz Kowacki

Mateusz Kowacki

EU Project Manager @ F6S
