
Honorable mention in IEEEVR2025 Workshop (VR-HSA) Paper by CWI

Award: Honorable mention for the paper “User-Centric Requirements for Enhancing XR Use Cases with Machine Learning Capabilities” in the “Best Presentation” award category at the International Workshop on Virtual Reality for Human and Spatial Augmentation (VR-HSA), held in conjunction with IEEE VR 2025.

We are glad to share that our team from CWI (Centrum Wiskunde & Informatica) presented their work at the International Workshop on Virtual Reality for Human and Spatial Augmentation (VR-HSA), held in conjunction with IEEE VR 2025 in the beautiful coastal city of Saint-Malo, France, on March 9, 2025. At the workshop, we presented our paper, “User-Centric Requirements for Enhancing XR Use Cases with Machine Learning Capabilities,” authored by Sueyoon Lee, Moonisa Ahsan, Irene Viola, and Pablo Cesar. The paper is based on two use cases: (a) Virtual Conference, which mimics a real-life conference in a VR environment (VRDays Foundation), and (b) Augmented Theatre, which showcases a Greek play in an AR environment (Athens Festival). It describes our user-centric approach of conducting two focus groups to gather user requirements for these use cases and to identify where ML technologies could be implemented using VOXReality technology modules. We also showcased an overview of the full data collection, processing, and evaluation pipeline with a poster presentation in a parallel session. We are happy to share that our presentation received an honorable mention in the Best Presentation Award category.

Poster for the paper presented in the IEEEVR2025 Workshop (VR-HSA)

We are excited to see our work contributing to the growing field of ML-enhanced XR user experiences. We extend our thanks to the use case owners (VRDays Foundation, Athens Festival AEF) and to everyone who was part of the process and contributed to enabling this work, as well as to the VR-HSA organizers and the broader XR community for supporting the discussions. This recognition motivates us to continue working towards more user-centric immersive experiences.

Photo Credits: Moonisa Ahsan at the VR-HSA Workshop at IEEE VR 2025

Abstract: The combination of Extended Reality (XR) and Machine Learning (ML) will enable a new set of applications. This requires adopting a user-centric approach to address the evolving user needs. This paper addresses this gap by presenting findings from two independent focus groups specifically designed to gather user requirements for two use cases: (1) a VR Conference with an AI-enabled support agent and real-time translations, and (2) an AR Theatre featuring ML generated translation capabilities and voice-activated VFX. Both focus groups were designed using context-mapping principles. We engaged 6 experts in each of the focus groups. Participants took part in a combination of independent and group activities aimed at mapping their interaction timelines, identifying positive experiences, and highlighting pain points for each scenario. These activities were followed by open discussions in semi-structured interviews to share their experiences. The inputs were analysed using Thematic Analysis and resulted in a set of user-centric requirements for both applications on Virtual Conference and Augmented Theatre respectively. Subtitles and Translations were the most interesting and common findings in both cases. The results led to the design and development of both applications. By documenting user-centric requirements, these results contribute significantly to the evolving landscape of immersive technologies.  

Keywords: Virtual Reality, VR conference, Augmented Reality, AR theatre, Focus groups, User requirements, Use cases, Human-centric design. 

Reference 

  1. S. Lee, M. Ahsan, I. Viola, and P. Cesar, “User-centric requirements for enhancing XR use cases with machine learning capabilities,” in Proceedings of the VR-HSA Workshop (IEEE VR 2025), March 2025.
Picture of Moonisa Ahsan

Moonisa Ahsan

Moonisa Ahsan is a post-doc in the DIS (Distributed & Interactive Systems) Group of CWI. In VOXReality, she contributes to understanding next-generation applications within Extended Reality (XR), and to better understanding user needs and leveraging that knowledge to develop innovative solutions that enhance the user experience in all three use cases. She is a Marie Curie Alumna and her scientific and research interests are Human-Computer Interaction (HCI), User-Centric Design (UCD), Extended Reality (XR) and Cultural Heritage (CH).


Master’s Thesis titled “Enhancing the Spectator Experience by Integrating Subtitle Display in eXtended Reality Theatres” defended last December in Amsterdam

Master’s Student: Atanas Yonkov
Thesis Advisors (CWI): Moonisa Ahsan, Irene Viola and Pablo Cesar

Abstract: The rapid growth of virtual and augmented reality technologies, encapsulated by the term eXtended Reality (XR), has revolutionized the interaction with digital content, bringing new opportunities for entertainment and communication. Subtitles and closed captions are crucial in improving language learning, vocabulary acquisition, and accessibility, such as in understanding audiovisual content. However, little is known about integrating subtitle displays in extended reality theatre environments and their influence on the user experience. This study addresses this gap by examining subtitle placement and design attributes specific to XR settings. Building on previous research on subtitle placement, mainly in television and 360-degree videos, this project focuses on the differences between static and dynamic subtitle variants. The study uses a comprehensive literature review, a Virtual Reality (VR) theatre experiment, and analytics to investigate these aspects of subtitle integration in the specific case of a VR theatrical Greek play with subtitles. The results show that the difference between the two variants is not significant, and both implementations produce high scores. However, thematic analysis suggests the preference for the static over the dynamic variant depends heavily on the specific context and the number of speakers in the scene. Since this study focuses on a monologue theatrical play, the next step in future work would be to explore a “multi-speaker” play.

The partners from the DIS (Distributed and Interactive Systems) group of Centrum Wiskunde & Informatica (CWI) hosted and supervised a Master’s thesis[1] titled “Enhancing the Spectator Experience by Integrating Subtitle Display in eXtended Reality Theatres” by Atanas Yonkov at the University of Amsterdam (UvA). The advisors from CWI were Moonisa Ahsan, Irene Viola and Pablo Cesar, and the university advisors were Prof. dr. Frank Nack and Prof. dr. Hamed Seiied Alavi. The thesis focuses on XR theatres, investigating subtitle integration in virtual reality (VR) theatre environments designed within the VOXReality project. The user study in the thesis was based on an extended VR version of the AR Theatre use case application of the VOXReality project, showcasing the Greek theatrical play Hippolytus by Euripides. The goal was to bridge the existing research gap by exploring optimal subtitle positioning in VR theatre, focusing on two key approaches: static and dynamic subtitles. In the study, static subtitles (see Fig. 1a) are fixed relative to the user’s gaze, ensuring they remain within the viewer’s field of vision regardless of scene movement. Dynamic subtitles (see Fig. 1b) are anchored to objects—in this case, actors—moving naturally with them within the virtual environment.

Figure 1 (a) Static and (b) Dynamic subtitles in a theatrical play scene from a participant’s VR headset perspective

The study was conducted from May 13 to May 22, 2024, at the DIS Immersive Media Lab of Centrum Wiskunde & Informatica (CWI) in Amsterdam, The Netherlands. It examined how subtitle placement affects the user experience in a VR theatrical adaptation of a Greek play. Results indicated no significant difference in user experience between static and dynamic subtitle implementations, with both approaches receiving high usability scores. However, a thematic analysis revealed that user preference for static or dynamic subtitles was highly context-dependent. In particular, the number of speakers in a scene influenced subtitle readability and ease of comprehension: a) in monologue settings, static subtitles were often preferred for their stability and ease of reading; b) in potential future scenarios with multiple speakers, dynamic subtitles could enhance spatial awareness and dialogue attribution. Each session lasted approximately 60 minutes, with individual durations varying between 50 and 120 minutes, depending on participant familiarity and adaptability with VR headsets and controllers. Our findings, which will be detailed in future blog posts, contribute to the growing body of research on subtitle placement in immersive environments. This work builds upon previous studies on subtitle integration for television and 360-degree videos, extending the analysis to VR theatre settings, and it also informs several design and user experience decisions for the AR Theatre use case within the project. As future work, given that this study focused on a monologue performance, further research should extend the analysis to multi-speaker theatrical plays to explore subtitle effectiveness in complex dialogue scenarios.

Image Courtesy of Atanas Yonkov: Master’s Graduation Ceremony at University of Amsterdam (UvA) (2024)

[1] Atanas Yonkov, “Enhancing the Spectator Experience: Integrating Subtitle Display in eXtended Reality Theatres” (Master’s thesis), Universiteit van Amsterdam, 2024. Available at https://scripties.uba.uva.nl/search?id=record_55113

Picture of Moonisa Ahsan

Moonisa Ahsan

Moonisa Ahsan is a post-doc in the DIS (Distributed & Interactive Systems) Group of CWI. She was also the external supervisor for the aforementioned thesis work. In VOXReality, she contributes to understanding next-generation applications within Extended Reality (XR), and to better understanding user needs and leveraging that knowledge to develop innovative solutions that enhance the user experience in all three use cases. She is a Marie Curie Alumna and her scientific and research interests are Human-Computer Interaction (HCI), User-Centric Design (UCD), Extended Reality (XR) and Cultural Heritage (CH).


Luxembourg’s Immersive Days 2025

The Immersive Days 2025, held on March 4 and 5 in Luxembourg City, explored immersive technologies and their intersection with art, culture, and society. This two-day conference, organized by Film Fund Luxembourg in collaboration with the Luxembourg City Film Festival and PHI Montreal, brought together international experts, professionals, and artists active in the XR industry to discuss the latest developments and challenges in the field and underscore Luxembourg’s growing prominence in the immersive arts and virtual reality (VR) sectors.

This year’s conference again delivered a programme open to the general public and provided an opportunity to engage directly with the creators behind the immersive works featured in the Immersive Pavilion 2025.

Lectures and round tables started on March 4 at the Cercle Cité, gathering professionals and the general public and mainly featuring creators whose works were exhibited at this year’s Immersive Pavilion. Discussions centred on their unique creative processes, reflecting on how their fictional and personal stories were translated into immersive content, and on the challenges encountered during the ideation, production and distribution process.

The second day, held at Neumünster, was reserved for industry professionals and delved into more technical and forward-looking topics within the XR industry. This day fostered international exchanges and promoted peer networking among professionals. Discussions covered the current challenges in the preservation of digital content, funding opportunities for XR projects (showcasing the German regional funding system), and, last but not least, the impact of AI technology on immersive experiences, discussed in a panel titled “AI/XR: The Future of AI for Immersive and Virtual Arts”.

Among the guests of this year’s programme were Stéphane Hueber-Blies and Nicolas Blies, directors of “Ceci est mon cœur“; François Vautier, director of “Champ de Bataille”, awarded “Best Immersive Experience” at this year’s Immersive Pavilion; and Octavian Mot, director of “AI & Me: The Confessional and AI Ego“, a thought-provoking installation that explores the intricate relationship between humans and artificial intelligence.

Mot’s work invites participants into an intimate dialogue with an AI entity, challenging them to reflect on themes of identity, consciousness, and the evolving dynamics between human and machine. The installation was a highlight of the Immersive Pavilion 2025, showcasing the potential of AI to create deeply personal and immersive art experiences.

Image 1: "AI & Me" installation. User Analysis in process.

While “AI & Me” offers an introspective, artistic exploration of human-AI interaction, it was impossible not to draw parallels to VOXReality’s AI Agent, which represents a more utilitarian and non-intrusive application of AI in immersive environments. VOXReality’s AI Agent is designed to enhance user experiences in virtual spaces by providing responsive and adaptive interactions, serving roles ranging from virtual assistant to dynamic character within virtual narratives. “AI & Me”, by contrast, creates its narrative by “perceiving” users while remaining completely unbound by the industry-standard “rules of engagement” with humans, resulting in answers and interactions that can come across as cunning and raw, and leaving the user to deal with its sense of humour. Brilliant!

Image 2: "AI & Me" installation. User AI representation.

The juxtaposition of “AI & Me” and VOX Reality’s AI Agent underscores the diverse applications of AI agents in today’s technological landscape. On one hand, AI is leveraged as a medium for artistic expression, prompting users to engage in self-reflection and philosophical inquiry. On the other hand, AI serves practical functions, improving user engagement and functionality within virtual environments. This duality highlights the versatility of AI agents and their growing significance across various domains.

In conclusion, Immersive Days 2025 in Luxembourg City successfully bridged the gap between art and technology, providing a platform for meaningful discussions and showcasing pioneering works in the field of immersive experiences. The event not only highlighted the current state of immersive art and technology but also set the stage for future innovations, emphasising the importance of interdisciplinary collaboration and the country’s significant contribution to the international immersive production landscape, a success largely attributed to the strategic initiatives of the Film Fund Luxembourg.

Picture of Manuel Toledo - Head of Production at VRDays Foundation

Manuel Toledo - Head of Production at VRDays Foundation

Manuel Toledo is a driven producer and designer with over a decade of experience in the arts and creative industries. Through various collaborative projects, he merges his creative interests with business research experience and entrepreneurial skills. His multidisciplinary approach and passion for intercultural interaction have allowed him to work effectively with diverse teams and clients across cultural, corporate, and academic sectors.

Starting in 2015, Manuel co-founded and produced the UK’s first architecture and film festival in London. Since early 2022, he has led the production team for Immersive Tech Week at VRDays Foundation in Rotterdam and serves as the primary producer for the XR Programme at De Doelen in Rotterdam. He is also a founding member of ArqFilmfest, Latin America’s first architecture and film festival, which debuted in Santiago de Chile in 2011. In 2020, Manuel earned a Master’s degree from Rotterdam Business School, with a thesis focused on innovative business models for media enterprises. He leads the VRDays Foundation team’s contributions to the VOXReality project.


Developing NLP models in the age of the AI race

The AI race intensifies

During the last 10-15 years, Natural Language Processing (NLP) has undergone a profound transformation, driven by advancements in deep learning, the use of massive datasets and increased computational power. These innovations led to early breakthroughs such as word embeddings (Word2Vec [1], GloVe [2]) and paved the way for advanced architectures like sequence-to-sequence models and attention mechanisms, all based on neural architectures. In 2018, the introduction of transformer-based models, and especially BERT [3] (released as an open-source model), enabled the contextualized understanding of language. Performance in NLP tasks like machine translation, sentiment analysis or speech recognition has been significantly boosted, making AI-driven language technologies more accurate and scalable than ever before.

The “AI race” has intensified with the rise of large language models (LLMs) like OpenAI’s ChatGPT [4] and DeepSeek-R1 [5], which use huge architectures with billions of parameters and massive multilingual datasets to push the boundaries of NLP. These models dominate fields like conversational AI and can perform a wide range of tasks by achieving human-like fluency and context awareness. Companies and research institutions worldwide are competing to build more powerful, efficient, and aligned AI systems, leading to a rapid cycle of innovation. However, this race also raises challenges related to interpretability, ethical AI deployment and the accessibility of high-performing models beyond large tech firms.

But what did DeepSeek achieve? In early 2025, DeepSeek released its R1 model, which has been noted to outperform many state-of-the-art LLMs at a lower cost, causing a disruption in the AI sector. DeepSeek made its R1 model available on platforms like Azure, allowing users to take advantage of its technology. DeepSeek introduced many technical innovations that allowed the model to thrive (such as architectural innovations: a hybrid transformer design, the use of mixture-of-experts models, and auxiliary-loss-free load balancing), but its main contribution was the reduction of reliance on traditional labeled datasets. This innovation stems from the integration of pure reinforcement learning (RL) techniques, enabling the model to learn complex reasoning tasks without the need for extensive labeled data. This approach not only reduces the dependency on large labeled datasets but also streamlines the training process, lowering the resource requirements and costs associated with developing advanced AI models.

Figure: DeepSeek architecture (taken from https://newsletter.languagemodels.co/p/the-illustrated-deepseek-r1)

The enduring relevance of the models used in VOXReality

At VOXReality we take a fundamentally different approach and believe in the significant value brought by “traditional” AI models (especially for ASR and MT), particularly in specialized domain applications. We prioritize real open-source AI by ensuring transparency, reproducibility and accessibility [6]. Unlike proprietary or restricted “open weight” models, our work is built upon truly open architectures that allow full modification and deployment without any limitations. This is the reason that our open call winners [7] are allowed to build on top of the VOXReality ecosystem. Moreover, our approaches often require less computational power and data, making them suitable for scenarios with limited resources or where deploying large-scale AI models is impractical. Our models can be tailored to specific industries or fields, incorporating domain-specific expertise without extensive or expensive retraining. Implementing the models locally (if chosen) can also offer enhanced control over data and compliance with privacy regulations, which can be a significant consideration in sensitive domains.
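As a minimal sketch of what this openness means in practice, the snippet below loads a publicly available open-source translation model through the Hugging Face transformers pipeline; the VOXReality checkpoints published at https://huggingface.co/voxreality [6] could be swapped in the same way, but the model identifier shown here is a generic stand-in, not a project model.

```python
from transformers import pipeline

# Generic open-source MT model used as a stand-in; a VOXReality checkpoint from
# https://huggingface.co/voxreality could be substituted in the same pipeline call.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")

result = translator("The performance starts in five minutes.", max_length=128)
print(result[0]["translation_text"])
```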

VOXReality’s Strategic Integration

At VOXReality, we strategically integrate traditional ASR and MT approaches to complement advanced AI models, ensuring a comprehensive and adaptable solution that leverages the strengths of state-of-the-art AI. This focus on real open-source innovation and data-driven performance differentiates VOXReality from the rapidly evolving landscape of AI mega-models.

Picture of Jerry Spanakis

Jerry Spanakis

Assistant Professor in Data Mining & Machine Learning at Maastricht University

References

[1] Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space.

[2] Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar. Association for Computational Linguistics.

[3] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics

[4] https://openai.com/chatgpt/overview/

[5] https://github.com/deepseek-ai/DeepSeek-R1

[6] https://huggingface.co/voxreality

[7] https://voxreality.eu/open-call-winners/


VOXReality’s Open Learning Revolution

Industrial training has traditionally followed rigid, stepwise instruction, ensuring compliance and accuracy but often at the cost of creativity and adaptability. However, with the rapid advancements in Extended Reality (XR) and Artificial Intelligence (AI), training methodologies are shifting toward more dynamic and flexible models.

At the heart of this transformation is VOXReality, an XR-powered training system that departs from traditional step-by-step assembly guides. Instead, it embraces a freemode open-learning approach, allowing workers to logically define and customize their own assembly sequences with complete freedom. This method enhances problem-solving skills, engagement, and real-world adaptability.

Unlike conventional training, which dictates a specific order of operations, VOXReality’s open-ended model empowers users to experiment, explore, and determine their optimal workflow. This approach offers several key benefits: workers are more engaged when they can approach tasks in a way that feels natural to them; trainees develop a deeper understanding of assembly processes through problem-solving rather than rote memorization; the system adapts to different skill levels, allowing experienced workers to optimize workflows while providing guidance to beginners; and, since real-world assembly is rarely linear, this method better prepares workers for unexpected challenges on the factory floor.

VOXReality integrates an AI-driven dialogue agent to ensure trainees never feel lost in this open-ended system. This virtual assistant provides real-time feedback, allowing users to receive instant insights into their choices and refine their approach. It also enhances engagement and interactive learning by enabling workers to ask questions and receive contextual guidance rather than following static instructions. Additionally, the AI helps prevent errors by highlighting potential missteps, ensuring that creativity does not come at the cost of safety or quality.

Development Progress:

Below, we outline the development status with some corresponding screenshots that showcase the system’s core functionalities and user interactions.

The interface features two text panels displaying the conversation between the user and the dialogue agent. When the user speaks, an automatic speech recognition tool (created by our partners at Maastricht University) converts their speech into text and sends it to the dialogue agent (created by our partners at Synelixis); the transcribed input is shown in the top panel (input panel). The dialogue agent then processes the input, provides contextual responses, and uses a text-to-speech tool to read them aloud. These responses are displayed in the lower panel (output panel). Additionally, the system can trigger audio and video cues based on user requests. The entire scene is color-coded to enhance user feedback and improve interaction clarity.
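To make the flow above concrete, here is a minimal, purely illustrative Python sketch of the loop; the function names and canned responses are placeholders standing in for the project’s internal ASR, dialogue-agent and text-to-speech components, which are not public APIs.

```python
# Illustrative sketch of the interaction loop; all components below are placeholder
# stubs standing in for the real ASR, dialogue-agent and TTS services.

def transcribe(audio_path: str) -> str:
    """Stand-in for the ASR module: audio in, transcript out (shown on the input panel)."""
    return "Can you show me a video of the next assembly step?"

def dialogue_agent(utterance: str) -> dict:
    """Stand-in for the dialogue agent: returns a contextual reply plus an optional cue."""
    return {"reply": "Sure, here is the video for the next step.", "action": "play_video", "step": 3}

def speak(text: str) -> None:
    """Stand-in for the text-to-speech component (reply also shown on the output panel)."""
    print(f"[TTS] {text}")

turn = dialogue_agent(transcribe("user_question.wav"))
speak(turn["reply"])
if turn.get("action") == "play_video":
    print(f"[UI] playing tutorial video for step {turn['step']}")
```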

The screenshots below capture the dialogue between a naive user and the dialogue agent. The user enters the scene and asks for help, and the Dialogue Agent guides them through the next steps.

The screenshot below captures the user’s curious question regarding the model to be assembled. The Dialogue Agent provides contextual answers to the user.

The user asks the Dialogue Agent to show a video about one of the steps. The Dialogue Agent triggers the function in the application to show the corresponding video on the output panel.

The user grabs an object and asks the Dialogue Agent for a hint about the step they want to perform. The Dialogue Agent triggers the function in the application to give a useful hint.

The implementation of freemode XR training is just the beginning. As AI and XR technologies continue to evolve, the potential for fully immersive, adaptive, and intelligent industrial training systems grows exponentially. The success of this approach will be measured by increased worker efficiency, reduced onboarding time, and higher retention of complex technical skills.

VOXReality’s commitment to redefining industrial learning aligns with the broader movement toward smart manufacturing and Industry 5.0. By blending technology with human intuition and adaptability, we are not just training workers—we are empowering the future of industry. We look forward to testing the solution with unbiased users and receiving feedback for improvements.

Picture of Leesa Joyce

Leesa Joyce

Head of Research @ Hololight

&

Picture of Gabriele Princiotta

Gabriele Princiotta

Unity XR Developer @ Hololight


Celebrating Women in Extended Reality: Insights and Inspiration from the Women in XR Webinar 

VOXReality, in collaboration with SERMAS, XR4ED, TRANSMIXR, HECOF, MASTER, and CORTEX2 projects, had the privilege of hosting the “Women in XR Webinar – Celebrating Women in Extended Reality.” This online event brought together leading female experts from EU-funded XR projects for an inspiring discussion on the role of women in the rapidly evolving field of Extended Reality. We were honored to have a panel featuring Regina Van Tongeren, Grace Dinan, Leesa Joyce, Moonisa Ahsan, Megha Quamara, Georgia Papaioannou, Maria Madarieta, and Marievi Xezonaki. From seasoned trailblazers with 20 years of experience to emerging voices, these panelists shared their journeys, challenges, and invaluable insights. This webinar aimed to highlight the importance of gender diversity in XR and provide practical advice for aspiring women in tech.

Navigating the Digital Divide: Realities of Women in XR

The panelists openly discussed the challenges women face in the XR industry. While the field offers immense creative potential, it is inherently challenging. Participants highlighted several key issues. Women often find fewer opportunities compared to their male counterparts. The persistent pay gap remains a significant barrier. Women’s contributions can be overlooked, hindering career advancement. Some women still experience difficulties in accessing advocacy and support from the broader XR community. These challenges underscore the need for systemic changes to ensure equal opportunities and recognition for women in XR.

Unlocking Immersive Potential: The Boundless Opportunities for Women in XR

Despite the challenges, the webinar emphasized the vast opportunities available for women in XR. The panelists pointed to the expanding applications of XR across various sectors. XR has the potential to modernize medical training, patient care, and therapy in healthcare. Immersive learning experiences enhance engagement and knowledge retention in education. Innovative applications for virtual try-ons, digital fashion, and immersive design processes are emerging in fashion and design. The XR field is not limited to technical roles; it requires a wide range of skills, including legal expertise, artistic talent, and scientific knowledge. These emerging opportunities present a unique chance for women to lead and shape the future of XR.

Empowering the Future: Actionable Insights and Key Takeaways for Women in XR

The panelists shared a wealth of practical advice for women looking to thrive in the XR industry. They emphasized the importance of building a strong network and finding a supportive community within XR. Organizations like Women in Immersive Tech Europe [1] provide valuable resources, mentorship, and networking opportunities. Seeking out inspiring role models, such as Parul Wadhwa [2], or even figures like Marie Curie, and learning from their experiences was also strongly encouraged.

Furthermore, the panelists stressed the importance of being assertive and comfortable making suggestions. Staying updated with the latest developments in the rapidly evolving XR field is crucial, as is a commitment to continuous learning. They advised against trying to conform to a pre-existing mold, urging women to bring their unique perspectives to the table and contribute to creating inclusive XR experiences. Building a strong online brand, including a professional portfolio, active social media channels, and a personal website, was highlighted as essential for visibility.

For XR teams, the message was clear: diversity must be a core value, integrated into the DNA of the team and its products, not an afterthought. Diversifying hiring teams to include a wide range of skill sets is essential. For those considering starting their own businesses or working freelance, platforms like Immersive Insiders [3] and TalentLabXR [4] were recommended, along with exploring relevant courses from institutions such as the University of London and the University of Michigan.

The webinar left us with a powerful call to action, inspiring us to work together towards a more inclusive and equitable XR future. We encourage you to follow all the panelists, especially our team members Leesa Joyce and Moonisa Ahsan, and be inspired by their ongoing leadership!

Missed the Live Session? Catch the Recording! If you were unable to join us live, don’t worry! The full event recording is available on the F6S Innovation YouTube channel.

Picture of Ana Rita Alves

Ana Rita Alves

Ana Rita Alves is a Communication Manager at F6S, where she specializes in managing communication and dissemination strategies for EU-funded projects. She holds an Integrated Master’s Degree in Community and Organizational Psychology from the University of Minho, which has provided her with strong skills in communication, project management, and stakeholder engagement. Her professional background includes experience in proposal writing, event management, and digital content creation.

Photo by James Bellorini, https://www.citymatters.london/london-short-film-festival-smart-caption-glasses/

Choosing the right AR solution for Theatrical Performances

Choosing suitable AR devices for the VOXReality AR Theatre has been a challenging endeavour. The selection criteria are based on the user and system requirements of the VOXReality use case, which have been extracted through a rigorous user-centric design process and iterative system architecture design. The four critical selection criteria dictate that:

  1. The AR device should have a comfortable and discreet glass-like form factor for improved user experience and technological acceptance in theatres.
  2. The AR device should support affordability, durability and long-term maintenance for feasible implementation at audience-wide scale.
  3. The AR device should support personalization, so that each audience member can customize the fit to their needs, and allow strict sanitization protocols, so that device distribution can adhere to high level public health standards.
  4. The AR device should support application development with open standards instead of proprietary SDKs for widespread adoption of the solution.

Given the above criteria, the selection process presents a clear challenge because no readily available AR solution offers a perfect fit. To address this need, the VOXReality team performed an extensive investigation of the range of available options with a view to the past, present, and future, and is presenting the results below.

The past

A quick look at the past shows that popular AR options were clearly unsuitable given the selection criteria. Specifically, AR has a long, proven track record as affordable and user-friendly camera-based AR, deployed on consumer smartphones and distributed through appropriate platforms as standalone applications. The restrictions on the user experience, though, such as holding a phone up while seated to watch the play through a small screen, make this a clearly prohibitive option. Previous innovative designs of highly sophisticated -and costly- augmented reality devices, sometimes also referred to as holographic devices or mixed reality devices, support neither the scalability nor the discreet presence required by the use case, and are similarly rejected. Finally, a range of industry-oriented AR designs available as early as 2011 focused on monocular, non-invasive AR displays with limited resolution and display capabilities, at a prohibitive cost due to (among others) extensive durability and safety certifications. Therefore, with a view to the past, one can posit that the AR theatre use case had not gained popularity due to pragmatic constraints.

The present

In recent years and as early as 2020, hardware developments picked up pace. Apart from evolutions of previously available designs and/or newcomers in similar design concepts, more diverse design concepts have been introduced in a persistent trend of lowering procurement costs and offering more support for open-source frameworks. Nowadays, this trend has culminated in a wide range of “smart glasses”, i.e. wearables with glass-like form factor supporting some level of audiovisual augmentation, which rely on external devices for computations (such as smartphones).

This design concept finds its origins in the previously mentioned industry-oriented AR designs, as well as their business-oriented counterparts. This time though, the AR glass concept is entering the consumer space with options that are durable for daily, street wear and tear while also remaining affordable for personal use. Some designs are provided directly bundled with proprietary software in a closed system approach (like AI-driven features or social media-oriented integrations), but the majority offers user-friendly, plug-n-play, tethered or wireless capabilities, directly supporting most personal smartphones or even laptops.

The VOXReality design

This landscape enables new, alternative system designs for AR Theatre use cases: instead of theatre-owned hardware/software solutions, one can envision systems with a combination of consumer- and theatre-owned hardware/software elements. By investigating how various stakeholders respond to this potential, we can pinpoint best practices and future recommendations.

Examining implemented AR theatre use cases, one can validate that the past landscape is dominated by a design approach based on theatre-owned integrated (hardware/software) solutions. Excellent examples where the theatre provides hardware and software to the audience are the National Theatre’s smart caption glasses [1], developed by Accenture and the National Theatre, as well as the Greek-based SmartSubs [2] project.

One new alternative that presents itself is for the audience to use their own custom hardware/software solutions dedicated to live subtitling and translations during performances. In this case, each user can choose their own AR device, pre-bundled with general-purpose translation software of their preference.

As eloquently described in a recent article [3], however, general-purpose AI captioning and translation frequently makes mistakes and fails to capture nuances, which, especially in artistic performances, can break immersion and negatively impact the audience experience. Therefore, in VOXReality we design for a transition from the past to the future: developing custom software dedicated to theatrical needs, optimized for generating real-time subtitles and translations of literary text, which can also be easily deployed on theatre-owned AR devices and/or on consumer-owned devices with minimal adaptations. This is enabled by a rigorous user-centric design approach, which can verify the features and requirements per deployment option, as well as contemporary technical development practices using open standards such as OpenXR.

The future

The future looks bright with community driven initiatives showing how accessible AR and AI technology can be, as in the example of open-source smart glasses you can build on your own [4], and in the continuous improvements on automatic speech recognition and neural machine translation allowing models to run performantly on ever less resources. VOXReality aims to leave a long-standing contribution to the domain of AR theatre with the objective of establishing reliable, immersive and performant technological solutions as the mainstream in making cultural heritage content accessible to all.

Picture of Spyros Polychronopoulos

Spyros Polychronopoulos

Research Manager at ADAPTIT and Assistant Professor at the Department of Music Technology and Acoustics of the HMU

&

Picture of Olga Chatzifoti

Olga Chatzifoti

Olga Chatzifoti is an Extended Reality applications developer working with Gruppo Maggioli on the design and development of the Augmented Reality use case of the VOXReality HORIZON research project. She is also a researcher in the Department of Informatics and Telecommunications of the University of Athens.


Seeing into the Black-Box: Providing textual explanations when Machine Learning models fail.

Machine learning is a scientific practice that is heavily tied to the notions of “error” and “approximation”. Sciences like mathematics and physics are associated with error induced by the need to model how things work. Moreover, the abilities of humans in intelligence tasks are also tied to error, since some actions associated with these abilities may be the result of failure, while others may be deemed truly successful. There have been myriad times when our thinking, our categorization ability, or our human decisions have failed. Machine learning models, which try to mimic and compete with human intelligence in certain tasks, are likewise tied to successful operations or erroneous ones.

But how can a machine learning model, a deterministic model with the ability to empirically compute the confidence it has in a particular action, diagnose by itself that it makes an error when processing a particular input? Even for a machine learning engineer, trying to understand this intuitively, without studying a particular method, seems difficult.

In this article, we discuss a recent algorithm for this problem that convincingly explains how; in particular, we describe the Language Based Error Explainability (LBEE) method by Csurka et al. [1]. Here, we recreate an explanation of how this method leverages the convenience of generating embeddings via the CLIP model contributed by OpenAI, which allows one to translate text extracts and images into high-dimensional vectors that reside in a common vector space. By projecting texts or images into this common high-dimensional space, we can compute the dot product between two embeddings (a well-known operation that measures the similarity between two vectors) and thereby quantitatively measure how similar the two original text/image objects are.
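To illustrate the kind of operation the method relies on, the following minimal sketch embeds an image and a few candidate sentences with the publicly available CLIP checkpoint on Hugging Face and compares them with cosine similarity; the image path and the sentences are placeholders, and this is only a generic CLIP usage example, not the authors’ code.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # placeholder image of a single object
sentences = ["the object is very small", "the image is blurry", "the lighting is dark"]

inputs = processor(text=sentences, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# L2-normalise the projected embeddings so the dot product equals cosine similarity
img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
similarities = (img @ txt.T).squeeze(0)  # one score per candidate sentence
print(dict(zip(sentences, similarities.tolist())))
```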

The designers of LBEE have developed a method that can report a textual description of a model failure in cases where the underlying model asserts an empirically low confidence score when taking the action it was designed for. Part of the difficulty in grasping how such a method fundamentally works is wondering how the textual descriptions explaining the model failure are generated from scratch as a function of an input datum. In our brains, we often do not put much effort into explaining why a failure happens and instantly arrive at clues to describe it, unless the cause drifts apart from our fundamental understanding of the inner workings of the object involved in the failure. To keep things interesting, we can provide an answer to this wondering: instead of assembling these descriptions anew for each input, we can generate them a priori following a recipe and then reuse them in the LBEE task by computationally reasoning about the relevance of a candidate set of explanations in relation to a given model input. In the remainder of this article, we will see how.

Suppose that we have a classification model that was trained to classify the type of a single object depicted in a small color image. We could, for example, take photographs of objects against a white background with our phone camera and pass these images to the model in order for it to classify the object names. The classification model can yield a confidence score ω between 0 and 1, representing the normalized confidence that the model has when assigning the image to a particular class, relative to all the object types recognizable by the model. It is usually observed that when a model does poorly in generating a prediction, the resulting confidence score is quite low. But what is a good empirical threshold T that allows us to distinguish a poor prediction from a confident one? To empirically estimate two such thresholds, one for identifying easy predictions and one for identifying hard predictions, we can take a large dataset of images (e.g., the ImageNet dataset) and pass each image to the classifier. For the images which were classified correctly, we can plot the confidence scores generated by the model as a normalized histogram. By doing so, we may expect to see two large lobes in the histogram: one concentrating lower empirical prediction scores, and a second concentrating scores which are relatively high, possibly with some spread of frequency mass around the two lobes. Then, we can set an empirical threshold that separates the two lobes.
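As a hedged illustration of this thresholding step (not the exact procedure from the paper), the sketch below histograms the confidence scores of correctly classified validation images and picks the least-populated bin between the two lobes as a cut-off; the synthetic scores are placeholders for real model outputs.

```python
import numpy as np

# Placeholder confidence scores for correctly classified validation images;
# in practice these come from the classifier itself.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.beta(2, 5, 2000), rng.beta(9, 2, 8000)])

hist, edges = np.histogram(scores, bins=50, density=True)

# Pick the emptiest bin in the middle of the range as an empirical cut-off T
# separating the low-confidence ("hard") lobe from the high-confidence ("easy") one.
valley = np.argmin(hist[5:45]) + 5
T = 0.5 * (edges[valley] + edges[valley + 1])

hard_images = scores < T
easy_images = scores >= T
print(f"T = {T:.2f}, hard = {hard_images.sum()}, easy = {easy_images.sum()}")
```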

Csurka and collaborators separate images into “easy” and “hard” sets based on the confidence score of a classification machine learning model and its relation to the cut-off threshold (see Figure 1). Having distinguished these two image sets, the authors compute, for each image in each set, an embedding: an ordered sequence of numbers (for convenience, we will use the term vector for this sequence) which describes the semantic information of the image. To do this, they employ the CLIP model contributed by OpenAI, the company famous for delivering the ChatGPT chatbot, which excels at producing embeddings for images and text in a joint high-dimensional vector space. The computed embeddings can be used to measure the similarity between an image and a very small text extract, or the similarity between a pair of text extracts or images.

As a later step, the authors identify groups of image embeddings that share similarities. To do this, they use a clustering algorithm which takes in the embedding vectors and identifies clusters among them; the number of clusters that fits a particular dataset is non-trivial to define. All in all, we come up with two types of clusters: clusters of CLIP embeddings for “easy” images, and clusters of CLIP embeddings for “hard” images. Then, each hard cluster center is picked and the closest easy cluster center is found for it, giving us a pair of embedding vectors originating from the clustering algorithm. The two cluster types, “easy” and “hard”, are visually denoted at the top-right sector of Figure 1 by green and red dotted enclosures.
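The following sketch shows one plausible way to perform this step with k-means; the embedding arrays and the number of clusters are placeholders, and the original work may use a different clustering algorithm and settings.

```python
import numpy as np
from sklearn.cluster import KMeans

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Placeholder CLIP image embeddings for the two groups (in practice, computed as above)
rng = np.random.default_rng(0)
easy_emb, hard_emb = rng.normal(size=(300, 512)), rng.normal(size=(120, 512))

easy_centers = KMeans(n_clusters=8, n_init=10, random_state=0).fit(easy_emb).cluster_centers_
hard_centers = KMeans(n_clusters=8, n_init=10, random_state=0).fit(hard_emb).cluster_centers_

# For each hard cluster centre, find the closest easy cluster centre by cosine similarity
sims = l2_normalize(hard_centers) @ l2_normalize(easy_centers).T
closest_easy = sims.argmax(axis=1)
print(closest_easy)
```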

The LBEE algorithm then generates a set S of sentences that describe the aforementioned images, and for each short sentence in this set the CLIP embedding is computed. As mentioned earlier, this text embedding can be directly compared to the embedding of any image by calculating the dot product (or inner product) of the two embedding vectors; the dot product measures a quantity that in the signal processing community is called linear correlation. The authors apply this operation directly: they compute the relevance of each textual error description via the so-called cosine similarity between the text embedding and a cluster embedding, ultimately obtaining two relevance score vectors of dimensionality k < N, where each dimension is tied to a given textual description. These two score vectors are then passed to a sentence selection algorithm (covered in the next paragraph). This selection is carried out for each hard cluster, and the union of the resulting sentence sets is output to the user in return for the image that was supplied as input.

The authors define four sentence selection algorithms, named SetDiff, PDiff, FPDiff and TopS. SetDiff computes the sentence sets corresponding to a hard cluster and to its closest easy cluster; it then removes from the hard-cluster sentence set the sentences that also appear in the easy-cluster sentence set, and reports the resulting set to the user. PDiff takes two similarity score vectors i and j of dimensionality k (where k denotes the number of top-k relevant text descriptions), one from the hard set and one from the easy set; it then computes the difference between these two vectors and retains the sentences corresponding to the top k values. TopS simply reports as an answer all the sentences that correspond to the vector of top-k similarities. Figure 3 presents examples of textual failure modes generated for a computer vision model, each using one of the TopS, SetDiff, PDiff and FPDiff methods. To enable evaluation of the LBEE model and methodology, the authors also had to introduce an auxiliary set of metrics, adapted to the specificities of the technique. To deepen your understanding of this innovative and very useful work, we recommend reading [1].
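The snippet below is a rough, assumption-laden sketch of how the SetDiff, PDiff and TopS strategies described above could be coded over precomputed sentence-to-cluster similarity scores; FPDiff is omitted, and the exact definitions in the paper may differ in detail.

```python
import numpy as np

def top_k(scores, sentences, k=5):
    """Return the k sentences with the highest similarity scores."""
    idx = np.argsort(scores)[::-1][:k]
    return [sentences[i] for i in idx]

def top_s(hard_scores, sentences, k=5):
    # TopS: simply report the top-k sentences for the hard cluster
    return top_k(hard_scores, sentences, k)

def set_diff(hard_scores, easy_scores, sentences, k=5):
    # SetDiff: top-k sentences of the hard cluster minus those of the closest easy cluster
    hard_set = top_k(hard_scores, sentences, k)
    easy_set = set(top_k(easy_scores, sentences, k))
    return [s for s in hard_set if s not in easy_set]

def p_diff(hard_scores, easy_scores, sentences, k=5):
    # PDiff: rank sentences by the difference between hard and easy similarity scores
    return top_k(hard_scores - easy_scores, sentences, k)

sentences = ["object is blurry", "poor lighting", "unusual viewpoint", "object partially occluded"]
hard = np.array([0.31, 0.28, 0.22, 0.27])
easy = np.array([0.30, 0.12, 0.20, 0.11])
print(set_diff(hard, easy, sentences, k=2), p_diff(hard, easy, sentences, k=2))
```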

References

[1] G. Csurka et al., “What could go wrong? Discovering and describing failure modes in computer vision,” in Proceedings of ECCV 2024.

Picture of Sotiris Karavarsamis

Sotiris Karavarsamis

Research Assistant at Visual Computing Lab (VCL)@CERTH/ITI


The rise of immersive technologies in theatre

Transforming the Audience Experience with VR and AR

Many aspects of our society have been profoundly impacted by the development of technology, especially the entertainment industry, which includes theatre. Virtual Reality (VR) and Augmented Reality (AR) are technologies that have massively changed how people think about entertainment and performance. These technologies are extremely versatile and can be used both to enhance the spectator’s experience without altering the essence of theatrical representation and to completely transform performances compared to classical theatre. The concept of augmented reality was first introduced in 1992 by Caudell and Mizell[1], and subsequently expounded upon by Ronald Azuma, who outlined its potential applications in the entertainment sector, among others[2]. However, the real breakthrough came between 2010 and 2015, with the advent of VR headsets. In 2015, Microsoft’s HoloLens introduced the ability to overlay virtual objects onto the real world, fostering new experimentation. In the same year, the platform The Void was launched, becoming popular thanks to hyper-reality experiences that combined virtual reality with interactive physical environments. Due to its popularity, the platform was able to collaborate with major companies like Disney and work on internationally renowned projects such as Star Wars: Secrets of the Empire. The COVID-19 pandemic provided a strong push for the adoption of immersive technology, forcing theatres worldwide to experiment with digital formats and virtual experiences[3].

VR and AR in the entertainment market

The immersive technology market is expanding, driven by sectors such as entertainment, education, healthcare, and business, which are increasingly adopting VR and AR technologies. In 2023, the value of the immersive technology market was $29.13 billion, and future projections indicate it will reach $134 billion by 2030, with an annual growth rate of over 25%[4]. With over fifty percent of the market in 2023, the video game industry continues to be the leading industry for VR and AR in entertainment[5]. However, these technologies are increasingly being used in live events and theatre as well. Artificial Intelligence (AI) is being integrated into VR and AR experiences to enhance interactions and make them more accessible and natural[6]. Furthermore, as smart glasses and headsets have become more powerful and lighter, their latest developments have made adoption easier for a wider range of users. Thanks to government-sponsored research initiatives like Horizon Europe, growing investments in digital innovation, and the growing use of XR technologies in industries like entertainment, healthcare, and education, the immersive technology market in the EU is predicted to reach $108 billion by 2030[7].

Enhancing theatre accessibility and audience engagement

Immersive technologies present plenty of possibilities to improve theatrical productions, enabling creativity in both the performance and its inclusivity. By employing virtual reality headsets, real-time subtitles and scene-specific context, it is possible to improve audience immersion and promote inclusivity among people with hearing or language impairments. This will increase the number of people who attend plays, particularly in tourist cities where accessibility is severely limited by language barriers. Moreover, the use of these technologies increases the potential audience for theatrical plays because it also overcomes geographic restrictions by enabling viewers to enjoy live performances from a distance in fully immersive virtual theatres. It will allow people who are unable to travel because of age-related problems or disabilities to attend performances not in two dimensions, as is now the case when watching a theatre show on television, but as a highly immersive experience. Finally, visual effects can be added to performances using VR and AR technologies, bridging the gap between traditional performing arts and modern production techniques.

Applications of VR/AR in Theatrical Performances

The incorporation of VR and AR into theatre has completely transformed how audiences interact with performances; these technologies have introduced new means to boost storytelling, accessibility, and interaction. The potential of these technologies in live performances has been shown by a variety of projects:

  • National Theatre’s Immersive Storytelling Studio: To increase audience engagement, the National Theatre in the UK has adopted immersive technologies. Its Immersive Storytelling Studio investigates the potential of VR and AR to produce more immersive and engaging experiences[8].
  • White Dwarf (Lefkos Nanos) by Polyplanity Productions: this experimental project creates a novel theatrical experience by fusing augmented reality with live performance through the interaction of digital materials with performers on stage[9].
  • Smart Subs by the Demokritos Institute: this project makes theatre performances more accessible to international and hearing-impaired audiences by using AR-powered smart captions that provide live subtitles[10].
  • XRAI Glass: AI technology, in this case combined with AR smart glasses, can provide real-time transcriptions and translations, enabling people with hearing impairments to follow along or comprehend plays in multiple languages[11].
  • National Theatre Smart Caption Glasses (UK): the National Theatre, in collaboration with Accenture and a team of speech and language experts led by Professor Andrew Lambourne, developed its “smart caption glasses” solution as part of the accessibility programme for its performances. The smart caption glasses have been in use since 2018 and were also demonstrated at the 2020 London Short Film Festival for cinematic screenings.

These applications show how VR and AR are improving visual effects while also increasing accessibility and inclusivity in theatre. Theatre companies can reach a wider audience, overcome language hurdles, and produce captivating, interactive shows that push the limits of conventional theatre by incorporating immersive technologies.

Conclusion

As technology advances, VR and AR will become increasingly used in theatrical performances, both to create a more immersive experience and to make theatre more accessible, attracting new audiences and expanding the reach of the performing arts. In an increasingly digital environment, these technologies will guarantee that live performances continue to be both revolutionary and relevant in the cultural context. Additionally, the creation of AI-powered VR and AR tools will make it possible to modify and customize shows according to audience preferences, resulting in more profound emotional experiences and unprecedented accessibility to theatre.

References

Azuma, Ronald T. “A survey of augmented reality.” Presence: Teleoperators & Virtual Environments 6.4 (1997): 355-385.

Iudova-Romanova, Kateryna, et al. “Virtual reality in contemporary theatre.” ACM Journal on Computing and Cultural Heritage 15.4 (2023): 1-11.

Jernigan, Daniel, et al. “Digitally augmented reality characters in live theatre performances.” International Journal of Performance Arts and Digital Media 5.1 (2009): 35-49.

Pike, Shane. “‘Make it so’: Communal augmented reality and the future of theatre and performance.” Fusion Journal 15 (2019): 108-118.

Pike, Shane. “Virtually relevant: AR/VR and the theatre.” Fusion Journal 17 (2020): 120-128.

Srinivasan, Saikrishna. Envisioning VR theatre: Virtual reality as an assistive technology in theatre performance. Diss. The University of Waikato, 2024.

[1] Caudell, Thomas & Mizell, David. (1992). Augmented reality: An application of heads-up display technology to manual manufacturing processes. Proceedings of the Twenty-Fifth Hawaii International Conference on System Sciences. 2. 659 – 669 vol.2. 10.1109/HICSS.1992.183317.

[2] Azuma, Ronald T. “A survey of augmented reality.” Presence: teleoperators & virtual environments 6.4 (1997): 355-385

[3] Signiant. VR & AR: How COVID-19 Accelerated Adoption, According to Experts. 2024

[4] Verified Market Reports. Immersive Technologies Market Report. 2024

[5] Verified Market Reports. Immersive Technologies Market Report. 2024.

[6] Reuters. VR, AR headsets demand set to surge as AI lowers costs, IDC says. 2024.

[7] Mordor Intelligence. Europe Immersive Entertainment Market Report. 2024.

[8] National Theatre. Immersive Storytelling Studio. 2024.

[9] Polyplanity Productions. White Dwarf (Lefkos Nanos). 2024.

[10] Demokritos Institute. Smart Subs Project. 2024.

[11] XRAI Glass. Smart Glasses for Real-Time Subtitles. 2024.

Picture of Greta Ioli

Greta Ioli

Greta Ioli is an EU Project Manager in the R&D department of Maggioli Group, one of Italy's leading companies providing software and digital services for Public Administrations. After earning a degree in International Relations – European Affairs from the University of Bologna, she specialized in European projects. Greta is mainly involved in drafting project proposals and managing dissemination, communication, and exploitation activities.


Partner Interview #8 with F6S

The VOXReality project is driving innovation in Extended Reality (XR) by bridging this technology with real-world applications. At the heart of this initiative is F6S, a key partner ensuring the seamless execution of open calls and supporting third-party projects (TPs) from selection to implementation. In this interview, we sit down with Mateusz Kowacki from F6S to discuss their role in the consortium, the impact of mentorship, and how the project is shaping the future of AI and XR technologies.

Can you provide an overview of your organization's involvement in the VOXReality project and your specific role within the consortium?

F6S played a crucial operational role in the VOXReality project by managing the preparation and execution of the open calls. This thorough approach involved designing the application process (determining eligibility criteria, application requirements and evaluation metrics), developing and disseminating the call, and managing the selection and implementation of the third-party (TP) projects.

Essentially, F6S acted as the facilitator ensuring a smooth and efficient process of preparing and implementing open calls.

How do you ensure that both mentors and the projects they guide benefit from the mentorship process, and what does that look like in practice?

There are a lot of important factors that made the mentoring process within the VOXReality project a success, but one of the key elements is communication. That involves clearly outlining the roles and responsibilities of both the mentor and the project team, including setting expectations for communication frequency, meeting schedules, and deliverables. What is more, we regularly check in with both mentors and projects to assess progress, identify any challenges, and provide support, and we gather feedback on the mentorship experience to continuously improve the programme. Those are for sure the core and basic elements of successful implementation. What we also developed in sprint 2, based on lessons learnt from sprint 1, is a clear calendar of upcoming activities that involve TPs and mentors. That helps us with better execution and a better understanding of our tasks.

Regular meetings, check-ups and openness to discussion have also played a crucial role. F6S helped all partners to better execute and navigate the implementation of the open call.

How does the VOXReality team ensure that the XR applications being developed are both innovative and practical for real-world use?

The VOXReality team employs a multi-faceted approach to ensure that the XR applications being developed are both innovative and practical for real-world use. By funding projects through open calls, VOXReality fosters innovation and encourages a diverse range of ideas and approaches. This collaborative approach ensures that the development of XR applications benefits from the expertise of a wider community, leading to more creative and practical solutions. So basically, the whole selection process has been designed to attract technologies that are as innovative as possible. We have been lucky to attract a lot of applications, so our selection of 5 TPs was not an easy task, as many projects showed good value in terms of innovation and real-world use. Nevertheless, we believe that the five selected entities represent the best potential for future development, and we are sure that their pursuit of innovation will end in success.

The language translation system for VOXReality prioritizes cultural sensitivity and artistic integrity by relying on these literary translations, which capture the cultural nuances and emotional subtleties of the original text. To ensure that these aspects are preserved throughout the development, we conduct thorough evaluations of the translation outputs through internal checks. This evaluation is crucial for verifying that the translations maintain the intended cultural and artistic elements, thereby respecting the integrity of the original performance.

How do you think the VOXReality Open Call and the coaching process will shape the success and growth of innovative projects in the XR and AI fields?

I believe that the idea of cascade funding is crucial for discovering potential in small teams of creative professionals, and projects like VOXReality certainly help to elevate their activities to a higher level and a bigger audience. The role of a coach is to ensure the successful implementation of the TPs’ projects within VOXReality, but also to see the bigger picture of possibilities within the sector of publicly funded projects.

What excites you most about the Third-Party Projects joining VOXReality, and how do you believe AI and XR technologies will reshape the industries they are targeting?

The cooperation with them. It is for sure very interesting to see how they work and how they interact: the dynamism and agility, while at the same time keeping to the deadlines and meeting expectations. It is something that can certainly inspire, not only them but also bigger entities, to sometimes think outside the box and leave the comfort zone. For some of those entities the VOXReality project is a game changer in their entrepreneurial history, and we are very happy to be part of it. XR technologies have very big potential for changing and shaping our everyday life, but we always need to see the real, social value in what we are doing within XR technologies. That’s one of the mottos we have in VOXReality: to bring real value to society.

Picture of Mateusz Kowacki

Mateusz Kowacki

EU Project Manager @ F6S
