
Master’s Thesis “Exploring User Interaction Modalities for Open-Ended Learning in XR Training Scenarios,” in Collaboration with VOXReality, Defended This July in Munich

Master’s Student: Gabriele Princiotta
Thesis Advisors (TUM): Dr. Sandro Weber, Prof. Dr. David Plecher
Thesis Advisor (VOXReality): Leesa Joyce

A Master’s thesis titled “Exploring User Interaction Modalities for Open-Ended Learning in XR Training Scenarios” by Gabriele Princiotta was recently defended at the Technische Universität München (TUM). The thesis was co-advised by Dr. Sandro Weber and Prof. Dr. David Plecher from TUM, and Leesa Joyce from the VOXReality consortium.

This study explores how different interaction modalities affect user experience in open-ended XR training environments. Specifically, the research focused on an AR assembly training application developed for the Microsoft HoloLens 2. Two interaction methods were designed and compared: a traditional hand-based Graphical User Interface (GUI Mode) and an AI-powered voice interaction mode (Voxy Mode), supported by a Large Language Model (LLM) and Automatic Speech Recognition (ASR).

The user study employed a within-subjects design to evaluate the impact of these modalities on user experience, cognitive load, usability, and task engagement. While quantitative findings showed significantly faster task completion times in GUI Mode—primarily due to shorter onboarding and user familiarity—no statistically significant differences emerged across other user experience metrics. This outcome was influenced by a strong learning effect throughout the study sessions.

However, qualitative feedback indicated a clear user preference for the Voxy Mode. Participants highlighted the engaging, supportive nature of interacting with the conversational AI assistant (named ARTA), noting how it made the training feel more natural and less mechanical. At the same time, the limitations of current ASR accuracy and the assistant’s understanding of nuanced or ambiguous user input were seen as key areas for future development.


The VOXReality partners played an essential role in enabling this research. They provided the AI voice assistant model, customized it for integration into the HoloLens application, and supported the technical setup needed for the experiment. A general assembly test provided by the consortium was used as the basis for the training scenario in the user study.

The results highlight the potential of multimodal, voice-driven interfaces in XR training environments to improve engagement and perceived support—particularly in open-ended learning tasks. At the same time, the thesis underscores the practical limitations tied to current speech recognition capabilities and the need for more sophisticated user intent recognition and contextual awareness from AI assistants in XR.

Finally, the study also draws attention to the methodological challenges posed by learning effects in within-subject comparative studies of interface designs. As XR training applications become increasingly personalized and adaptive, future research should focus on enhancing the intelligence and robustness of voice interfaces and minimizing study bias to ensure reliable UX evaluation.

Picture of Gabriele Princiotta

Gabriele Princiotta

Unity XR Developer @ Hololight


AIXTRA: VR Training Takes a Step Forward

AIXTRA, one of the winners of the VOXReality Open Call, has successfully completed a major testing phase. From July 21st to 25th in Leipzig, 34 participants took part in both single-user and multi-user VR training scenarios. The goal was to assess the user experience and technical performance of AI-powered VR training environments. Special attention was given to the AI trainer and voice processing features, with the team gathering end-user perceptions, subjective feedback, and objective data to ensure statistically meaningful insights.

A Diverse Group of Participants

The study’s participants, ranging from 18 to 44 years old, came from various backgrounds with differing levels of familiarity with AI and XR technologies. While many had professional experience with technology, most were “not much experienced in XR.” Concerns about AI were common, with participants frequently citing “data privacy and security, inaccuracy of AI-generated answers, dependence on AI reducing human skills, ethical considerations, bias in AI decisions, and environmental impact.” Reflecting this apprehension, a significant portion of participants (41.67%) had only “neutral trust” in AI systems.

Participants’ native languages included German, English, Ukrainian-Russian, Dutch, and Vietnamese. While English proficiency ranged from beginner to fluent, many users acknowledged their accent influenced how well they were understood. “Some noted a mixed or regional accent… responses ranged from ‘never’ to ‘sometimes’ when asked if they had difficulties being understood.”

Feedback from the testing sessions offered invaluable insights into the AIXTRA system’s real-world performance. In the single-user mode, participants found the AI assistant helpful, with one stating, “it felt intuitive and safe to work with the AI assistant” and that it “helped me manage tasks better.” However, technical challenges also emerged. A common issue was the need for users to “repeat or rephrase their inputs” to be understood. One participant experienced this firsthand, recounting, “I had difficulties pronouncing the English word ‘Barometer’ correctly, and the AI only recognised it after the 4th or 5th attempt.”

In the multi-user environment, users noted translation delays and occasional inaccuracies. One participant pointed out an issue with audio quality, stating, “Poor sound quality of the assistant was difficult”, highlighting how hardware can affect clarity and immersion. Another noted a lack of comprehensive audio cues, commenting, “There wasn’t audio feedback at every intermediate step”, which could impact the guided learning process.

Standout Feedback and Future Plans

The sessions yielded several memorable quotes that captured the dual nature of AI’s impact. One user in the single-user scenario noted the “Advantage: no fear of asking the soulless AI for advice and help without blaming myself”, but immediately followed with a perceived “Disadvantage: less independent thinking and reflection”, revealing a broader concern about “Over-Reliance and Complacency”. In the multi-user setting, a participant, despite pointing out a critical error, enthusiastically concluded, “Otherwise it was pretty cool”.

Following this successful testing phase, the AIXTRA project is now moving into its final stages. The team will use a structured user study, approved by an ethics committee, to combine user and developer feedback for deeper insights. The project will also make its demo application available as Open Access and has two more publications planned to increase public visibility. A scientific paper is also in the works to “evaluate the results and show future trends in the field of AI and XR environment”.


WELD-E @ EEITE 2025 and AIOTI

WELD-E, one of the winners of the VOXReality Open Call, is a project that’s pioneering the future of human-robot collaboration in the welding industry. By integrating artificial intelligence (AI) with extended reality (XR), WELD-E has created a safer, more efficient, and more intuitive welding environment. The project’s team recently presented two key papers that detail the system’s advancements.

Published Papers

The WELD-E team has published two papers outlining their work. Each paper details a different aspect of the project’s technology and its implications for the future of manufacturing.

Paper 1: “WELD-E: Enhanced-XR Human-Robot Interactive Collaboration for Welding Operations”

  • Lead Authors: Andreas El Saer, George Tsakiris, Leonidas Valavanis, Aristea M. Zafeiropoulou, Konstantinos Loupos, George Argyropoulos, Petros Tsampiras
  • Publication: EEITE 2025, 6th International Conference in Electronic Engineering & Information Technology
  • Abstract: This paper introduces WEld-e, an end-to-end solution for human-robot collaboration in welding. It leverages AI and XR technologies, including Microsoft HoloLens, to address challenges like a lack of effective guidance and real-time monitoring. The system uses a multimodal interface with voice commands, spatial awareness, and automated decision-making. At its core, it employs deep learning models—such as Automatic Speech Recognition (ASR), Neural Machine Translation (NMT), a domain-specific Welding Large Language Model (WeLLM), and a Visual Language Model—to execute welding commands with high precision and safety. The framework supports multilingual, context-aware interactions and improves operational efficiency and safety in hazardous industrial settings. The paper validates that WEld-e can significantly reduce setup times, training requirements, and error rates, aligning with Industry 5.0 objectives.
  • Key Breakthrough: By combining AI and Extended Reality, welders can interact naturally with robots using voice and gestures. This makes welding operations faster and more intuitive by reducing errors and improving safety, all without the need for complex programming.
Posters for the paper presented in EEITE 2025

Paper 2: “Edge AI IoT Immersive Applications”

  • Lead Authors: Andreas El Saer, George Tsakiris, Leonidas Valavanis, Aristea M. Zafeiropoulou, Konstantinos Loupos, George Argyropoulos, Petros Tsampiras
  • Publication: AIOTI (Alliance for IoT and Edge Computing Innovation)
  • Abstract: This paper presents the WEld-e system as an immersive, AI-driven platform for human-robot collaboration in robotic welding. It integrates Mixed Reality (MR) via Microsoft HoloLens, along with advanced voice and gesture control, and a suite of AI models (ASR, NMT, TTS, and WeLLM). The system enables real-time, multilingual communication between human experts and robotic welders. It also features a digital twin interface for spatially contextual feedback and safety monitoring of a UR10e robotic arm. The paper highlights a key innovation in its knowledge distillation pipeline for computer vision, which refines object detection. Built on a modular architecture using ROS2 and Unity3D, the system supports low-latency interactions, improves precision, and enhances operator awareness, aligning with Industry 5.0 goals.
  • Key Breakthrough: An expert can remotely guide a welding robot using only spoken commands and hand gestures, all while seeing a real-time view through special glasses. This simplifies complex tasks, making them faster and safer, and allows experts to supervise operations from a distance.

A Real-World Example

Imagine a welder in a factory wearing HoloLens glasses. Instead of manually programming a robot or using a complex control panel, they simply say, in their native language, “Start welding using template 3.” The system, powered by the AI described in the papers, understands the voice command, translates it if necessary, and directs the robotic arm to begin welding immediately.

But the system’s intelligence doesn’t stop there. Safety is a top priority. If another worker accidentally walks into the designated welding area, the system’s sensors instantly detect the person. It then stops the robotic arm, and a visual and auditory alert pops up in the welder’s glasses, saying something like, “Unauthorized person detected. Operation canceled.” This real-time, intuitive interaction not only streamlines the workflow but also creates a significantly safer working environment. 
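To make this flow concrete, here is a minimal, framework-agnostic sketch of the kind of command-and-safety interlock logic described above. All class and method names are illustrative assumptions; the actual WELD-E stack is built on ROS2 and Unity3D and is not reproduced here.

```python
from dataclasses import dataclass
from enum import Enum, auto


class RobotState(Enum):
    IDLE = auto()
    WELDING = auto()
    EMERGENCY_STOP = auto()


@dataclass
class WeldController:
    """Illustrative command/safety interlock, not the actual WELD-E implementation."""
    state: RobotState = RobotState.IDLE

    def handle_voice_command(self, text: str) -> str:
        # In WELD-E the utterance would first pass through ASR/NMT;
        # here we assume it already arrives as English text.
        if self.state is RobotState.EMERGENCY_STOP:
            return "Operation locked: clear the welding area first."
        if text.lower().startswith("start welding using template"):
            template = text.split()[-1]
            self.state = RobotState.WELDING
            return f"Starting weld with template {template}."
        return "Command not recognized. Please rephrase."

    def handle_person_detected(self) -> str:
        # Safety events take priority over any running command.
        if self.state is RobotState.WELDING:
            self.state = RobotState.EMERGENCY_STOP
            return "Unauthorized person detected. Operation canceled."
        return "Person detected outside an active operation."


controller = WeldController()
print(controller.handle_voice_command("Start welding using template 3"))
print(controller.handle_person_detected())
```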

The WELD-E project is a powerful example of how cutting-edge technology can transform traditional industries.


VAARHeT Project Transforms Visitor Experience at Āraiši Ezerpils Archaeological Park

The VAARHeT project, one of the winners of the VOXReality Open Call and a pioneering initiative in enhancing cultural heritage experiences through technology, recently conducted successful usability testing of three innovative solutions at the Āraiši Ezerpils Archaeological Park in Cēsis, Latvia. From July 14th to 16th, 2025, visitors had the opportunity to engage with XR applications, providing feedback that will shape the future of museum interaction.

Diverse Technologies for a Richer Experience
“The study consisted of a usability test and user experience study of three AI-assisted voice-driven interactive XR applications for open-air museum visitors”, explains Cordula Hansen, a representative from XR Ireland, one of the project partners. These included:

  • VR Glasses for Storytelling: These immersive glasses transport visitors back in time, narrating and visually reconstructing the captivating history of the lake castle.
  • AR Glasses for Visual Translation: A different type of wearable technology, these glasses offer real-time visual translation for diverse language groups within a single tour, fostering inclusivity.
  • Mobile App Avatar for Practical Information: A friendly blue-haired avatar within a mobile application provides immediate answers to common visitor queries, such as opening hours and ticket prices.

Visitors at the Heart of the Innovation
The VAARHeT museum partner recruited 39 test participants, specifically targeting adults aged 25 to 55 with an interest in culture and museums, often visiting with children aged 3 to 12. This demographic aligns with the project’s design persona, ensuring feedback from the intended user base.

The primary objectives of involving these groups were multifaceted: to assess the overall usability of the applications, evaluate the added value of voice-driven XR experiences utilizing VOXReality’s NLP components and AI models, and identify areas for refinement with a view towards commercialization.

Overwhelming Positive Feedback for VR Storytelling
Initial data analysis, even while comprehensive processing is ongoing, reveals a clear favorite among the tested technologies: the virtual guide glasses.

David Toulon, an XR Ireland representative, shared, “Based on the feedback received so far, the virtual reality glasses are very popular. You can see and hear the story of how the castle was built. Visitor feedback suggests that the virtual reality glasses are here to stay.” [1]

This sentiment was echoed by test participants. Juris Beņķis, a visitor and former tourist guide, enthusiastically stated, “I’ve worked as a tourist guide, I know how hard it is to get children interested. I imagine that 20 children will come and put on these glasses, then it will be wow, they will be fascinated and will learn something.” [1]

The head of the archaeological park, Eva Koljera, highlighted the broader impact of the VR glasses, noting their utility beyond just engaging younger audiences: “People with mobility impairments who cannot physically get to the lake castle could get an idea of the castle here. Of course, young people and children also like it.” [1]

Cordula Hansen recounted how numerous visitors reacted with awe, with many exclaiming, “Wow, it really feels like you’re there!” Testers consistently praised the detailed graphical representations and the accurate explanations of how the historical houses were constructed. The novelty of voice interaction was also a popular feature, with one participant commenting, “I liked that there were no buttons to press.”

Continuous Improvement and Future Steps
The testing phase was also an opportunity for refinement. Before the main usability tests at Āraiši, preliminary performance tests were conducted in both lab settings and at the museum to ensure minimal latency and enable parallel testing of all three pilots.

One significant improvement was made to the Welcome Avatar, where the source material for the RAG component was re-curated to enhance the quality of responses. Cordula Hansen also noted an interesting user behavior: “due to the novelty of the voice interaction, first test users reported confusion when faced with the challenge of ‘speaking to the machine.’ We solved this through improved UX writing and adding some examples and tutorial prompts to the user flows to ‘practice’ with.”
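To illustrate why the quality of that source material matters, the sketch below shows a stripped-down retrieval step of the kind a RAG pipeline depends on: the avatar can only answer as well as the snippets it retrieves. TF-IDF here is a stand-in for whatever embedding model is actually used, and the snippets are invented examples rather than VAARHeT content.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented example snippets; in a real RAG setup these come from the curated
# source material that grounds the language model's answers.
snippets = [
    "The museum is open daily from 10:00 to 18:00 between May and October.",
    "Adult tickets cost 5 EUR; children under 7 enter free of charge.",
    "The lake castle reconstruction dates the settlement to the 9th century.",
]

vectorizer = TfidfVectorizer().fit(snippets)
snippet_vectors = vectorizer.transform(snippets)

def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k curated snippets most similar to the visitor's question."""
    scores = cosine_similarity(vectorizer.transform([question]), snippet_vectors)[0]
    ranked = scores.argsort()[::-1][:k]
    return [snippets[i] for i in ranked]

print(retrieve("What are the opening hours?"))
```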

Looking ahead, the project team has clear next steps. “After finalising the analysis of our test results, we will be able to determine which of the three VAARHeT pilots brings the most immediate value to the museum, and what type of interaction mechanics would be the most appropriate for that use case,” explained Cordula Hansen.

The plan is to further refine the most promising pilot experience and install it at the museum for an extended period, allowing for continuous feedback gathering in a real-world operational environment. Additionally, the voice-activation components trialed in these pilots will be further tested and integrated into an XR content management system platform specifically designed for the cultural heritage sector.

The VAARHeT project’s initial piloting has demonstrated the immense potential of XR technologies to create more engaging, accessible, and informative museum experiences. With the valuable feedback gathered, the future of cultural heritage interaction looks brighter and more immersive than ever!

[1] Part of the information in this article was sourced from the TV3.lv article “Virtuālais asistents, tulkošanas brilles un avatars: Āraišos notiek nākotnes muzeju izmēģinājumi” (Virtual assistant, translation glasses and avatar: future museums are being tested in Āraiši), available at https://tv3.lv/dzivesstils/celotprieks/virtualais-asistents-tulkosanas-brilles-un-avatars-araisos-notiek-nakotnes-muzeju-izmeginajumi/.

Photos courtesy of XR Ireland.


XR-CareerAssist @ SalentoXR 2025

XR-CareerAssist, one of the Open Call winners of VOXReality, was presented at the International Conference on Extended Reality, Salento XR 2025, held in Otranto, Italy (June 17–20, 2025). The project uses VR and AI to make career planning dynamic, personalized, and engaging.

The paper, “Transforming Career Development Through Immersive and Data-Driven Solutions,” describes how XR-CareerAssist addresses the limitations of traditional career counseling, which is often seen as boring, inflexible, and hard to access.

The system uses VR goggles and AI to create an immersive environment. Users interact with a 3D avatar that understands multiple languages, visualize career paths with interactive maps, and get personalized advice from a database of over 100,000 real career profiles. It also features voice commands for natural interaction.

The paper details the system’s design, built on a vast database of CVs, and describes how AI models work together. Future plans include testing with 25-40 real users.

Real-World Impact: Sarah’s Journey with XR-CareerAssist

Consider Sarah, a 35-year-old IT manager in the UK with 10 years of experience, aiming to become a Chief Officer. XR-CareerAssist helps her directly:

  • Immersive Start: Sarah puts on a MetaQuest 3 headset and enters a virtual environment with a 3D avatar.
  • Voice Interaction: She simply states her current role, experience, and Chief Officer aspiration.
  • Instant Analysis: The system quickly finds 1500 similar profiles, showing that such professionals typically work across 3 sectors and that reaching Chief Officer takes about 15 years.
  • Visual Career Map: Sarah sees an interactive Sankey diagram (flowchart) displaying various paths from Manager to Chief Officer over the next 10 years, including industry detours.
  • Multilingual Support: The system automatically translates everything into French if Sarah prefers, due to her work with a French company.
  • Personalized Insights: The AI-powered avatar explains visualizations, points out necessary skills, and highlights high-success career moves. Sarah can actively ask questions via speech.

This example shows how XR-CareerAssist offers data-backed, visual, and highly personalized guidance, empowering users to make informed career decisions.
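As a rough illustration of the visual career map described above, the sketch below renders a small career-transition Sankey diagram with Plotly. The roles, paths, and counts are invented for the example and are not drawn from the XR-CareerAssist profile database.

```python
import plotly.graph_objects as go

# Invented transition counts between roles; XR-CareerAssist derives these
# from its database of anonymised professional profiles.
roles = ["IT Manager", "Senior Manager", "Director", "Chief Officer"]
links = dict(
    source=[0, 0, 1, 1, 2],          # index into `roles`
    target=[1, 2, 2, 3, 3],
    value=[600, 150, 400, 120, 330],  # number of observed transitions
)

fig = go.Figure(go.Sankey(node=dict(label=roles, pad=20), link=links))
fig.update_layout(title_text="Example career paths toward Chief Officer")
fig.show()
```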

Poster for the paper presented in SalentoXR 2025

Abstract: The rapid evolution of technology has created opportunities to transform traditional career guidance methods into dynamic, immersive, and data-driven solutions. XR-CareerAssist is an innovative platform that aims to provide career insights and enhance user engagement by integrating Extended Reality (XR) and Artificial Intelligence (AI) technologies. A dedicated tool is built and presented, which analyses over 100,000 anonymised professional profiles. This tool is a key component of XR-CareerAssist and is used to visualise career trajectories, industry trends, and skill pathways through interactive and immersive experiences. Features such as virtual reality (VR) environments, voice-based navigation, multilingual support, and AI-driven 3D avatars empower users to explore career paths dynamically and intuitively. By merging robust data analytics with immersive visualizations, XR-CareerAssist not only boosts user engagement but also improves accessibility and aids in the clear interpretation of career trajectories. This study explores the envisioned scenarios, highlights results from initial testing with the CV Analysis tool, and examines how XR-CareerAssist enhances career guidance and training, fostering personalised and impactful career development in a globalised job market.

Keywords: Career Guidance, Career Maps, Artificial Intelligence, LLMs, Virtual Reality

Full Article: “Transforming Career Development Through Immersive and Data-Driven Solutions” by N.D. Tantaroudas (Institute of Communication and Computer Systems, Greece), A. J. McCracken (DASKALOS-APPS, France), I. Karachalios (National Technical University of Athens, Greece), E. Papatheou (University of Exeter, UK), V. Pastrikakis (CVCOSMOS Ltd, UK)


Teaching AI Where Things Are: A Step Forward in Image Understanding

We all know that modern AI can recognize objects in images. Show it a photo, and it will tell you: “cat,” “car,” “person.”

But what if we asked the AI:
 “Is the cat in front of the sofa?”
 “Are the two chairs side by side?”

That’s a different challenge — it’s about understanding where things are in relation to each other, not just recognizing the things themselves.

Today, AI struggles with this task. Describing spatial relationships like “to the left of,” “on top of,” and “next to” is still rare in machine-generated captions. And yet this kind of understanding is essential in many real-world applications:

  • 🚗 Autonomous driving: knowing where a pedestrian is relative to a car.
  • 🤖 Robotics: navigating around obstacles in complex environments.
  • 🕶️ Assistive devices: describing scenes to visually impaired users.
  • 📱 Augmented reality: placing digital content in the correct spot in physical space.

Our team set out to help address this challenge by building tools to train and evaluate AI models on spatial understanding in images.

Why Is This Hard?

The problem starts with data.

Most of the image-captioning datasets used today focus on what is in the image, rather than where things are located.

 Typical captions might say:
 “A man riding a bicycle” — but not: “A man riding a bicycle to the left of a car.”

Without many examples of spatial language in the training data, AI models don’t learn to express these relationships well.

Even harder: there wasn’t a good way to measure whether a model was good at spatial descriptions. Existing evaluation tools (BLEU, ROUGE, etc.) reward overlap in wording with reference captions, but not spatial accuracy.
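To see the problem in miniature (using NLTK’s sentence-level BLEU purely for illustration): a caption that flips “left” to “right” is spatially wrong, yet it overlaps almost entirely with the reference and still receives a high score.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "a man riding a bicycle to the left of a car".split()
correct   = "a man riding a bicycle to the left of a car".split()
flipped   = "a man riding a bicycle to the right of a car".split()

smooth = SmoothingFunction().method1
print(sentence_bleu([reference], correct, smoothing_function=smooth))  # perfect score
print(sentence_bleu([reference], flipped, smoothing_function=smooth))  # still high,
# even though the spatial relation is exactly wrong
```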

What We Did

To tackle this, we developed three key components:

1. New spatial training data
 We enriched an existing large-scale image dataset, COCO, with spatially grounded captions — sentences that explicitly describe where objects are in relation to each other.
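As a simplified illustration of what “spatially grounded” means, the sketch below derives left/right and above/below relations from COCO-style bounding boxes and turns them into caption fragments. The actual extraction pipeline combined computer vision and machine learning models; this heuristic is only an approximation of the idea.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # COCO format: (x, y, width, height)

def center(box: Box) -> Tuple[float, float]:
    x, y, w, h = box
    return x + w / 2, y + h / 2

def spatial_caption(name_a: str, box_a: Box, name_b: str, box_b: Box) -> str:
    """Describe where object A sits relative to object B (illustrative heuristic)."""
    (ax, ay), (bx, by) = center(box_a), center(box_b)
    horizontal = "to the left of" if ax < bx else "to the right of"
    vertical = "above" if ay < by else "below"  # image y grows downwards
    # Prefer the axis with the larger separation between the two centers.
    relation = horizontal if abs(ax - bx) >= abs(ay - by) else vertical
    return f"a {name_a} {relation} a {name_b}"

objects: Dict[str, Box] = {"bicycle": (40, 120, 80, 60), "car": (200, 110, 150, 80)}
print(spatial_caption("bicycle", objects["bicycle"], "car", objects["car"]))
# -> "a bicycle to the left of a car"
```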

2. A new way to measure spatial understanding
 We created a simple but effective evaluation process:

  • Does the model generate sentences that correctly describe spatial relationships?
  • Does it do so consistently across different types of relationships and images?

Rather than relying on complicated language metrics, we fine-tuned several combinations of popular vision encoders and text decoders against the ground-truth sentences that we extracted using computer vision and machine learning models, and compared how well these state-of-the-art models learn spatial descriptions.

This gives a more direct measure of whether a model truly understands space, not just whether it generates plausible-sounding text.

3. Testing different models
We evaluated several popular combinations of vision and text models.

We found that some model combinations produce noticeably better spatial captions than others.

In particular, models that combine efficient visual transformers with robust language understanding perform best at capturing spatial relationships.
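For instance, a vision-transformer encoder paired with a GPT-2 decoder is one such combination. The sketch below shows a minimal, hypothetical setup with Hugging Face Transformers; the checkpoint names are examples and this is not the project’s actual training configuration.

```python
from transformers import (AutoTokenizer, ViTImageProcessor,
                          VisionEncoderDecoderModel)

# Assemble a captioning model from a pretrained vision encoder and text decoder.
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k", "gpt2"
)
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# GPT-2 has no pad token, so reuse EOS and wire the special tokens into the config.
tokenizer.pad_token = tokenizer.eos_token
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Fine-tuning then proceeds as usual: encode each image with `processor`,
# tokenize its spatial caption as labels, and minimise the cross-entropy loss,
# e.g. with the Trainer API or a plain PyTorch loop.
```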

Why It Matters

This work is an important step toward AI systems that don’t just list objects, but can also reason about space:

  • Helping robots navigate better
  • Enabling safer autonomous vehicles
  • Supporting more helpful assistive technologies
  • Improving human-AI interaction in AR/VR systems

We’re making the enriched dataset and evaluation tools available to the community soon, so that others can build on this work and push spatial image captioning forward.

References
  1. Vaswani, Ashish, et al. “Attention is all you need.” Advances in neural information processing systems 30 (2017).
  2. Dosovitskiy, Alexey, et al. “An image is worth 16×16 words: Transformers for image recognition at scale.” arXiv preprint arXiv:2010.11929 (2020).
  3. Lin, Tsung-Yi, et al. “Microsoft coco: Common objects in context.” Computer vision–ECCV 2014: 13th European conference, Zurich, Switzerland, September 6-12, 2014, proceedings, part v 13. Springer International Publishing, 2014.
  4. Ranftl, René, Alexey Bochkovskiy, and Vladlen Koltun. “Vision transformers for dense prediction.” Proceedings of the IEEE/CVF international conference on computer vision. 2021.
  5. Radford, Alec, et al. “Language models are unsupervised multitask learners.” OpenAI blog 1.8 (2019): 9.

Georgios Papadopoulos

Research Associate at Information Technologies Institute | Centre for Research and Technology Hellas


Augmenting the Past: Applied Examples of AR and AI in Cultural Heritage from Greece

From the awe-inspiring ruins of ancient theatres to the timeless stories told through myth and drama, Greece’s cultural heritage is a rich tapestry of human creativity and memory. But how can cutting-edge technologies like Augmented Reality (AR) and Artificial Intelligence (AI) help preserve and reawaken this legacy in the present day?

At VOXReality, we explore how immersive technology can enrich live theatre with real-time augmented content, blending performance with spatial computing. The VOXReality AR theatre project sits at the crossroads of performative arts (the live performance), literary arts (theatrical texts and dramaturgy), and both tangible (theatres and spaces) and intangible (stories, rituals, performance practices) forms of cultural heritage. Our ambition is to use AR to enhance theatrical performances by overlaying digital elements, such as visuals, text, and interactive content, directly onto the live stage. Yet this ambition brings a unique set of challenges. A theatre play is a dynamic, tightly choreographed experience. Augmented content must appear at the right moment and in the right place, at high quality, without disrupting the flow or distracting the audience.

To address these challenges, we look to other successful AR/AI applications within Greece’s cultural sector for insights and inspiration. What can we learn from their experiences, and how do our technological choices compare? In this post, we explore real-world examples of applied AR and AI in Greek cultural heritage and examine how they’ve tackled key issues such as content delivery, accessibility, and stability.

Example 1: NAXOS AR – From the Portara to the Temple of Apollo

The NAXOS AR application, developed by MOPTIL in collaboration with the Municipality of Naxos and the Ephorate of Antiquities of Cyclades, offers an AR experience at the “Portara” historical location on Naxos Island. The experience provides a historically accurate 3D representation of the Temple of Apollo alongside additional textual background information, validated by academic and institutional partners. The application can be downloaded from established application distribution platforms and runs on the visitor’s personal smartphone. To keep the installation files lightweight, the application downloads the required 3D content from a cloud service at runtime. In onsite mode, i.e. at the archaeological site, the application performs AR matching to recognize physical landmarks and displays the digital reconstruction correctly positioned relative to the existing monument. If the user is elsewhere, they can manually position, scale, and rotate the 3D content.

The experience successfully addresses:

  • performance and robustness, achieved through a standalone application
  • trustworthiness and accessibility, ensured via authorized distribution channels with security validation (such as the Google Play Store)
  • optimal content delivery, with additional large-sized content downloaded on demand from cloud services with sufficient streaming bandwidth
  • 3D content curated by expert archaeologists
  • high-quality UI/UX

Overall, the application showcases the feasibility of providing high-quality informational 3D content to the general audience, making cultural heritage more accessible and engaging through mobile AR.

[1] MOPTIL: NAXOS AR – Explore Cultural Heritage

Figure 1. NAXOS AR - AR reconstruction of the Temple of Apollo at the archaeological site of Portara in Naxos
Example 2: COSMOTE CHRONOS – Blending AR and AI for Cultural Heritage

An even more advanced example from Greece is the COSMOTE CHRONOS app, a cultural heritage experience that combines Augmented Reality (AR) and Artificial Intelligence (AI) while making use of 5G network capabilities. Originally designed as a 5G use case project, it quickly evolved into a popular app that brings the monuments of the Acropolis to life as they were in antiquity, and has acquired more than 400k downloads worldwide. COSMOTE CHRONOS is a project of COSMOTE (a telecommunications provider), designed in collaboration with Greece’s Ministry of Culture, the Acropolis Museum, and MOPTIL as technology partner. The app supports both on-site visits at the Acropolis and off-site access, reimagining how visitors explore ancient monuments and making the cultural experience available to a broader audience.

In terms of AR content, the application supports capabilities similar to NAXOS AR. For example, the 3D models need to be optimised for real-time rendering, using the same low-poly approach as in the NAXOS AR case. But this time there is a new advantage: at the heart of the experience is Clio, an AI-powered digital assistant who interacts with users in real time, answering questions (written or oral) and guiding them through the virtual past. This dialogue-driven interface is a 5G-only feature that poses a technical challenge: it is made possible through a backend infrastructure that streams content dynamically, relying on high-speed 5G connectivity to enable seamless communication and responsive interaction.

Technically, the app represents a major undertaking, bringing together partners in communication infrastructure, immersive technology, artificial intelligence and, of course, classical archaeology. Beyond the advanced user interface (UI) and photorealistic 3D reconstructions, special attention was given to the accuracy and adaptability of the AI assistant. A team of archaeologists, scriptwriters, and bot trainers continuously updates and refines Clio’s knowledge base by analyzing user interactions and adjusting its responses accordingly, ensuring an engaging and informative experience for all users. In addition, users have the option to select a solo or group guided tour, allowing them to listen to historical information as they explore each monument, which brings further challenges in content sharing and synchronization. The group option allows up to 5 visitors to listen to the responses simultaneously in real time.

Culturally, CHRONOS exemplifies a new approach to heritage interpretation through storytelling that blends scientific rigor with emotional engagement of the user. With a custom musical score, thoughtful dialogue, and immersive visuals, the app transforms a visit to the Acropolis into an interactive journey through time, while also being accessible to remote users around the world.

While CHRONOS offers a rich and dynamic UX, it also highlights a key dependency: robust digital infrastructure. Real-time AI interactions require high-bandwidth connectivity, low latency, cloud-based model hosting, and continuous content moderation, making it a powerful but infrastructure-intensive solution.

[2] CHRONOS

Figure 2. Clio, the virtual AI-based tour guide in CHRONOS app
Example 3: Ancient Kydonia AR Tour – Situated Storytelling through AR and Audio

The Ancient Kydonia AR Tour is a mobile application that offers an immersive augmented reality experience at six archaeological sites in Chania, Crete, combining 3D reconstructions, spatial audio, and gamified learning. The app, developed as part of the broader Ancient Kydonia History Tour project, focuses on enabling visitors to experience the ancient city of Kydonia, which lies beneath the modern urban fabric, through layered storytelling and site-specific digital content.

The application delivers 3D models of key structures from ancient Kydonia directly onto the physical locations where they once stood, using the visitor’s own smartphone camera and AR tracking to place these reconstructions accurately in situ.

Although this project features no AI services, it successfully addresses user engagement and immersion needs with solid design practices. Specifically, audiovisual integration and immersion are achieved with techniques such as real-time adaptive lighting and dynamic soundscapes with background music, ambient sounds, and historically inspired audio that enhance the emotional and spatial depth of the experience. The app further engages users with serious game mechanics that encourage exploration and learning through interaction: visitors are not just passive recipients of audiovisual information, but are invited to actively discover and reflect on the historical context of each site.

In terms of technical delivery, the Kydonia AR Tour is a lightweight application designed for recent Android devices (post-2022), and it requires minimal installation overhead. Additional 3D content is downloaded at runtime from cloud services, allowing for efficient resource usage while maintaining visual fidelity. The experience is designed for outdoor use with on-site GPS activation and requires camera and location access.

This project stands out for its integration of sensory modalities (visual, spatial, and auditory) while maintaining accessibility for the general public. It demonstrates how AR can enhance site-specific storytelling, particularly in urban archaeological environments where excavation is not always visible or accessible. By relying on context-aware delivery and immersive feedback, the app connects users to a “hidden” cultural layer of the city, expanding their understanding of place and history.

[3] Ancient Kydonia AR Tour

Figure 3. Ancient Kydonia AR Tour
Conclusion: Reimagining Cultural Heritage through Emerging Technologies

As the three examples from Naxos, Athens, and Chania demonstrate, Greece is at the forefront of using AR and AI to reimagine the way we experience and interpret cultural heritage. These projects not only make ancient worlds more accessible and engaging to modern audiences but also demonstrate how thoughtful design, grounded in historical accuracy and powered by cutting-edge infrastructure, can bring the past into dialogue with the present.

Each case highlights different technical and narrative strategies:

  • NAXOS AR showcases how downloadable AR content with precise local spatial alignment can offer robust and high-quality reconstructions at historical sites.
  • CHRONOS introduces real-time AI interaction and 5G-enabled dynamic content, creating a hybrid model of digital assistance and historical storytelling.
  • Ancient Kydonia AR Tour elevates spatial immersion through audio-visual layering and game-like engagement, transforming urban archaeology into an interactive discovery.

For the VOXReality project, these implementations offer valuable insights. From lightweight delivery strategies and backend architecture to user-centric accessibility and dynamic adaptation to context, we see that immersive technologies are not one-size-fits-all solutions but rather tailored responses to the unique challenges of each heritage site and experience.

In live theatre, these lessons are particularly critical. Unlike static heritage sites, live performances involve dynamic, real-time interactions where digital content must seamlessly align with the presence and actions of performers in a way that respects the integrity of both the narrative and the audience experience. By combining elements highlighted in the examples of heritage applications, such as edge computing, contextual triggers and multimodal design, the aim is to propose a model of how emerging technology can support not just the preservation of cultural memory, but its creative and collective reactivation.

Looking forward, it becomes clear that the convergence of AR, AI, and storytelling in cultural heritage represents more than a technological trend; it constitutes a paradigm shift. This shift invites both creators and audiences to experience and engage with culture not as a static inheritance, but as a living process continuously shaped and redefined.

Picture of Olga Chatzifoti

Olga Chatzifoti

Extended Reality applications developer working with Gruppo Maggioli for the design and development of the Augmented Reality use case of the VOXReality HORIZON research project. She is also a researcher in the Department of Informatics and Telecommunications of the University of Athens. Under the mentorship of Dr. Maria Roussou, she is studying the cognitive and affective dimensions of voice-based interactions in immersive environments, with a focus on interactive digital narratives. She has an extensive, multidisciplinary educational background, spanning from architecture to informatics, and has performed research work on serious game design and immersive environments in Europe, USA and the UK.

&

Picture of Alexandra Malouta

Alexandra Malouta

XR and User Experience researcher at Gruppo Maggioli, contributing to the MOTIVATE XR project that develops immersive training environments. Alexandra is also a researcher at the University of the Aegean, Department of Cultural Informatics and Communication, exploring the spatial and narrative design of collaborative XR environments for cultural heritage. She has professional experience in the design, project management, and communication of projects at the intersection of architecture, culture, and technology.


Join Us for the VOXReality Test Pilots – Remote & In-Person Opportunities!

As part of the VOXReality project, we are excited to invite you to participate in a series of test pilots. These pilots will showcase the project’s progress and allow participants the opportunity to try out our technology and offer valuable feedback to our research team.

We’re organizing two events:

  • A remote webinar on Thursday, June 19
  • An in-person pilot test in Rotterdam on Wednesday, June 26

Each test session will last approximately 60 minutes, and we will offer four test rounds per day to accommodate different schedules. Spaces are limited, and participants will receive a reward for their time. Don’t miss your chance to get involved!

Interested? Fantastic!

Please complete our registration form here:

👉 https://form.typeform.com/to/yKH4P2fw

Once we receive your registration, we’ll reach out with further details and be happy to answer any questions you may have. We’re looking forward to hearing from you and welcoming you during our test days!


Pilot 0C: Outcomes and Next Steps

The VOXReality project, funded under the Horizon Europe framework, is an initiative aimed at revolutionizing Extended Reality (XR) by integrating advanced language and vision AI models. The Training Assistant use case is one of the project’s three innovative use cases and is aimed at creating immersive and interactive training environments. In Pilot 0C, an internal stress test conducted as part of this project, the Training Assistant use case focused on evaluating user interaction modalities within an Augmented Reality (AR) training application designed for the Microsoft HoloLens 2. This blog post delves into the objectives, execution, and outcomes of the user study in the Training Assistant use case, highlighting its contributions to open-ended learning in XR training, and outlines the next steps to refine and advance the project.

VOXReality seeks to enhance XR experiences by combining innovative AI technologies, including Automatic Speech Recognition (ASR) and a Large Language Model (LLM)-based Augmented Reality Training Assistant (ARTA). The AR Training use case integrates ASR and ARTA into the Hololight Space Assembly software, an experimental platform originally designed for linear industrial assembly training. For this pilot, the software was customized to support open-ended learning environments, allowing users greater flexibility in task execution and aligning with constructivist learning principles that emphasize problem-solving and engagement over rigid, prescribed sequences.

Conducted in Santarcangelo, Italy, at Maggioli’s headquarters, Pilot 0C involved 13 participants, including consortium members and Maggioli employees. The primary goal was to compare two user interfaces within the customized Hololight Space Assembly platform: a voice-controlled “Voxy Mode,” leveraging ARTA and ASR for voice-driven interactions, and a traditional Graphical User Interface (GUI) mode relying on hand menus. The study aimed to assess how these modalities impact key user experience metrics, including cognitive load, usability, engagement, and overall user experience, in the context of industrial assembly training tasks.

Study Design and Execution

Pilot 0C employed a within-subjects study design, where each of the 13 participants experienced both Voxy Mode and GUI Mode in two sessions, with the order randomized to minimize bias. The training scenario involved industrial assembly tasks, where participants interacted with virtual objects in an AR environment using the Microsoft HoloLens 2. In Voxy Mode, users issued voice commands to ARTA, which provided context-aware guidance, while the GUI Mode utilized hand-menu interactions for task assistance.

The study collected data on several metrics:

  • Cognitive Load: Measured using the NASA-TLX framework, assessing mental demand, physical demand, temporal demand, performance, effort, and frustration.
  • Usability: Evaluated through the System Usability Scale (SUS) and the perceived helpfulness of the tutorial.
  • User Experience: Assessed via the User Experience Questionnaire (UEQ), focusing on supportiveness, efficiency, and clarity.
  • Engagement: Gauged using a questionnaire to evaluate immersion and involvement.

Quantitative data, such as task completion times (recorded by the system) and SUS scores, were complemented by qualitative feedback, in which participants provided statements on the usefulness and experience of each interface. These insights were analyzed to compare the performance of the two modalities and identify areas for improvement.
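For reference, the SUS figures mentioned above follow the standard System Usability Scale scoring scheme, sketched below for a single respondent; the responses shown are invented, not pilot data.

```python
def sus_score(responses: list[int]) -> float:
    """Standard SUS scoring: 10 items rated 1-5, alternating positive/negative wording.

    Odd-numbered items contribute (response - 1), even-numbered items (5 - response);
    the sum is scaled by 2.5 to yield a 0-100 score.
    """
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)  # i is 0-based, so even i = odd-numbered item
        for i, r in enumerate(responses)
    ]
    return sum(contributions) * 2.5

# Invented example respondent (not pilot data):
print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 3]))  # -> 77.5
```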

Key Findings and Outcomes

The results of Pilot 0C revealed both strengths and challenges in the tested interfaces. Quantitatively, the GUI Mode was significantly faster for task completion, primarily due to a shorter tutorial phase. However, other metrics—cognitive load, usability, and engagement—showed no statistically significant differences between Voxy Mode and GUI Mode. This was largely attributed to a strong learning effect inherent in the within-subjects design: participants mastered the assembly tasks in their first session, rendering the second session less informative as they were already familiar with the tasks. This methodological challenge underscored the limitations of the within-subjects approach for this study.

Qualitatively, participants expressed a clear preference for Voxy Mode, highlighting its engaging and supportive nature. Users appreciated the interactivity and novelty of voice-driven interactions with ARTA, which enhanced their sense of presence and involvement. However, they also noted limitations, including inaccuracies in ASR and ARTA’s struggles with contextual understanding in the open-ended setting, which occasionally disrupted the user experience. The GUI Mode, while efficient and functional, was perceived as less engaging and immersive. Participants also provided feedback on practical issues, such as confusing color codes, performance bottlenecks, and minor bugs in the AI models, offering valuable insights for future refinements.

These findings highlight the potential of multimodal, voice-driven interfaces in XR training. The preference for Voxy Mode suggests that voice-based interactions, when supported by robust AI, can significantly enhance engagement and perceived support in open-ended learning environments. However, the study also emphasized the need for technical improvements in ASR accuracy, particularly across different accents, and in ARTA’s ability to interpret user intent and context, to ensure practical efficacy in dynamic training scenarios.

Lessons Learned and Implications

Pilot 0C provided critical insights into the role of multimodal interfaces in XR training. The user preference for Voxy Mode indicates that voice-driven interactions can foster greater engagement and support in open-ended training modalities, where users benefit from the flexibility to interact with objects not directly tied to prescribed tasks and to complete tasks in varied orders. This aligns with the project’s goal of promoting deeper understanding through problem-solving, contrasting with linear training systems that rely on rote memorization.

The learning effect observed in the within-subjects design was a significant methodological takeaway, leading to the decision to adopt a between-subjects design for the final Pilot. In this approach, participants will be divided into separate groups for each interface, eliminating the influence of prior task familiarity and enabling a clearer comparison of the modalities’ effectiveness.

Next Steps

Building on the outcomes of Pilot 0C, the team will focus on addressing the identified issues to prepare for the final Pilot. Key priorities include enhancing the accuracy of the ASR system and improving ARTA’s contextual awareness to ensure seamless and effective voice interactions. Bug fixes, performance optimizations, and clearer color coding will also be implemented to enhance the overall user experience. The shift to a between-subjects study design for Pilot 2 will provide a more robust evaluation of Voxy Mode and GUI Mode, offering clearer insights into their impact on user experience and training outcomes. These improvements aim to strengthen the role of multimodal, AI-driven interfaces in advancing open-ended XR training applications, bringing VOXReality closer to its goal of delivering innovative, immersive training solutions.

Picture of Leesa Joyce

Leesa Joyce

Head of Research @ Hololight


AR Training: Shaping the Future of Learning

In a world where technology is constantly redefining how we work and learn, Augmented Reality (AR) is emerging as one of the most powerful tools for job training [1]. Far from being a sci-fi concept, AR is now playing a transformative role in both industrial and academic settings, making learning more immersive, safer, and more effective than ever before.

This is the era of AR Training, which blends the physical and digital worlds to support the onboarding and upskilling of students and professionals.

What Exactly Is AR Training?

AR Training leverages devices like headsets, tablets, or smartphones to overlay digital content onto the real world. This allows learners to interact with 3D models, guided instructions, and dynamic simulations while staying fully immersed in their physical environment.

According to Lee K. (2012), AR dramatically reshapes two core aspects of learning: the where and the when [2]. By delivering just-in-time visual and interactive experiences, it allows users to access the right information exactly when they need it.

Why It Works: Motivation, Safety, and Results

AR doesn’t just look futuristic; it gets results. Studies show that AR improves learner motivation, supports experiential learning, and significantly reduces error rates. For example, Bologna J.K. et al. (2020) developed an AR platform to teach operators how to calibrate HART (Highway Addressable Remote Transducer) instruments like pressure and temperature transmitters. Compared to traditional methods, 82% of users trained with AR improved both their understanding and operational safety [3].

These outcomes are particularly crucial in fields like energy, manufacturing, or medicine, where even small mistakes can lead to serious consequences. AR allows employees to gain confidence by practicing procedures in a virtual space that mimics real-life conditions, but without real-life dangers.

Costs vs. Value: Is It Worth It?

One of the main concerns around AR training is cost. Developing and deploying AR solutions can be expensive. But when you compare that investment to the high costs of traditional training, especially in hazardous sectors like firefighting, drilling, or aerospace, AR quickly begins to justify itself.

In fact, it often reduces downtime, minimizes travel and material costs, and decreases the number of repeat training sessions. For companies looking to improve the skills of their workforce while optimizing their training budgets, AR can be a strategic win.

Real-World Impact: How Industries Are Using AR

Across industries, AR is already transforming the way people learn and work. In the architecture, engineering, and construction (AEC) sectors, AR simulations are being used to train new technicians, helping them understand complex systems through immersive walkthroughs and interactive tools [4].

It’s not just about building structures; it’s also about building understanding. From technical education and maintenance guidance to safety simulations and design visualization, AR is making it easier to grasp complicated concepts, operate machinery, and avoid risks in the field. A systematic review by Tan Y. et al. (2022) documented more than 80 studies that explore the practical benefits of AR/VR technologies in both educational and professional contexts, reinforcing just how widespread and impactful these tools have become [5].

Final Thoughts

AR Training is no longer just a glimpse into the future; it’s a practical, scalable solution redefining how we teach, train, and upskill. From improving safety and engagement to delivering personalized learning experiences, AR is becoming a cornerstone of modern training strategies. While challenges remain, particularly around cost and integration, the benefits outweigh the barriers. For organizations ready to innovate, AR is more than a tool; it’s a competitive advantage.

References


[1] Statista. “Augmented Reality (AR) – Worldwide,” https://www.statista.com/outlook/amo/ar-vr/worldwide

[2] Lee, Kangdon. “Augmented reality in education and training.” TechTrends 56 (2012): 13-21.

[3] Bologna, Jennifer K., et al. “An augmented reality platform for training in the industrial context.” IFAC-PapersOnLine 53.3 (2020): 197-202.

[4] GlobeNewswire. “Immersive Entertainment Strategic Market Report 2025–2030,” https://www.globenewswire.com/news-release/2025/04/07/3056595/28124/en/

[5] Tan, Yi, et al. “Augmented and virtual reality (AR/VR) for education and training in the AEC industry: A systematic review of research and applications.” Buildings 12.10 (2022): 1529.

Picture of Greta Ioli

Greta Ioli

Greta Ioli is an EU Project Manager in the R&D department of Maggioli Group, one of Italy's leading companies providing software and digital services for Public Administrations. After earning a degree in International Relations – European Affairs from the University of Bologna, she specialized in European projects. Greta is mainly involved in drafting project proposals and managing dissemination, communication, and exploitation activities.
