
Partner Journey #3 with F6S

In this interview, Mateusz Kowacki, EU Project Manager at F6S, reflects on the pivotal role the team played in managing and delivering the VOXReality Open Call for Third Parties. From navigating tight timelines and unexpected challenges to supporting five innovative third-party projects through a full year of development, F6S brought resilience, collaboration, and a commitment to impact. Mateusz shares key lessons learned, the lasting influence of VOXReality on F6S’s future programs, and why he believes that the success of third parties is, ultimately, a shared success.

Looking back over the course of the VOXReality project, what do you consider F6S’s most significant achievement or contribution to its success?

F6S did a great job with the preparation and implementation of the Open Call opportunities for Third Parties. I really think we should be particularly proud of the great results achieved by the TPs, as we have been part of that process too. I must say it wasn’t always easy. We faced several challenges and issues, but my feeling was that, no matter what, we would find a way out and, in the end, showcase great results. If you asked the TPs about their experience with us, I believe the feedback would be honest and positive. And, to put it a bit more poetically, the TPs’ success is our success, as we have been there for them.

What were some of the key challenges F6S faced during the open call process, and how did your team overcome them?

At the very beginning of implementation, we had some issues with the timeline. However, thanks to the motivation of all project partners and TPs, we managed to erase that initial delay. We also knew that some things during the first review didn’t go as planned. The solution was a co-creation session on what we could do better and how. I must say it was teamwork all the way, without pointing fingers or assigning blame. I was, and still am, very proud of this accomplishment.

In what ways has participating in VOXReality shaped or influenced F6S’s approach to future innovation programs or open call management?

I think the lesson is that you should be ready for everything. While preparing a program, we need to allocate time for risk management and potential mitigation; that will certainly make our work easier in the future. Half-jokingly, we should always ask ourselves what the worst-case scenario could be and shape our mitigation plan around it. Obviously, it’s something we have already introduced, but sometimes everyday situations turn out to be even more surprising than those risk-log spreadsheets.

What impact do you think VOXReality has had on the third-party projects you supported, and how do you envision their growth beyond the project’s conclusion?

I know, because I spoke with all of them, that the impact has been significant. They really appreciate the opportunity created by the VOXReality project and the learning curve over the 12 months of implementation. Considering that these five TPs were selected from among dozens of applications, their initial ideas were clearly strong. After this implementation they are at a different level, facing different challenges. I’m sure this is not the last we will hear of them, and that they will keep fine-tuning and developing their services and products.

As the VOXReality project wraps up, what are your main takeaways, and what advice would you give to other organizations participating in similar EU-funded initiatives?

That only by working with an ‘as a team’ mindset can we achieve great things and see great results. It’s not true that brilliant ideas are created by brilliant individuals working alone. Great things are always the result of an exchange of thoughts, opinions and perspectives, which we can only get by working as a team.


Mateusz Kowacki

EU Project Manager @ F6S


Partner Journey #2 with Hololight

In this interview, Leesa Joyce, Head of Research at Hololight, reflects on the evolution and impact of the Virtual Training Assistant use case developed as part of the VOXReality project. Drawing from HOLO’s initial efforts in XR-enabled training, she discusses how the team transitioned from simple task-oriented guidance to a more immersive, adaptive, and hands-free learning experience tailored to real industrial environments. Through key innovations such as voice-controlled AI assistance, free-mode assembly training, and high-fidelity workspace replication, HOLO has redefined the potential of XR technologies in industrial skills development and workforce training.

Reflecting on your leadership of the Virtual Training Assistant use case, what were the most significant innovations HOLO brought to the XR training use case?

HOLO initially developed a preliminary version of an intuitive XR training application for virtual assembly tasks. This early design relied heavily on assistive cues—such as color coding, guiding lines, automatic snapping, and audio prompts—which enabled users to complete assembly procedures quickly and with minimal effort.

While this approach demonstrated the potential of XR to simplify task execution, it also revealed critical limitations for industrial training contexts. Specifically, such systems ignore the knowledge gap: assembly line workers, construction personnel, and shopfloor employees might not have previous experience with XR technology and virtual interaction. Additionally, the user may complete tasks in the virtual environment without genuinely learning or internalizing the underlying assembly logic or machinery operation. In other words, while task completion is facilitated, procedural understanding and skill transfer to real-world contexts remain limited.

Furthermore, the reliance on hand-based interactions and hand menus presents practical drawbacks in industrial scenarios. Workers’ hands are often already occupied with physical tasks, making simultaneous interaction with virtual menus cumbersome and disruptive.

To address these challenges, HOLO’s contribution within the VOXReality project, specifically in the Virtual Training Assistant use case, introduced several significant innovations:

Free-Mode Assembly Training:
Unlike the earlier stepwise, rigid training approach, HOLO implemented a free-mode assembly system. This allowed trainees to explore and attempt logical assembly sequences independently, fostering deeper understanding of the assembly process rather than mere task completion.

Voice Interaction through ASR Integration:
HOLO integrated Automatic Speech Recognition (ASR) technology developed by UM, enabling natural speech input from users. Spoken commands were transcribed into text and processed by the XR system and the AI assistant.

AI Assistant Support:
A conversational AI agent was embedded into the training workflow, providing on-demand, context-aware support based on user requests. Importantly, the agent offered assistance without unnecessary interruptions, ensuring that users retained control of the training process.

Hands-Free Interaction:
By shifting the primary mode of interaction from hand-based menus to voice-based dialogue, the use case freed users’ hands for assembly tasks. This design improvement addressed one of the key practical barriers to XR adoption in industrial training scenarios.

Together, these innovations transformed HOLO’s training application from a task-completion aid into a more effective learning tool, capable of supporting skill acquisition, promoting logical reasoning, and better aligning XR training with real-world industrial needs.
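
To make the voice-driven loop above more concrete, here is a minimal, purely illustrative sketch (not HOLO’s or UM’s actual code): a spoken request is transcribed by an off-the-shelf ASR model and handed to a placeholder assistant function. The Whisper checkpoint, the audio file name, and answer_request() are assumptions made for the example.

```python
# Illustrative only: speech -> ASR transcript -> context-aware assistant reply.
# The model checkpoint, the audio file, and answer_request() are placeholders,
# not the UM ASR service or HOLO's AI agent.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

def answer_request(transcript: str, assembly_state: dict) -> str:
    """Placeholder for the conversational AI agent's context-aware response."""
    return f"At step {assembly_state['step']}: guidance for '{transcript.strip()}'"

assembly_state = {"step": 3, "part": "housing cover"}
transcript = asr("trainee_request.wav")["text"]   # e.g. "Which part goes on next?"
if transcript.strip():                            # the assistant only reacts when asked
    print(answer_request(transcript, assembly_state))
```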

How did you achieve the digital replication of real workspaces in your AR solution, and in what ways has this contributed to improving trainee engagement and skill retention?

We achieved the digital replication of real workspaces by leveraging HOLO’s XR streaming technology, which offloads complex rendering tasks to high-performance servers. This approach enables the visualization of highly detailed, photorealistic holograms on XR headsets without compromising performance. Furthermore, the user interface was designed with a strong emphasis on intuitive and minimalistic visual elements, ensuring that trainees remain focused on the task at hand rather than being distracted by extraneous details. The combination of high-fidelity digital environments and carefully structured interaction design enhances immersion, which in turn supports higher levels of trainee engagement and contributes to improved skill retention.

What design strategies did you implement to tailor the training experience for users with varying levels of expertise and learning speeds, and how did you develop these strategies?

The core XR training environment was designed to remain consistent across all participants, ensuring a standardized baseline of learning. However, adaptability was introduced through the integration of the AI agent, which personalizes the assistance provided. Rather than altering the training content itself, the system dynamically adjusts the level and depth of support based on user requests and demonstrated needs. For experienced users, this typically translates into fewer interactions, with a focus on advanced or technical queries. In contrast, novice users tend to ask more basic questions and are offered richer contextual explanations. This adaptive approach allows the training to accommodate varying levels of expertise and learning speeds, in contrast to traditional group-based training methods where individual differences often receive limited attention.

What future technological advancements or integrations, such as AI voice assistance or physical tool tracking, are you most excited about for enhancing the training platform?

A significant future advancement would be enhancing the AI agent with scene learning capabilities, enabling it to interpret user intentions within the training environment. Insights from our research on the Training Assistant use case suggest that such context-aware understanding would substantially improve the system’s ability to provide adaptive guidance. Integrating this capability would represent a major step forward in extending the effectiveness and intelligence of the training platform.

Leesa Joyce

Head of Research, Hololight


Smarter Translations in VR: How Middleware Optimizes Real-Time Communication

One of the key challenges in enabling real-time multilingual communication in VR is efficiency. When many users are speaking, listening, and requesting translations at the same time, the system must manage multiple audio and text streams without overloading servers or wasting GPU resources.

In VOXReality, we solved this challenge by introducing an intermediate socket layer (middleware) between the speaker and the translation model. This middleware acts as a smart traffic controller for translations, ensuring that system resources are only used when necessary.

Here’s how it works:

  • When a user activates their microphone, a dedicated speaker socket opens (/speaker/{user-id}), making their audio stream available.
  • Listeners who want a translation connect through a listener socket (/listener/{user-id}/{language}), requesting subtitles in their preferred language.
  • Only when both a speaker and at least one listener are connected does the middleware activate a third connection to the transcription–translation model.

This design ensures that translations are processed once per speaker and once per language, no matter how many people are listening. That means hundreds of users can receive the same subtitles in real time, without duplicating the workload.

The middleware also dynamically closes connections when they are no longer needed. If a speaker turns off their microphone, their socket closes automatically. If all listeners for a language drop out, that translation stream is shut down too. This adaptive behavior keeps the system light and efficient, saving GPU cycles for when they’re really needed.
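
To illustrate the routing logic described above, here is a minimal sketch assuming a FastAPI WebSocket server; the endpoint paths follow the /speaker/{user-id} and /listener/{user-id}/{language} pattern from the post, while the data structures and the transcribe_and_translate() stub are illustrative, not the actual VOXReality middleware.

```python
from collections import defaultdict
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

# (speaker_id, language) -> sockets of listeners currently requesting that translation
listeners: dict[tuple[str, str], set[WebSocket]] = defaultdict(set)

async def transcribe_and_translate(audio_chunk: bytes, language: str) -> str:
    """Stand-in for the transcription-translation model call."""
    return f"[{language}] subtitle text"

@app.websocket("/speaker/{user_id}")
async def speaker_socket(ws: WebSocket, user_id: str):
    """Opens when a user activates their microphone; closes when they mute."""
    await ws.accept()
    try:
        while True:
            chunk = await ws.receive_bytes()                    # incoming audio stream
            for (speaker_id, lang), subs in list(listeners.items()):
                if speaker_id != user_id or not subs:
                    continue                                    # no listeners: model stays idle
                subtitle = await transcribe_and_translate(chunk, lang)  # once per language
                for listener_ws in list(subs):
                    await listener_ws.send_text(subtitle)       # fan out to every listener
    except WebSocketDisconnect:
        pass                                                    # mic off: speaker socket closes

@app.websocket("/listener/{user_id}/{language}")
async def listener_socket(ws: WebSocket, user_id: str, language: str):
    """Listeners request subtitles for a given speaker in their preferred language."""
    await ws.accept()
    key = (user_id, language)
    listeners[key].add(ws)
    try:
        while True:
            await ws.receive_text()                             # keep the connection alive
    except WebSocketDisconnect:
        listeners[key].discard(ws)
        if not listeners[key]:
            del listeners[key]                                  # last listener gone: stream shut down
```

In this sketch the model stub runs once per language that has at least one listener and goes idle as soon as the last listener for that language disconnects, mirroring the resource-saving behaviour described above.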

Beyond efficiency, the middleware architecture allows the system to scale naturally. Multiple speakers can be active at the same time, each with their own user ID, translation streams, and listeners, without interfering with one another.

For presentation settings such as the Conference Room, the same architecture has been adapted for one-to-many translation. In this mode, only one person—the presenter—has the microphone at any given time. When a user steps onto the virtual stage, the system automatically designates them as the active presenter and opens the standardized endpoint /speaker/presentation. Audience members simply click the translate button on their panel to join /listener/presentation/{language} in their preferred subtitle language. The middleware then activates the transcription–translation model only when both the presenter and at least one listener are connected. Subtitles appear in real time above the stage for the audience, while the presenter sees them in front of their camera view, like movie captions.
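
For the audience side of this one-to-many mode, a hypothetical client could look like the sketch below; the server URL is a placeholder and the third-party websockets package is an assumed client library.

```python
# Hypothetical audience-side client for the Conference Room mode; the host name
# is a placeholder and 'websockets' is an assumed client library.
import asyncio
import websockets

async def follow_presentation(language: str) -> None:
    uri = f"wss://vox.example.org/listener/presentation/{language}"
    async with websockets.connect(uri) as ws:
        async for subtitle in ws:           # subtitles arrive as text messages
            print(subtitle)                 # in VR these are rendered above the stage

asyncio.run(follow_presentation("de"))      # request German subtitles
```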

This dual architecture—many-to-many for social and business spaces, one-to-many for conference sessions—ensures seamless multilingual communication across all scenarios. The Figure shows the overall architecture, illustrating how the middleware intelligently coordinates speaker, listener, and translation model connections to deliver efficient and scalable translations in VR.

In conclusion, VOXReality’s middleware approach creates a translation system that is efficient, adaptive, and inclusive—capable of supporting everything from casual conversations to formal presentations, ensuring that every participant can engage fully in their own language.


Georgios Nikolakis

Software Engineer @Synelixis


Partner Journey #1 with Visual Computing Lab (VCL)@CERTH/ITI

In this Partner Journey spotlight, we speak with Petros Drakoulis from the Visual Computing Lab (VCL) at CERTH/ITI, one of the key research partners in VOXReality. With strong expertise in computer vision and AI, the VCL team has helped develop models that understand space and language together, making interactions in XR more natural and intuitive. Petros shares how the team approached model optimization, spatial reasoning, and long-term flexibility to support real-world XR applications.

CERTH has been instrumental in exploring the intersection of CV and NLP within VOXReality. How has your team’s expertise shaped the project’s approach to integrating these technologies?

Hello Ana Rita! As we say, CERTH-ITI, and especially the Visual Computing Lab, has a well-known pedigree in Computer Vision and graphics-related AI. The difference with VOXReality was that we had to synergize with other partners who are also experts in their fields, such as the University of Maastricht, at a much deeper technical level than we usually do, practically working on the same models and agents, to attack the multi-modal challenge we were tasked with. It was a great journey for both sides, I believe.

Your models include innovative spatial reasoning capabilities. Can you explain how this “spatial thinking” differentiates your approach and what benefits it brings to XR applications?

Good question. Let’s see… eXtended Reality is all about interacting in space; Virtual or Augmented, it doesn’t essentially matter. Humans, living in 3D, naturally think spatially. On the other hand, AI neural networks, which reside in memory and are built on mathematical abstractions, were until recently modeled “flat”, in the sense that any “spatially correct” outcomes emerged only through apparent connections with 2D structures that acted as “vessels” for underlying, but mostly obscure, 3D information. In VOXReality we explored modalities and architectural arrangements that deeply reflect the third dimension. If a model thinks in 3D, it is less likely to make mistakes in interactions that require a true understanding of the world around it; and that’s what we tried to bring to the project.

What practical challenges did you encounter when training and deploying large-scale vision-language models, and how did the project address concerns related to efficiency and sustainability?

Training Large Language Models can be truly overwhelming… especially three years ago, when the project started. In their greatness, transformers (the state-of-the-art underlying building blocks) really take up space! All the marvelous models we mostly rely on now were trained on huge amounts of data, processed for an equivalently large amount of time, to reach the level of performance we are now accustomed to. For us that was a problem because, first, we wanted to develop for edge devices and, second, we do not have the capacity to really experiment on big hardware. So, from the proposal-writing stage we had already identified this upcoming issue and turned it into a project asset by creating a task specifically dedicated to developing model optimization methods. I think we succeeded, in the sense that we developed a novel method for generic post-training model compression that works and is currently under peer review for publication.

Looking at user interaction in XR, how do you see your models enabling more natural and intuitive communication, and what might this mean for the future of human-computer interaction?

Ah, for sure, the future of human-computer interaction is voice-driven, with underlying models having access to multiple modalities for drawing context, including 3D, while being efficient, fast, secure, explainable and modular enough for edge deployment in potentially time-critical applications, like assisted driving. Our multi-modal, spatially aware models, enhanced by our optimization methods, could be considered a solid building block for future research endeavors.

With the rapid evolution of XR hardware and platforms, how is CERTH ensuring that the solutions developed remain flexible and scalable across different devices and ecosystems?

From the developer’s perspective, which is the one we are best placed to speak from, our effort is to build on tools, frameworks and data formats that are wisely chosen to stand the test of time, perhaps even an industry “revolution”. Of course, no one has a fortune-teller’s magic 8-ball, but we are quite confident that the use of highly popular solutions and layers of abstraction, like Docker, Hugging Face, Python and PyTorch, safeguards and practically guarantees the scalability, adaptability and potential future extension, if desired, of the content we have created.


Petros Drakoulis

Research Associate, Project Manager & Software Developer at Visual Computing Lab (VCL)@CERTH/ITI


Hippolytus (in the Arms of Aphrodite) | A conversation with the director, Yolanda Markopoulou

Hippolytus (in the Arms of Aphrodite) is an AR live performance developed for the Augmented Theatre use case of VOXReality, directed by Yolanda Markopoulou. The work experiments with AR headsets to merge stage action, visual effects, and soundscapes into a single experience where the storytelling unfolds both physically and digitally. It was produced as a 15-minute performance, engaging the artistic, dramaturgical, and technical functions of theatre rather than being treated as a technology demo.

Director Yolanda Markopoulou is an Athens-based director, producer, and creator of works that combine theatre, film, public space, and immersive media. Her projects often test how audiences move between real and imagined environments, challenging the conventional separation between performer and spectator.

Her exploration of XR began with White Dwarf (2022), a 360° VR project that placed viewers inside the Manhattan Project laboratory and showed how virtual environments can alter theatrical language. In Hippolytus (in the Arms of Aphrodite) she continued this exploration through AR, working with live and digital layers to create a unique theatre experience.

Exploring XR in Theatre – What inspires creative exploration into XR technologies in theatre?

Markopoulou explains that almost every XR project in theatre begins from zero. There are few existing models to draw on, which for her is an opportunity: “This is very positive, because you can start fresh and your imagination takes the lead.”

In traditional theatre the audience sits still, watching from one perspective. XR alters that condition: spectators remain in the same room but gain mobility, and digital environments are created around them. “They can stay in the same location, but they feel they are transported somewhere else because of the environment change.”

She describes this as working with new means of immersion: “You put the audience into different locations, into different lighting situations, sound situations, and your story develops through this kind of immersion.”

In what ways does AR differ from earlier experiments with VR, both on a creative and a technical level?

Markopoulou’s first steps into XR media were through VR using headsets, an enclosed format that placed audiences inside a virtual world. “With VR you actually close all your vision through the headset. You cannot really see the physical world,” she notes. The experience taught her the basics of presence and movement inside a virtual space, but it also highlighted how completely the headset separated the spectator from the stage.

AR altered that relationship. The Magic Leap headset is lighter and more open, keeping the physical world in view while adding digital elements on top. “You could see the physical world just with a filter, so you still felt connected to what was happening around you. It’s like you have a small screen in front of your eyes that you move around with, uncovering the world in fragments.”

AR is also unique in the way the actors are able to work, since they perform without being able to see the overlays themselves. “Everything had to be precisely coordinated. We relied on cues in light and sound, and with the VOXReality technologies the actors could also trigger changes in the environment around them to drive the story forward.”

What new possibilities do XR technologies open up for theatre as a medium?

For Markopoulou, XR opens new ways of working with immersion. “In XR performances the audience doesn’t just watch a story, they enter it, moving through changing environments of imagery, light and sound where the narrative unfolds.” Digital layers allow these shifts to happen inside a single performance space, giving theatre an expanded dramaturgical vocabulary.

Developing Hippolytus (in the Arms of Aphrodite) showed how AR pushes theatre into new territory, where even constraints become part of the vocabulary; a clear example of the medium as a space of exploration. “The restricted field of view encouraged audiences to move their head and body to uncover digital elements. We decided to use this feature to our advantage, turning perception itself into an active, personal process,” she notes. Sound was treated the same way: using headset audio would have been disruptive for other viewers and the actors, while headphones would restrict movement. Thus the team designed a surround system that would be free of these restrictions. “We divided the soundscape into different moments, directions, and volumes to make the experience as immersive as possible and leave the audience free to move in the space.”

Markopoulou also emphasizes accessibility as a vital aspect of XR’s potential. “I think this is a huge advantage, you don’t exclude people that don’t understand the performance’s language. With XR technology and services like those developed in VOXReality, audiences can experience theatre in multiple languages, which can make the performance much more inclusive.”

At the same time, the work on Hippolytus underlined the gap between the theatre industry and XR hardware manufacturers. “High costs and limited computing power still set boundaries, you cannot restart the performance, as you would do in a film. For XR to move further into theatre practices and unleash its full potential in the medium, technology will need to develop in closer dialogue with artists”.

What was the conceptual starting point for Hippolytus (in the Arms of Aphrodite)?

In approaching Euripides’ Hippolytus for AR, Markopoulou wanted to craft a performance that was short and experimental, while still rooted in myth. “It was a complex process where many people from different backgrounds worked together to find creative solutions for technical and artistic considerations. The headset itself set the basis for our work. Realistically we had approximately 15–20 minutes of action,” Markopoulou explains, “of course you cannot condense a full play in that time but we wanted the audience to have a full experience, not just watch an excerpt. This is how the idea of Hippolytus (in the arms of Aphrodite) was born.”

In the play, two actors, Hippolytus and Aphrodite, perform for two spectators, who witness the story at close range. “You feel very intimately involved once you enter the space and suddenly you realize that you’re in an audience of two and the action takes place right in front of you, without the usual distance between the spectator and the stage”. At times the closeness felt almost overwhelming: “People are not used to having actors playing a meter away or touching them or being close to them.”

The dramaturgy was built around this intimacy. “We started by building the action with the actors before we completed the virtual world. That way, we could see what was needed”, she recalls “Digital elements were added in rehearsal to extend or redirect what happened on stage. The overlays could conceal gestures, highlight others, or shift the audience’s attention, creating a constant dialogue between the physical and the digital, sometimes giving more space to the virtual, other times to the live action.”

In the same fashion, the development of the performance was a dialogue between artistic vision and the possibilities and limits of the technology. “We had to take many factors into account: headset performance, overheating, lighting, sound, subtitle placement, and pacing. Our goal was to maximize what was possible and give the audience as much freedom as we could, surrounding them with digital scenography that encouraged them to explore, look up, down, and around, and experience the potential of AR theatre to the fullest. It was a challenge, but we had a strong team and crucial support from the VOXReality project, Maggioli Group, our technology partner, as well as from AEF.”


Elena Oikonomou

Project manager for the Athens Epidaurus Festival, representing the organization in the HORIZON Europe VOXReality research project to advance innovative, accessible applications for cultural engagement. With a decade of experience in European initiatives, she specializes in circular economy, accessibility, innovation and skill development. She contributes a background that integrates insights from social sciences and environmental research, supporting AEF’s commitment to outreach and inclusivity. AEF, a prominent cultural institution in Greece, has hosted the renowned annual Athens Epidaurus Festival for over 70 years.


Master’s Thesis titled “Exploring User Interaction Modalities for Open-Ended Learning in XR Training Scenarios” in Collaboration with VOXReality Defended this July in Munich

Master’s Student: Gabriele Princiotta
Thesis Advisors (TUM): Dr. Sandro Weber, Prof. Dr. David Plecher
Thesis Advisor (VOXReality): Leesa Joyce

A Master’s thesis titled “Exploring User Interaction Modalities for Open-Ended Learning in XR Training Scenarios” by Gabriele Princiotta was recently defended at the Technische Universität München (TUM). The thesis was co-advised by Dr. Sandro Weber and Prof. Dr. David Plecher from TUM, and Leesa Joyce from the VOXReality consortium.

This study explores how different interaction modalities affect user experience in open-ended training environments using XR. Specifically, the research focused on an AR assembly training application developed for the Microsoft HoloLens 2. Two interaction methods were designed and compared: a traditional hand-based Graphical User Interface (GUI Mode) and an AI-powered voice interaction mode (Voxy Mode), supported by an LLM and Automatic Speech Recognition.

The user study employed a within-subjects design to evaluate the impact of these modalities on user experience, cognitive load, usability, and task engagement. While quantitative findings showed significantly faster task completion times in GUI Mode—primarily due to shorter onboarding and user familiarity—no statistically significant differences emerged across other user experience metrics. This outcome was influenced by a strong learning effect throughout the study sessions.

However, qualitative feedback indicated a clear user preference for the Voxy Mode. Participants highlighted the engaging, supportive nature of interacting with the conversational AI assistant (named ARTA), noting how it made the training feel more natural and less mechanical. At the same time, the limitations of current ASR accuracy and the assistant’s understanding of nuanced or ambiguous user input were seen as key areas for future development.


The VOXReality partners played an essential role in enabling this research. They provided the AI voice assistant model, customized it for integration into the HoloLens application, and supported the technical setup needed for the experiment. A general assembly test provided by the consortium was used as the basis for the training scenario in the user study.

The results highlight the potential of multimodal, voice-driven interfaces in XR training environments to improve engagement and perceived support—particularly in open-ended learning tasks. At the same time, the thesis underscores the practical limitations tied to current speech recognition capabilities and the need for more sophisticated user intent recognition and contextual awareness from AI assistants in XR.

Finally, the study also draws attention to the methodological challenges posed by learning effects in within-subject comparative studies of interface designs. As XR training applications become increasingly personalized and adaptive, future research should focus on enhancing the intelligence and robustness of voice interfaces and minimizing study bias to ensure reliable UX evaluation.


Gabriele Princiotta

Unity XR Developer @ Hololight


AIXTRA: VR Training Takes a Step Forward

AIXTRA, one of the winners of the VOXReality Open Call, has successfully completed a major testing phase. From July 21st to 25th in Leipzig, 34 participants took part in both single-user and multi-user VR training scenarios. The goal was to assess the user experience and technical performance of AI-powered VR training environments. Special attention was given to the AI trainer and voice processing features, with the team gathering end-user perceptions, subjective feedback, and objective data to ensure statistically meaningful insights.

A Diverse Group of Participants

The study’s participants, ranging from 18 to 44 years old, came from various backgrounds with differing levels of familiarity with AI and XR technologies. While many had professional experience with technology, most were “not much experienced in XR.” Concerns about AI were common, with participants frequently citing “data privacy and security, inaccuracy of AI-generated answers, dependence on AI reducing human skills, ethical considerations, bias in AI decisions, and environmental impact.” Reflecting this apprehension, a significant portion of participants (41.67%) had only “neutral trust” in AI systems.

Participants’ native languages included German, English, Ukrainian-Russian, Dutch, and Vietnamese. While English proficiency ranged from beginner to fluent, many users acknowledged their accent influenced how well they were understood. “Some noted a mixed or regional accent… responses ranged from ‘never’ to ‘sometimes’ when asked if they had difficulties being understood.”

Feedback from the testing sessions offered invaluable insights into the AIXTRA system’s real-world performance. In the single-user mode, participants found the AI assistant helpful, with one stating, “it felt intuitive and safe to work with the AI assistant” and that it “helped me manage tasks better.” However, technical challenges also emerged. A common issue was the need for users to “repeat or rephrase their inputs” to be understood. One participant experienced this firsthand, recounting, “I had difficulties pronouncing the English word ‘Barometer’ correctly, and the AI only recognised it after the 4th or 5th attempt.”

In the multi-user environment, users noted translation delays and occasional inaccuracies. One participant pointed out an issue with audio quality, stating, “Poor sound quality of the assistant was difficult”, highlighting how hardware can affect clarity and immersion. Another noted a lack of comprehensive audio cues, commenting, “There wasn’t audio feedback at every intermediate step”, which could impact the guided learning process.

Standout Feedback and Future Plans

The sessions yielded several memorable quotes that captured the dual nature of AI’s impact. One user in the single-user scenario noted the “Advantage: no fear of asking the soulless AI for advice and help without blaming myself”, but immediately followed with a perceived “Disadvantage: less independent thinking and reflection”, revealing a broader concern about “Over-Reliance and Complacency”. In the multi-user setting, a participant, despite pointing out a critical error, enthusiastically concluded, “Otherwise it was pretty cool”.

Following this successful testing phase, the AIXTRA project is now moving into its final stages. The team will use a structured user study, approved by an ethics committee, to combine user and developer feedback for deeper insights. The project will also make its demo application available as Open Access and has two more publications planned to increase public visibility. A scientific paper is also in the works to “evaluate the results and show future trends in the field of AI and XR environment”.


Weld-E @ EEITE 2025 and AIOTI

WELD-E, one of the winners of the VOXReality Open Call, is a project that’s pioneering the future of human-robot collaboration in the welding industry. By integrating artificial intelligence (AI) with extended reality (XR), WELD-E has created a safer, more efficient, and more intuitive welding environment. The project’s team recently presented two key papers that detail the system’s advancements.

Published Papers

The WELD-E team has published two papers outlining their work. Each paper details a different aspect of the project’s technology and its implications for the future of manufacturing.

Paper 1: “WELD-E: Enhanced-XR Human-Robot Interactive Collaboration for Welding Operations”

  • Lead Authors: Andreas El Saer, George Tsakiris, Leonidas Valavanis, Aristea M. Zafeiropoulou, Konstantinos Loupos, George Argyropoulos, Petros Tsampiras
  • Publication: EEITE 2025, 6th International Conference In Electronic Engineering & Information Technology
  • Abstract: This paper introduces WEld-e, an end-to-end solution for human-robot collaboration in welding. It leverages AI and XR technologies, including Microsoft HoloLens, to address challenges like a lack of effective guidance and real-time monitoring. The system uses a multimodal interface with voice commands, spatial awareness, and automated decision-making. At its core, it employs deep learning models—such as Automatic Speech Recognition (ASR), Neural Machine Translation (NMT), a domain-specific Welding Large Language Model (WeLLM), and a Visual Language Model—to execute welding commands with high precision and safety. The framework supports multilingual, context-aware interactions and improves operational efficiency and safety in hazardous industrial settings. The paper validates that WEld-e can significantly reduce setup times, training requirements, and error rates, aligning with Industry 5.0 objectives.
  • Key Breakthrough: By combining AI and Extended Reality, welders can interact naturally with robots using voice and gestures. This makes welding operations faster and more intuitive by reducing errors and improving safety, all without the need for complex programming.
Posters for the paper presented in EEITE 2025

Paper 2: “Edge AI IoT Immersive Applications”

  • Lead Authors: Andreas El Saer, George Tsakiris, Leonidas Valavanis, Aristea M. Zafeiropoulou, Konstantinos Loupos, George Argyropoulos, Petros Tsampiras
  • Publication: AIOTI (Alliance for Internet of Things Innovation)
  • Abstract: This paper presents the WEld-e system as an immersive, AI-driven platform for human-robot collaboration in robotic welding. It integrates Mixed Reality (MR) via Microsoft HoloLens, along with advanced voice and gesture control, and a suite of AI models (ASR, NMT, TTS, and WeLLM). The system enables real-time, multilingual communication between human experts and robotic welders. It also features a digital twin interface for spatially contextual feedback and safety monitoring of a UR10e robotic arm. The paper highlights a key innovation in its knowledge distillation pipeline for computer vision, which refines object detection. Built on a modular architecture using ROS2 and Unity3D, the system supports low-latency interactions, improves precision, and enhances operator awareness, aligning with Industry 5.0 goals.
  • Key Breakthrough: An expert can remotely guide a welding robot using only spoken commands and hand gestures, all while seeing a real-time view through special glasses. This breakthrough simplifies complex tasks, making them faster and safer, and allows experts to supervise operations from a distance.

A Real-World Example

Imagine a welder in a factory wearing HoloLens glasses. Instead of manually programming a robot or using a complex control panel, they simply say, in their native language, “Start welding using template 3.” The system, powered by the AI described in the papers, understands the voice command, translates it if necessary, and directs the robotic arm to begin welding immediately.

But the system’s intelligence doesn’t stop there. Safety is a top priority. If another worker accidentally walks into the designated welding area, the system’s sensors instantly detect the person. It then stops the robotic arm, and a visual and auditory alert pops up in the welder’s glasses, saying something like, “Unauthorized person detected. Operation canceled.” This real-time, intuitive interaction not only streamlines the workflow but also creates a significantly safer working environment. 
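
A purely illustrative sketch of this command-and-safety flow is shown below; every function and class is a placeholder standing in for the ASR/NMT models, the HoloLens overlay, and the robot driver, not the WELD-E APIs described in the papers.

```python
# Illustrative only: voice command -> translation -> robot action, with a safety
# interlock. All names are placeholders, not WELD-E components.
def recognize_speech() -> str:
    return "Starte das Schweissen mit Vorlage 3"      # ASR output in the welder's language

def translate_to_english(text: str) -> str:
    return "Start welding using template 3"           # NMT output

def person_in_welding_area() -> bool:
    return False                                      # vision / safety sensor check

class RobotArm:
    def start_welding(self, template: int) -> None:
        print(f"Robot arm: welding with template {template}")
    def emergency_stop(self) -> None:
        print("Robot arm: stopped")

def show_headset_alert(message: str) -> None:
    print(f"HoloLens overlay: {message}")

robot = RobotArm()
command = translate_to_english(recognize_speech())
if command.lower().startswith("start welding"):
    robot.start_welding(int(command.rsplit("template", 1)[-1]))
    if person_in_welding_area():                      # checked continuously in a real system
        robot.emergency_stop()
        show_headset_alert("Unauthorized person detected. Operation canceled.")
```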

The WELD-E project is a powerful example of how cutting-edge technology can transform traditional industries.


VAARHeT Project Transforms Visitor Experience at Āraiši Ezerpils Archaeological Park

The VAARHeT project, one of the winners of the VOXReality Open Call and a pioneering initiative in enhancing cultural heritage experiences through technology, recently conducted successful usability testing of three innovative solutions at the Āraiši Ezerpils Archaeological Park in Cēsis, Latvia. From July 14th to 16th, 2025, visitors had the opportunity to engage with XR applications, providing feedback that will shape the future of museum interaction.

Diverse Technologies for a Richer Experience
“The study consisted of a usability test and user experience study of three AI-assisted voice-driven interactive XR applications for open-air museum visitors”, explains Cordula Hansen, a representative from XR Ireland, one of the project partners. These included:

  • VR Glasses for Storytelling: These immersive glasses transport visitors back in time, narrating and visually reconstructing the captivating history of the lake castle.
  • AR Glasses for Visual Translation: A different type of wearable technology, these glasses offer real-time visual translation for diverse language groups within a single tour, fostering inclusivity.
  • Mobile App Avatar for Practical Information: A friendly blue-haired avatar within a mobile application provides immediate answers to common visitor queries, such as opening hours and ticket prices.

Visitors at the Heart of the Innovation
The VAARHeT museum partner recruited 39 test participants, specifically targeting adults aged 25 to 55 with an interest in culture and museums, often visiting with children aged 3 to 12. This demographic aligns with the project’s design persona, ensuring feedback from the intended user base.

The primary objectives of involving these groups were multifaceted: to assess the overall usability of the applications, evaluate the added value of voice-driven XR experiences utilizing VOXReality’s NLP components and AI models, and identify areas for refinement with a view towards commercialization.

Overwhelming Positive Feedback for VR Storytelling
Initial data analysis, even while comprehensive processing is ongoing, reveals a clear favorite among the tested technologies: the virtual guide glasses.

David Toulon, an XR Ireland representative, shared, “Based on the feedback received so far, the virtual reality glasses are very popular. You can see and hear the story of how the castle was built. Visitor feedback suggests that the virtual reality glasses are here to stay.” [1]

This sentiment was echoed by test participants. Juris Beņķis, a visitor and former tourist guide, enthusiastically stated, “I’ve worked as a tourist guide, I know how hard it is to get children interested. I imagine that 20 children will come and put on these glasses, then it will be wow, they will be fascinated and will learn something.” [1]

The head of the archaeological park, Eva Koljera, highlighted the broader impact of the VR glasses, noting their utility beyond just engaging younger audiences: “People with mobility impairments who cannot physically get to the lake castle could get an idea of the castle here. Of course, young people and children also like it.” [1]

Cordula Hansen recounted how numerous visitors reacted with awe, with many exclaiming, “Wow, it really feels like you’re there!” Testers consistently praised the detailed graphical representations and the accurate explanations of how the historical houses were constructed. The novelty of voice interaction was also a popular feature, with one participant commenting, “I liked that there were no buttons to press.”

Continuous Improvement and Future Steps
The testing phase was also an opportunity for refinement. Before the main usability tests at Āraiši, preliminary performance tests were conducted in both lab settings and at the museum to ensure minimal latency and enable parallel testing of all three pilots.

One significant improvement was made to the Welcome Avatar, where the source material for the RAG component was re-curated to enhance the quality of responses. Cordula Hansen also noted an interesting user behavior: “due to the novelty of the voice interaction, first test users reported confusion when faced with the challenge of ‘speaking to the machine.’ We solved this through improved UX writing and adding some examples and tutorial prompts to the user flows to ‘practice’ with.”

Looking ahead, the project team has clear next steps. “After finalising the analysis of our test results, we will be able to determine which of the three VAARHeT pilots brings the most immediate value to the museum, and what type of interaction mechanics would be the most appropriate for that use case,” explained Cordula Hansen.

The plan is to further refine the most promising pilot experience and install it at the museum for an extended period, allowing for continuous feedback gathering in a real-world operational environment. Additionally, the voice-activation components trialed in these pilots will be further tested and integrated into an XR content management system platform specifically designed for the cultural heritage sector.

The VAARHeT project’s initial piloting has demonstrated the immense potential of XR technologies to create more engaging, accessible, and informative museum experiences. With the valuable feedback gathered, the future of cultural heritage interaction looks brighter and more immersive than ever!

[1] Part of the information in this article was sourced from the TV3.lv article “Virtuālais asistents, tulkošanas brilles un avatars: Āraišos notiek nākotnes muzeju izmēģinājumi” (Virtual assistant, translation glasses and avatar: future museums are being tested in Āraiši), available at https://tv3.lv/dzivesstils/celotprieks/virtualais-asistents-tulkosanas-brilles-un-avatars-araisos-notiek-nakotnes-muzeju-izmeginajumi/.

Photos courtesy of XR Ireland.


XR-CareerAssist @ SalentoXR 2025

XR-CareerAssist, one of the Open Call winners of VOXReality, was presented at the International Conference on Extended Reality, Salento XR 2025, held in Otranto, Italy (June 17–20, 2025). The project uses VR and AI to make career planning dynamic, personalized, and engaging.

The paper, “Transforming Career Development Through Immersive and Data-Driven Solutions,” focuses on XR-CareerAssist to address the limitations of traditional career counseling, often seen as boring, inflexible, and hard to access.

The system uses VR goggles and AI to create an immersive environment. Users interact with a 3D avatar that understands multiple languages, visualize career paths with interactive maps, and get personalized advice from a database of over 100,000 real career profiles. It also features voice commands for natural interaction.

The paper details the system’s design, built on a vast database of CVs, and describes how AI models work together. Future plans include testing with 25-40 real users.

Real-World Impact: Sarah’s Journey with XR-CareerAssist

Consider Sarah, a 35-year-old IT manager in the UK with 10 years of experience, aiming to become a Chief Officer. XR-CareerAssist helps her directly:

  • Immersive Start: Sarah puts on a MetaQuest 3 headset and enters a virtual environment with a 3D avatar.
  • Voice Interaction: She simply states her current role, experience, and Chief Officer aspiration.
  • Instant Analysis: The system quickly finds 1500 similar profiles, showing that such professionals typically work across 3 sectors and that reaching Chief Officer takes about 15 years.
  • Visual Career Map: Sarah sees an interactive Sankey diagram (a flow diagram) displaying various paths from Manager to Chief Officer over the next 10 years, including industry detours.
  • Multilingual Support: The system automatically translates everything into French if Sarah prefers, due to her work with a French company.
  • Personalized Insights: The AI-powered avatar explains visualizations, points out necessary skills, and highlights high-success career moves. Sarah can actively ask questions via speech.

This example shows how XR-CareerAssist offers data-backed, visual, and highly personalized guidance, empowering users to make informed career decisions.
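
As a rough illustration of what such an interactive career map could look like, the sketch below renders a Sankey diagram with Plotly; the roles, transitions, and counts are invented for the example and are not drawn from the 100,000-profile database.

```python
# Illustrative only: the role names and transition counts are made up.
import plotly.graph_objects as go

roles = ["IT Manager", "Senior Manager", "Director", "VP Engineering", "Chief Officer"]
# (source role index, target role index, number of profiles making that transition)
moves = [(0, 1, 900), (0, 2, 400), (1, 2, 650), (2, 3, 380), (2, 4, 120), (3, 4, 310)]

fig = go.Figure(go.Sankey(
    node=dict(label=roles, pad=20, thickness=16),
    link=dict(
        source=[s for s, _, _ in moves],
        target=[t for _, t, _ in moves],
        value=[n for _, _, n in moves],
    ),
))
fig.update_layout(title_text="Illustrative paths from IT Manager to Chief Officer")
fig.show()
```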

Poster for the paper presented in SalentoXR 2025

Abstract: The rapid evolution of technology has created opportunities to transform traditional career guidance methods into dynamic, immersive, and data-driven solutions. XR-CareerAssist is an innovative platform that aims to provide career insights and enhance user engagement by integrating Extended Reality (XR) and Artificial Intelligence (AI) technologies. A dedicated tool is built and presented, which analyses over 100,000 anonymised professional profiles. This tool is a key component of XR-CareerAssist and is used to visualise career trajectories, industry trends, and skill pathways through interactive and immersive experiences. Features such as virtual reality (VR) environments, voice-based navigation, multilingual support, and AI-driven 3D avatars empower users to explore career paths dynamically and intuitively. By merging robust data analytics with immersive visualizations, XR-CareerAssist not only boosts user engagement but also improves accessibility and aids in the clear interpretation of career trajectories. This study explores the envisioned scenarios, highlights results from initial testing with the CV Analysis tool, and examines how XR-CareerAssist enhances career guidance and training, fostering personalised and impactful career development in a globalised job market.

Keywords: Career Guidance, Career Maps, Artificial Intelligence, LLMs, Virtual Reality

Full Article: “Transforming Career Development Through Immersive and Data-Driven Solutions” by N.D. Tantaroudas (Institute of Communication and Computer Systems, Greece), A. J. McCracken (DASKALOS-APPS, France), I. Karachalios (National Technical University of Athens, Greece), E. Papatheou (University of Exeter, UK), V. Pastrikakis (CVCOSMOS Ltd, UK)
