Beyond the Jargon: Coordinating XR and NLP Projects Without Losing Your Headset

Extended Reality (XR) technologies provide interactive environments that facilitate immersion by overlaying digital elements onto the real world, by allowing digital and physical objects to coexist and interact in real time, or by transporting users into a fully digital world. As XR technologies evolve, there is an increasing demand for more natural and intuitive ways to interact with these environments. This is where Natural Language Processing (NLP) comes in. NLP technologies enable machines to understand, interpret, and respond to human language, providing the foundation for voice-controlled interactions, real-time translation, and conversational agents within XR environments. Integrating NLP technologies into XR applications is a complex process that requires a collaborative effort from experts across a wide range of areas, such as artificial intelligence (AI), computer vision, human-computer interaction, user experience design, and software development, as well as domain experts from the fields the XR application targets. Consequently, such an effort requires a comprehensive understanding of both the scientific underpinnings and the technical requirements involved, as well as targeted coordination of all stakeholders in the research and development process.

The role of a scientific and technical coordinator is to ensure that the research aligns with the development pipeline, while also facilitating alignment between the interdisciplinary teams responsible for each component. The coordinator needs to ensure a smooth and efficient workflow and to facilitate cross-team collaboration, know-how alignment, scientific grounding of research and development, task and achievement monitoring, and communication of results. In VOXReality, Maastricht University’s Department of Advanced Computing Sciences (DACS) serves as the scientific and technical coordinator.

Our approach to scientific and technical coordination, centered on the pillars of Team, Users, Plan, Risks, Results, and Documentation, aligns closely with best practices for guiding the research, development, and integration of NLP and XR technologies. Building a Team of interdisciplinary experts, communicating clearly, and ensuring common understanding among team members foster innovation and the timely, high-quality delivery of results through collaboration. A focus on Users from beginning to end ensures that the project is driven by real-world needs, integrating feedback loops to create intuitive and engaging experiences. A detailed Plan, with well-defined milestones and achievement goals, provides structure and adaptability, ensuring the project stays on track despite challenges. Proactively addressing Risks through contingency planning and continuous performance testing mitigates potential disruptions. Tracking and analyzing Results against benchmarks ensures the project meets its objectives while delivering measurable value. Finally, robust Documentation preserves knowledge, captures lessons learned, informs the stakeholders, and paves the way for future innovation.

The VOXReality project consists of three phases, each requiring a dedicated approach to scientific and technical coordination. Phase 1 is the “Design” phase, which focuses on the design of the research activities as well as the challenges, defined by the use cases, that they seek to address. This phase mainly concerns the requirements, which need to be collected from different stakeholders, analyzed in terms of feasibility, scope, and relevance to the project goals, and mapped to functionalities in order to derive the technical and systemic requirements for each data-driven NLP model and the envisaged XR application. Phase 2 is the “Development” phase, where the core development of the NLP models and the XR applications takes place. In this phase, the models need to be verified and validated on a functional, technical, and workflow basis, and the XR applications that integrate the validated NLP models need to be tested both by the technical teams and by end-users. Finally, Phase 3 is the “Maturation” phase, where the VOXReality NLP models and XR applications are refined and improved based on the feedback received from the Phase 2 evaluations. A thorough technical assessment of the models and applications needs to be conducted before their final validation with end-users in the demonstration pilot studies. Moreover, five projects outside the project consortium will have access to the NLP models to develop new XR applications.

Currently, we have completed the first two phases: requirement extraction is finished, the first versions of the NLP models have been released, the first versions of the XR applications have been developed, and software tests and user pilot studies have been completed successfully. In Phase 1, from the scientific and technical coordination point of view, weekly meetings were organized. These meetings allowed each team involved in the project to familiarize themselves with the various disciplines represented and the terminology used throughout the project, to consolidate viewpoints regarding the design, implementation, and execution of each use case, and to formally define and document the use cases. In Phase 2, bi-weekly technical meetings and monthly achievement meetings were conducted. These meetings enabled monitoring of progress on the technical tasks, such as developments, internal tests, and experiments, as well as of the achievements, especially the methodologies followed to evaluate them and the achievements reached.

This structured approach and the coordinated effort of all team members in the VOXReality project have resulted in the release of 13 NLP models, 4 datasets, a code repository that includes 6 modular components, and 3 use case applications, as well as the publication of 4 deliverables in the form of technical reports and 6 scientific articles in journals and conference proceedings.

Yusuf Can Semerci

Assistant Professor in Computer Vision and Human Computer Interaction at Maastricht University
