Developing NLP models in the age of the AI race

The AI race intensifies

During the last 10-15 years, Natural Language Processing (NLP) has undergone a profound transformation, driven by advances in deep learning, the use of massive datasets, and increased computational power. These innovations led to early breakthroughs such as word embeddings (Word2Vec [1], GloVe [2]) and paved the way for more sophisticated neural architectures such as sequence-to-sequence models and attention mechanisms. The introduction of the transformer architecture, and in particular the release of BERT [3] as an open-source model in 2018, enabled contextualized understanding of language. Performance on NLP tasks such as machine translation, sentiment analysis, and speech recognition has been boosted significantly, making AI-driven language technologies more accurate and scalable than ever before.

The “AI race” has intensified with the rise of large language models (LLMs) like OpenAI’s ChatGPT [4] and DeepSeek-R1 [5], which use huge architectures with billions of parameters and massive multilingual datasets to push the boundaries of NLP. These models dominate fields like conversational AI and can perform a wide range of tasks with human-like fluency and context awareness. Companies and research institutions worldwide are competing to build more powerful, efficient, and aligned AI systems, leading to a rapid cycle of innovation. However, this race also raises challenges related to interpretability, ethical AI deployment, and the accessibility of high-performing models beyond large tech firms.

But what did DeepSeek achieve? In early 2025, DeepSeek released its R1 model, which was reported to outperform many state-of-the-art LLMs at a lower cost, causing a disruption in the AI sector. DeepSeek made R1 available on platforms like Azure, allowing users to take advantage of its technology. DeepSeek introduced several technical innovations that allowed the model to thrive, including architectural ones such as a hybrid transformer design, mixture-of-experts layers, and auxiliary-loss-free load balancing. Its main contribution, however, was reducing reliance on traditional labeled datasets. This stems from the use of pure reinforcement learning (RL), which enables the model to learn complex reasoning tasks without extensive labeled data. The approach not only reduces the dependency on large labeled datasets but also streamlines the training process, lowering the resource requirements and costs associated with developing advanced AI models.

Figure: DeepSeek architecture (taken from https://newsletter.languagemodels.co/p/the-illustrated-deepseek-r1)
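To make the mixture-of-experts idea mentioned above more concrete, the sketch below shows a generic top-k MoE feed-forward layer in PyTorch: a small router scores each token and only the k highest-scoring experts are run for it, so most parameters stay idle per token. This is an illustrative, simplified example of the general technique, not DeepSeek's actual implementation (it omits, for instance, their auxiliary-loss-free load balancing), and all dimensions and names are placeholders.

```python
# Minimal, illustrative top-k mixture-of-experts layer (not DeepSeek's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # scores each token for each expert
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                              # x: (tokens, d_model)
        scores = self.router(x)                        # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)     # keep only the k best experts per token
        weights = F.softmax(weights, dim=-1)           # normalize the kept routing weights
        out = torch.zeros_like(x)
        for slot in range(self.k):                     # dispatch each token to its chosen experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(TopKMoE()(tokens).shape)  # torch.Size([10, 64]); only 2 of 8 experts run per token
```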

The Enduring Relevance of Models Used in VOXreality

At VOXreality we take a fundamentally different approach and believe in the significant value brought by “traditional” AI models, especially for automatic speech recognition (ASR) and machine translation (MT), particularly in specialized domain applications. We prioritize genuinely open-source AI by ensuring transparency, reproducibility, and accessibility [6]. Unlike proprietary or restricted “open weight” models, our work is built upon truly open architectures that allow full modification and deployment without limitations; this is why our open call winners [7] are able to build on top of the VOXreality ecosystem. Moreover, our approaches often require less computational power and data, making them suitable for scenarios with limited resources or where deploying large-scale AI models is impractical. Our models can be tailored to specific industries or fields, incorporating domain-specific expertise without extensive or expensive retraining. Deploying the models locally (if chosen) also offers enhanced control over data and compliance with privacy regulations, which can be a significant consideration in sensitive domains.
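As a small illustration of this point, the snippet below loads a compact, fully open MT checkpoint and runs it locally with the Hugging Face transformers library. The model id is a well-known open model used purely as an example; it is not necessarily one of the VOXreality checkpoints published at [6].

```python
# Sketch: running a small, fully open MT model locally (illustrative model id).
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
result = translator("The conference starts at nine tomorrow morning.")
print(result[0]["translation_text"])
```

Because the weights, code, and tokenizer are openly available, the same model can be inspected, fine-tuned on domain data, or deployed entirely offline when data privacy requires it.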

VOXreality’s Strategic Integration

At VOXreality, we strategically integrate traditional ASR and MT approaches to complement advanced AI models, ensuring a comprehensive and adaptable solution that leverages the strengths of state-of-the-art AI models. This focus on genuinely open-source innovation and data-driven performance differentiates VOXreality from the rapidly evolving landscape of AI mega-models.
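One simple form such an integration can take is a cascaded pipeline that chains an open ASR model with an open MT model, as sketched below. The model ids and the audio file name are illustrative placeholders, not the specific components deployed in VOXreality.

```python
# Sketch: cascaded speech translation with open ASR + MT models (placeholder ids).
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
mt = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")

def speech_to_translation(audio_path: str) -> str:
    transcript = asr(audio_path)["text"]             # 1) speech -> source-language text
    return mt(transcript)[0]["translation_text"]     # 2) source text -> target language

# Example (hypothetical file): print(speech_to_translation("welcome_announcement.wav"))
```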


Jerry Spanakis

Assistant Professor in Data Mining & Machine Learning at Maastricht University

References

[1] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781.

[2] Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar. Association for Computational Linguistics.

[3] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

[4] https://openai.com/chatgpt/overview/

[5] https://github.com/deepseek-ai/DeepSeek-R1

[6] https://huggingface.co/voxreality

[7] https://voxreality.eu/open-call-winners/
