Enforcing Vector Space Stability in Embeddings with Evolving Vocabularies: A Web-App Behavior Perspective
Abstract
Embeddings generated from navigation data unlock valuable insights and provide strong baselines for a wide range of applications. However, the evolving nature of financial applications presents significant challenges for the stability and adaptability of embedding models, particularly when these embeddings are used as inputs to downstream analytical models. This paper proposes an approach, designed for an environment with real-world constraints, that addresses constantly changing vocabularies and dependencies across downstream systems. To integrate new elements seamlessly into an established vector space, and using data from BBVA’s application as a case study, the methodology combines Word2Vec-based embeddings with a two-step pipeline: Embedding Matcher and Space Mirroring. The former is an alignment mechanism that assigns new pages to existing embeddings using Levenshtein distance and cosine similarity. The latter projects embeddings into the original vector space; several transformation techniques, including SVD, dense layers, ResNet, and GRU-based models, are compared for this step. The results show that the approach preserves semantic integrity and reduces the impact of updates on downstream models while minimizing computational overhead. The proposed approach is applicable to any context that involves dynamic vocabulary data.
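The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how Levenshtein distance and cosine similarity are combined, nor which SVD formulation is used, so the code below assumes a simple name-based matcher with a hypothetical `max_ratio` threshold and uses orthogonal Procrustes (solved with SVD) as one possible projection. The function names and page names are invented for illustration.

```python
import numpy as np


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def match_new_page(new_name, known_names, max_ratio=0.25):
    """Assumed matcher: return the closest known page name, or None if the
    normalized edit distance exceeds the (hypothetical) max_ratio."""
    best = min(known_names, key=lambda name: levenshtein(new_name, name))
    ratio = levenshtein(new_name, best) / max(len(new_name), len(best))
    return best if ratio <= max_ratio else None


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def fit_rotation(source, target):
    """Orthogonal Procrustes: the orthogonal W minimizing
    ||source @ W - target||_F, obtained from the SVD of source.T @ target."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


# Synthetic data: the "retrained" space is a rotated copy of the original one.
rng = np.random.default_rng(0)
original = rng.normal(size=(50, 8))
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
retrained = original @ q

w = fit_rotation(retrained, original)  # learn the map on shared pages
mirrored = retrained @ w               # project back into the original space
matched = match_new_page("transfers/new_v2", ["transfers/new", "cards/block"])
```

In this sketch a new page whose name is close to a known one reuses that page's vector, and cosine similarity between `mirrored` and `original` rows serves only as a sanity check on the projection. The dense, ResNet, and GRU-based alternatives mentioned in the abstract would replace `fit_rotation` with a learned mapping.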
Cite
Text
Rodríguez-González et al. "Enforcing Vector Space Stability in Embeddings with Evolving Vocabularies: A Web-App Behavior Perspective." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025. doi:10.1007/978-3-662-72243-5_31
Markdown
[Rodríguez-González et al. "Enforcing Vector Space Stability in Embeddings with Evolving Vocabularies: A Web-App Behavior Perspective." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025.](https://mlanthology.org/ecmlpkdd/2025/rodriguezgonzalez2025ecmlpkdd-enforcing/) doi:10.1007/978-3-662-72243-5_31
BibTeX
@inproceedings{rodriguezgonzalez2025ecmlpkdd-enforcing,
title = {{Enforcing Vector Space Stability in Embeddings with Evolving Vocabularies: A Web-App Behavior Perspective}},
author = {Rodríguez-González, Asier and Serrano, Ignacio Sisamón and Conca, Ignacio Esplugues and Santolaya, Daniel Sánchez and Sánchez, Joel Medina},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2025},
pages = {543-558},
doi = {10.1007/978-3-662-72243-5_31},
url = {https://mlanthology.org/ecmlpkdd/2025/rodriguezgonzalez2025ecmlpkdd-enforcing/}
}