Obfuscation of Sensitive Text in Audiovisual Content Using AI
Abstract
The digital revolution has led to an increase in audiovisual content across platforms, creating new challenges for privacy protection. Sensitive information, such as personal identifiers, financial data, or contact information, frequently appears in images and videos, often unintentionally. These accidental disclosures can lead to serious privacy breaches or misuse of personal data. To address this issue, we present an automated solution for detecting and obscuring sensitive text in multimedia content, with a particular focus on Spanish-language educational materials. Our system uses Tesseract Optical Character Recognition (OCR) to extract text from visual media and Microsoft Presidio’s Natural Language Processing (NLP) capabilities to detect Personally Identifiable Information (PII) in that text. Detected sensitive content is then obfuscated using image processing techniques, protecting privacy while maintaining the visual quality of the media. This integrated approach provides an effective, efficient method for protecting personal data in multimedia applications without compromising usability.
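The detect-then-obscure flow described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the OCR step is assumed to have already produced word boxes (the `WordBox` type is hypothetical), two simple regular expressions stand in for Presidio's PII recognizers, and the image is a plain grayscale grid rather than a real video frame.

```python
import re
from dataclasses import dataclass


@dataclass
class WordBox:
    """Hypothetical OCR result: one recognized word and its bounding box."""
    text: str
    left: int
    top: int
    width: int
    height: int


# Assumption: simple regex stand-ins for a real PII detector such as Presidio.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SPANISH_DNI": re.compile(r"\b\d{8}[A-Za-z]\b"),
}


def find_sensitive_boxes(words):
    """Return the word boxes whose text matches any stand-in PII pattern."""
    return [
        w for w in words
        if any(p.search(w.text) for p in PII_PATTERNS.values())
    ]


def redact(image, boxes, fill=0):
    """Return a copy of a grayscale image (list of rows) with each box filled."""
    out = [row[:] for row in image]
    height = len(out)
    width = len(out[0]) if out else 0
    for b in boxes:
        # Clamp the box to the image bounds before overwriting pixels.
        for y in range(max(0, b.top), min(height, b.top + b.height)):
            for x in range(max(0, b.left), min(width, b.left + b.width)):
                out[y][x] = fill
    return out


# Usage: a 12x4 white "frame" with three recognized words.
image = [[255] * 12 for _ in range(4)]
words = [
    WordBox("Contacto:", 0, 0, 5, 2),
    WordBox("ana@example.com", 6, 0, 6, 2),
    WordBox("12345678Z", 0, 2, 4, 2),
]
hits = find_sensitive_boxes(words)
redacted = redact(image, hits)
```

Filling the box with a solid value is the simplest possible obfuscation; a blur or pixelation pass over the same clamped region would follow the same structure.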
Cite
Text
Jiang-Chen and Ferri. "Obfuscation of Sensitive Text in Audiovisual Content Using AI." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025. doi:10.1007/978-3-032-06129-4_40
Markdown
[Jiang-Chen and Ferri. "Obfuscation of Sensitive Text in Audiovisual Content Using AI." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025.](https://mlanthology.org/ecmlpkdd/2025/jiangchen2025ecmlpkdd-obfuscation/) doi:10.1007/978-3-032-06129-4_40
BibTeX
@inproceedings{jiangchen2025ecmlpkdd-obfuscation,
title = {{Obfuscation of Sensitive Text in Audiovisual Content Using AI}},
author = {Jiang-Chen, Kexin and Ferri, Cèsar},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2025},
pages = {506--510},
doi = {10.1007/978-3-032-06129-4_40},
url = {https://mlanthology.org/ecmlpkdd/2025/jiangchen2025ecmlpkdd-obfuscation/}
}