Deep Learning-Based Real-Time Data Quality Assessment and Anomaly Detection for Large-Scale Distributed Data Streams
Abstract
Time delay and data quality degradation pose significant challenges in large-scale distributed data stream processing. This paper proposes a deep learning-based method for real-time data quality assessment and anomaly detection in distributed streaming environments. The approach integrates quality-aware feature extraction with adaptive deep neural networks to monitor quality and detect anomalies in real time. A multi-dimensional quality assessment framework is developed that incorporates spatio-temporal correlations and stream characteristics for comprehensive quality evaluation. The system uses a distributed architecture with parallel processing, enabling scalable operation across multiple nodes while maintaining low-latency responses. A novel online learning mechanism adapts model parameters dynamically, ensuring robust performance as data patterns evolve. Experimental evaluation on three large-scale datasets, comprising industrial IoT sensors (2.5 TB), network traffic (1.8 TB), and financial transactions (3.2 TB), demonstrates superior performance compared to traditional methods. The system achieves 97.8% detection accuracy while keeping processing latency below 10 ms, with linear scalability up to 128 nodes. Results show consistent performance improvement across different operational scenarios, with 95% precision in anomaly detection and throughput exceeding 1.2 million events per second.
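To make the idea of online adaptation concrete, the following is a minimal sketch and not the method proposed in the paper. It replaces the adaptive deep neural network with a much simpler stand-in: a running mean and variance maintained with Welford's algorithm, combined with a z-score threshold. The class name `OnlineZScoreDetector`, the `threshold` and `warmup` parameters, and the sample stream are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class OnlineZScoreDetector:
    """Hypothetical stand-in detector: flags values far from a running mean."""

    threshold: float = 3.0  # assumed cut-off, in standard deviations
    warmup: int = 5         # assumed number of samples seen before flagging starts
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0         # running sum of squared deviations from the mean

    def update(self, x: float) -> bool:
        """Score x against the current statistics, then adapt to it."""
        is_anomaly = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))  # sample standard deviation
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True

        # Welford's single-pass update: no past samples are stored.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        return is_anomaly


# Illustrative stream with one obvious outlier (25.0) at index 7.
stream = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.3, 25.0, 10.1]
detector = OnlineZScoreDetector()
flags = [detector.update(value) for value in stream]
print(flags)
```

Each value is scored before it updates the statistics, so the detector adapts after every event without a separate training phase. In this sketch a flagged value is still absorbed into the running statistics, which inflates the variance afterwards; a production system would likely need a policy for excluding or down-weighting flagged events.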
How to Cite This Article
Hanqing Zhang, Xuzhong Jia, Chen Chen (2025). Deep Learning-Based Real-Time Data Quality Assessment and Anomaly Detection for Large-Scale Distributed Data Streams. International Journal of Medical and All Body Health Research (IJMABHR), 6(1), 01-11. DOI: https://doi.org/10.54660/IJMBHR.2025.6.1.01-11