Abstract
Checkpointing is essential in real-time data engineering pipelines for fast data processing, data integrity, and fault tolerance in distributed systems. Application domains that stand to benefit include in-memory processing, financial services, high-performance computing, social media analytics, and automotive systems. The main advantages are data integrity, scalability, fault tolerance, and simulation efficiency; the principal challenges are performance overhead, distributed-system complexity, storage requirements, and endurance in harsh environments. Key research goals include asynchronous checkpointing, machine learning integration, storage optimisation, standardisation, and application-specific architectures. This extensive study shows how checkpointing enhances the reliability and efficiency of real-time data pipelines.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright (c) 2020 North American Journal of Engineering Research