DeltaTrace Delivers Traceable Health Platform

Researchers present DeltaTrace, a unified open-source big data health platform (JMIR Med Inform 2025) that embeds end-to-end data and model traceability using Delta Lake, Kafka, Spark, Airflow, MLflow, and Grafana. Evaluated with LifeSnaps wearable and questionnaire data, it processes continuous streams for roughly 1,500 users with end-to-end delays below 10 minutes and maintains consistent performance on CPU-only servers. The platform supports reproducible, auditable monitoring for preventive care in aging populations.
Key Points
- Implements DeltaTrace using Delta Lake, Kafka, Spark, Airflow, and MLflow for end-to-end traceable data and models
- Demonstrates sub-10-minute end-to-end processing for ~1,500 users, supporting scalable near-real-time monitoring
- Enables reproducible, auditable pipelines and model versioning suitable for preventive care and regulatory compliance
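To make the traceability idea concrete, here is a minimal pure-Python sketch of linking a model to the exact data snapshot it was trained on. It is not the authors' implementation and does not use the Delta Lake or MLflow APIs: `VersionedTable`, `fingerprint`, and `train_mean_model` are hypothetical stand-ins for a versioned table store and a model registry, and the heart-rate rows are invented sample values.

```python
import hashlib
import json


class VersionedTable:
    """Toy append-only table: every commit creates a new immutable snapshot.

    Hypothetical stand-in for a versioned table store; not the Delta Lake API.
    """

    def __init__(self):
        self._versions = []

    def commit(self, rows):
        """Append rows and return the new version number (0-based)."""
        previous = self._versions[-1] if self._versions else []
        self._versions.append(previous + list(rows))
        return len(self._versions) - 1

    def read(self, version=None):
        """Read the latest snapshot, or an earlier one by version number."""
        if version is None:
            version = len(self._versions) - 1
        return list(self._versions[version])


def fingerprint(rows):
    """Stable SHA-256 hash of a snapshot, used to audit what a model saw."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def train_mean_model(table, version):
    """'Train' a trivial model (mean heart rate) and record its data lineage.

    Hypothetical stand-in for logging a model run; not the MLflow API.
    """
    rows = table.read(version)
    mean_hr = sum(r["heart_rate"] for r in rows) / len(rows)
    return {
        "model": {"mean_heart_rate": mean_hr},
        "data_version": version,
        "data_sha256": fingerprint(rows),
    }


# Invented sample data, for illustration only.
table = VersionedTable()
v0 = table.commit([
    {"user": "u1", "heart_rate": 60},
    {"user": "u2", "heart_rate": 80},
])
record = train_mean_model(table, v0)

# New data arrives later, but the snapshot the model saw stays readable.
v1 = table.commit([{"user": "u3", "heart_rate": 100}])

# Audit: re-read the recorded version and confirm it matches the stored hash.
audited = fingerprint(table.read(record["data_version"])) == record["data_sha256"]
```

Because each snapshot is immutable and the model record stores both the version number and a content hash, an auditor can re-read the exact training data after newer data has been committed. A production pipeline would delegate both roles to the storage and tracking tools named above rather than to in-memory lists.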
Scoring Rationale
Practical, evaluated traceability platform with reported performance metrics; strong on reproducibility but limited by its prototype scope and by evaluation on a specific hardware setup (CPU-only servers).

