Data Engineer
A research-backed roadmap from SQL foundations to lakehouse architecture — dbt, Spark, Kafka, Airflow, Delta Lake, and cloud platforms in the dependency order that 2026 data teams are hiring for.
SQL & Python Foundations
3–4 weeks. The universal tools of data engineering — advanced SQL for warehouse-scale transformations and Python for pipeline automation, data validation, and cloud API interaction.
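To make the milestone concrete, here is a minimal runnable sketch of the two tools together: Python's built-in sqlite3 module driving a window-function query, followed by a basic validation check. The table and column names are illustrative only.

    import sqlite3

    # In-memory database stands in for a real warehouse; schema is illustrative.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (customer_id INT, order_date TEXT, amount REAL);
        INSERT INTO orders VALUES
            (1, '2026-01-01', 120.0),
            (1, '2026-01-03', 80.0),
            (2, '2026-01-02', 200.0);
    """)

    # Warehouse-style SQL: a window function sequences orders per customer.
    rows = conn.execute("""
        SELECT customer_id, order_date, amount,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id ORDER BY order_date
               ) AS order_seq
        FROM orders
    """).fetchall()

    # Basic data validation in Python: no negative amounts slip through.
    assert all(amount >= 0 for _, _, amount, _ in rows)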
Data Warehousing & ELT
4–5 weeks. Master the platforms where modern analytical data lives — Snowflake, BigQuery, and Databricks — and the ELT patterns that make pipelines reliable at scale.
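As a flavour of the ELT pattern, here is a hedged sketch against Snowflake, assuming the snowflake-connector-python package: load raw data first, then transform inside the warehouse with an idempotent MERGE. Every credential, stage, and table name below is a placeholder.

    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account",        # placeholder
        user="loader",               # placeholder
        password="********",         # placeholder
        warehouse="TRANSFORM_WH",
        database="ANALYTICS",
        schema="RAW",
    )
    cur = conn.cursor()

    # Extract + Load: files land in a raw table untransformed.
    cur.execute("COPY INTO raw.orders FROM @orders_stage FILE_FORMAT = (TYPE = CSV)")

    # Transform inside the warehouse. MERGE makes reruns idempotent:
    # the same batch can be replayed without creating duplicates.
    cur.execute("""
        MERGE INTO analytics.orders AS tgt
        USING raw.orders AS src
          ON tgt.order_id = src.order_id
        WHEN MATCHED THEN UPDATE SET tgt.amount = src.amount
        WHEN NOT MATCHED THEN INSERT (order_id, amount)
            VALUES (src.order_id, src.amount)
    """)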
Modern dbt & Data Quality
3–4 weeks. dbt is the transformation standard of the modern data stack — bring software engineering to SQL and build the data quality layer that makes pipelines trustworthy.
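dbt itself is SQL plus YAML, but it can be driven from Python. A minimal sketch, assuming dbt-core 1.5+ and its programmatic dbtRunner API; the "staging" selector is illustrative.

    from dbt.cli.main import dbtRunner

    runner = dbtRunner()

    # Build the models, then run the schema tests that guard data quality.
    run_result = runner.invoke(["run", "--select", "staging"])
    test_result = runner.invoke(["test", "--select", "staging"])

    # Fail the pipeline loudly if any not_null / unique / accepted_values
    # test defined in schema.yml does not pass.
    if not (run_result.success and test_result.success):
        raise SystemExit("dbt build or tests failed")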
Pipeline Orchestration
3–4 weeks. Orchestrate complex multi-step data workflows with Airflow and Dagster — scheduling, dependency management, backfilling, and production observability.
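A minimal sketch of what orchestration code looks like, assuming Airflow 2.x and its TaskFlow API: a daily schedule, an explicit extract-transform-load dependency chain, and catchup enabled so missed intervals can be backfilled. Task bodies are placeholders.

    from datetime import datetime
    from airflow.decorators import dag, task

    @dag(
        schedule="@daily",
        start_date=datetime(2026, 1, 1),
        catchup=True,   # re-run every missed interval since start_date: backfilling
    )
    def orders_pipeline():
        @task
        def extract() -> list[dict]:
            return [{"order_id": 1, "amount": 120.0}]   # placeholder source

        @task
        def transform(rows: list[dict]) -> list[dict]:
            return [r for r in rows if r["amount"] >= 0]

        @task
        def load(rows: list[dict]) -> None:
            print(f"loading {len(rows)} rows")          # placeholder sink

        # TaskFlow infers the dependency chain from the data flow.
        load(transform(extract()))

    orders_pipeline()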
Big Data with Spark & Databricks
5–6 weeks. Process terabytes reliably — Spark fundamentals, performance optimisation, and the Delta Lake / Apache Iceberg open table formats that define the 2026 lakehouse.
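A hedged sketch of the lakehouse basics, assuming PySpark with the delta-spark package configured: read raw files, aggregate, and write an ACID Delta table. All paths and column names are illustrative.

    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder
        .appName("orders-lakehouse")
        .config("spark.sql.extensions",
                "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )

    orders = spark.read.parquet("s3://raw/orders/")       # illustrative path

    daily = (
        orders
        .withColumn("order_date", F.to_date("order_ts"))  # illustrative column
        .groupBy("order_date")
        .agg(F.sum("amount").alias("revenue"))
    )

    # Delta adds ACID transactions and time travel on top of object storage.
    daily.write.format("delta").mode("overwrite").save("s3://lake/daily_revenue/")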
Streaming (Kafka & Flink)
4–5 weeks. Build real-time data pipelines — Kafka for durable event streaming, Flink for stateful stream processing, and the architecture patterns that make streaming reliable.
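A minimal sketch of durable event streaming, assuming the kafka-python client and a local broker; the topic, consumer group, and payload are placeholders.

    import json
    from kafka import KafkaProducer, KafkaConsumer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
        acks="all",                    # wait for full replication: durability
    )
    producer.send("orders", {"order_id": 1, "amount": 120.0})
    producer.flush()

    consumer = KafkaConsumer(
        "orders",
        bootstrap_servers="localhost:9092",
        group_id="orders-etl",         # consumer group for scalable reads
        auto_offset_reset="earliest",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    for message in consumer:
        print(message.value)           # placeholder for real stream processing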
DataOps, Cloud & Lakehouse
3–4 weeks. Infrastructure as code, cloud cost control, data governance, and the production practices that make a data platform trustworthy and maintainable at scale.
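Cost control is the easiest of these practices to show in a few lines. A hedged sketch, assuming boto3 and AWS credentials: an S3 lifecycle rule that tiers ageing raw data to cheaper storage classes and expires it after two years. The bucket and prefix are placeholders.

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-data-lake",                     # placeholder bucket
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "tier-raw-zone",
                    "Filter": {"Prefix": "raw/"},
                    "Status": "Enabled",
                    "Transitions": [
                        # Move to infrequent access after 30 days...
                        {"Days": 30, "StorageClass": "STANDARD_IA"},
                        # ...and to Glacier after 180.
                        {"Days": 180, "StorageClass": "GLACIER"},
                    ],
                    "Expiration": {"Days": 730},   # drop after two years
                }
            ]
        },
    )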
AI Data Infrastructure & Portfolio
2–3 weeks. Build the feature pipelines and vector infrastructure that power ML systems, then assemble a portfolio that demonstrates production data engineering thinking.
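A minimal sketch of the vector side, using only numpy: embed documents, keep the vectors in an in-memory index, and answer a query by cosine similarity. The embed() function is a deterministic stand-in for a real embedding model, purely so the sketch runs.

    import hashlib
    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Stand-in embedding: a hash-seeded random unit vector so the sketch
        # runs without a model. A real pipeline would call an embedding model.
        seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
        rng = np.random.default_rng(seed)
        v = rng.normal(size=384)
        return v / np.linalg.norm(v)

    docs = ["dbt incremental models", "Kafka consumer groups", "Delta time travel"]
    index = np.stack([embed(d) for d in docs])   # the "vector store"

    query = embed("how do I replay Kafka events?")
    scores = index @ query                       # cosine similarity of unit vectors
    print(docs[int(np.argmax(scores))])          # nearest document to the query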
Ready to start your path?
SQL and Python appear in 95%+ of data engineer job postings — start with the foundations.