
Zalando Documents Machine Learning Platform Architecture

May 24, 2026 | By LDS Team

Relevance Score: 6.6


Per Zalando's engineering blog, the company runs a multi-stage machine learning platform that starts with a hosted experimentation environment called Datalab (a browser-accessible, JupyterHub-based workspace) and moves to large-scale processing on Databricks for Apache Spark workloads. The platform integrates data sources such as S3, BigQuery, and MicroStrategy, according to the blog. An AWS Machine Learning blog coauthored with Zalando describes production inference work on Amazon SageMaker and details a forecast-then-optimize approach for markdown pricing; that post also notes Zalando serves about 50 million active customers. Additional vendor writeups (Databricks, ZenML) describe a unified data foundation using Databricks Unity Catalog and orchestration components such as ZFlow and AWS Step Functions.

What happened

Per Zalando's engineering blog, Zalando operates a multi-stage machine learning platform built to support experimentation and large-scale data processing. The post describes a hosted experimentation environment called Datalab: a browser-accessible, JupyterHub-based workspace that includes RStudio, shell access, preinstalled data-science libraries, and connectors to internal data sources such as S3, BigQuery, and MicroStrategy. The same post reports that Zalando uses Databricks for Apache Spark workloads when experiments require large-scale processing.

An AWS Machine Learning blog coauthored with Zalando documents how Zalando optimized large-scale inference and ML operations on Amazon SageMaker and explains the company's markdown-pricing solution, described as a forecast-then-optimize pipeline; that AWS post also states Zalando serves approximately 50 million active customers. Vendor writeups add supporting architecture detail: a Databricks blog describes a unified data foundation built using Databricks Unity Catalog and a federated access-control layer, while a ZenML summary outlines an ML lifecycle architecture that includes ZFlow and AWS Step Functions for orchestration.

Technical details

Per the Zalando engineering blog, the platform separates exploratory work from large-scale processing: Datalab supports notebooks and ad-hoc analysis, while Databricks handles Spark-based scale and long-running jobs. The AWS coauthored post describes production inference on Amazon SageMaker, including optimizations for serving large numbers of SKUs and integrating discount-steering forecasts into downstream pricing optimization. The Databricks material documents a central data governance layer implemented with Databricks Unity Catalog and a federated access model on top of that catalog.
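The forecast-then-optimize idea mentioned above can be illustrated with a minimal sketch. This is not Zalando's implementation: the `Article` fields, the constant-elasticity demand model, and the candidate discount grid are all assumptions chosen for illustration, and the sketch models demand only, whereas the AWS post describes forecasts that also cover returns and fulfillment. It shows only the two-stage shape: first forecast demand at each candidate discount, then pick the discount that maximizes forecast profit given available stock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """Hypothetical per-SKU inputs; field names are illustrative only."""
    sku: str
    base_price: float    # full price per unit
    base_demand: float   # forecast units sold at full price
    elasticity: float    # assumed price elasticity of demand (> 0)
    unit_cost: float     # cost per unit


def forecast_demand(article: Article, discount: float) -> float:
    """Stage 1 (forecast): toy constant-elasticity demand model.

    A real system would use a trained forecasting model here.
    """
    price_ratio = 1.0 - discount
    return article.base_demand * price_ratio ** (-article.elasticity)


def optimize_discount(article: Article, candidates: list[float], stock: float):
    """Stage 2 (optimize): choose the discount with the highest forecast profit.

    Sales are capped by available stock. Returns (discount, profit).
    """
    best = None
    for discount in candidates:
        price = article.base_price * (1.0 - discount)
        units = min(forecast_demand(article, discount), stock)
        profit = units * (price - article.unit_cost)
        if best is None or profit > best[1]:
            best = (discount, profit)
    return best


article = Article("SKU-1", base_price=100.0, base_demand=10.0,
                  elasticity=2.0, unit_cost=40.0)
candidates = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
best_discount, best_profit = optimize_discount(article, candidates, stock=1000.0)
```

Keeping the forecast and the optimizer as separate functions mirrors the pipeline split: the forecasting stage can be retrained or replaced without touching the optimization logic, and the optimizer can add constraints (stock here) without changing the forecast.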

Editorial analysis - technical context

Companies building ML platforms at multi-million-customer scale commonly separate fast experimentation from reliable batch and serving infrastructure; this reduces friction for data scientists while limiting the blast radius of experimental code. As an industry pattern, teams typically pair notebook-first environments with managed Spark or distributed compute and add a data-governance catalog to centralize schema, lineage, and access controls. Orchestration via workflow engines like AWS Step Functions and pipeline frameworks such as ZFlow mirrors broader MLOps practices for reproducibility and operational reliability.
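To make the orchestration pattern concrete, here is a hedged sketch of how a linear ML pipeline could be expressed as an AWS Step Functions state machine definition in Amazon States Language (ASL). The fields `StartAt`, `States`, `Type`, `Resource`, `Next`, and `End` are standard ASL; the step names and ARNs are placeholders invented for illustration and do not describe Zalando's pipelines or the ZFlow API.

```python
import json

# Hypothetical pipeline steps: (state name, placeholder resource ARN).
PIPELINE_STEPS = [
    ("PrepareData", "arn:aws:lambda:eu-central-1:123456789012:function:prepare-data"),
    ("TrainModel", "arn:aws:lambda:eu-central-1:123456789012:function:train-model"),
    ("EvaluateModel", "arn:aws:lambda:eu-central-1:123456789012:function:evaluate-model"),
    ("DeployModel", "arn:aws:lambda:eu-central-1:123456789012:function:deploy-model"),
]


def build_state_machine(steps):
    """Chain (name, arn) pairs into a linear ASL definition of Task states."""
    if not steps:
        raise ValueError("at least one step is required")
    states = {}
    for index, (name, arn) in enumerate(steps):
        state = {"Type": "Task", "Resource": arn}
        if index + 1 < len(steps):
            state["Next"] = steps[index + 1][0]  # hand off to the next step
        else:
            state["End"] = True                  # terminal state
        states[name] = state
    return {
        "Comment": "Illustrative linear ML pipeline (placeholder ARNs)",
        "StartAt": steps[0][0],
        "States": states,
    }


def execution_order(definition):
    """Follow Next pointers from StartAt to list states in run order."""
    order = []
    current = definition["StartAt"]
    while current is not None:
        order.append(current)
        current = definition["States"][current].get("Next")
    return order


definition = build_state_machine(PIPELINE_STEPS)
definition_json = json.dumps(definition, indent=2)
```

Generating the definition from a step list keeps the pipeline declarative and versionable, which is the reproducibility property that workflow engines are typically adopted for.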

Context and significance

Zalando's documentation illustrates a pragmatic, multi-vendor ML stack that combines developer-friendly experimentation, managed big-data compute, and cloud-native serving. The combination of a governed data catalog, Spark-on-Databricks processing, and managed inference on Amazon SageMaker reflects an architecture pattern that many large retailers and consumer platforms adopt to balance agility, governance, and cost. For practitioners, the concrete examples (hosted JupyterHub, a federated catalog, and forecast-then-optimize pricing) provide reusable reference patterns for ML lifecycle design.

What to watch

Observers should track additions to the data-governance layer (for example, extensions to Databricks Unity Catalog or federated access controls) and signals about how inference workloads are scaled on Amazon SageMaker. Industry observers will also watch adaptations of the forecast-then-optimize pattern, documented in the AWS post and cited research, for other retail tasks such as inventory or promotion planning.

Key Points

1. Zalando documents a multi-stage ML stack: Datalab for notebooks, Databricks for Spark scale, and Amazon SageMaker for production inference.
2. The company uses a forecast-then-optimize pipeline for markdown pricing, enabling price recommendations informed by demand, returns, and fulfillment forecasts.
3. Industry-pattern observation: combining notebook experimentation, a governed data catalog, and managed serving mirrors reproducible MLOps practices for large-scale retail platforms.

Scoring Rationale

This is a notable, practitioner-relevant architecture case study showing a mature, multi-vendor ML platform at scale. The content is practical for ML engineers designing lifecycle workflows but does not introduce new research or a paradigm shift. Source age reduces immediacy.


Sources

Primary source and supporting public references used for this report.

4 sources

Primary source: engineering.zalando.com, "Zalando's Machine Learning Platform"
