A retail company forecasts sales at three levels. The CFO wants a national number for next year's budget. Regional directors need territory-level projections for staffing. Store managers need product-level predictions to manage inventory. Each team builds its own model, and the numbers never agree.
Hierarchical time series forecasting exists to solve exactly this problem. It takes independently generated predictions at every level of a business hierarchy and mathematically reconciles them so the numbers add up from the shelf to the boardroom. The sum of store forecasts matches the regional totals, and the regional totals match the national figure. No rounding hacks, no manual adjustments.
This matters more than it sounds. When your supply chain orders (driven by store-level forecasts) conflict with your financial budget (driven by the top-level forecast), real money gets wasted. Time series forecasting already requires careful handling of trend and seasonality. Hierarchical forecasting adds one more constraint: coherency across aggregation levels.
We will use one running example throughout: RetailCo, a fictional chain with 1 national total, 2 regions (East and West), and 4 stores (E1, E2, W1, W2). Every formula, every code block, and every diagram ties back to this hierarchy.
*Figure: RetailCo hierarchy flowing from the National total down to the East and West regions, then to four individual stores.*
The Structure Behind Hierarchical Time Series
A hierarchical time series is a collection of time series arranged in a tree, where lower-level series aggregate upward to form higher-level series. The root represents the total (RetailCo National Sales), branches represent mid-level groupings (East Region, West Region), and leaves represent the most granular series (Stores E1, E2, W1, W2).
The defining rule is strict: at any time $t$, a parent must equal the sum of its children. National sales = East + West. East = E1 + E2. There's no wiggle room.
Grouped vs. Strict Hierarchies
Two flavors exist. In a strict hierarchy, every node has exactly one parent. Geography is the textbook case: Store E1 belongs only to the East Region. The tree is rigid.
In a grouped hierarchy, attributes cross-cut. You might slice sales by region and by product category. The total can be disaggregated as National -> East -> Electronics or as National -> Electronics -> East. The tree structure isn't unique, but the aggregation constraints still hold.
Modern reconciliation methods (including MinT, covered below) handle both structures identically, as long as the summing matrix is built correctly.
| Feature | Strict Hierarchy | Grouped Hierarchy |
|---|---|---|
| Parent per node | Exactly one | Multiple paths |
| Classic example | Geography (country -> state -> city) | Geography x Product category |
| Tree structure | Unique | Non-unique |
| Reconciliation math | Same | Same |
Temporal Hierarchies
There's a third type worth knowing. Instead of slicing by geography, you can slice by time granularity: annual -> quarterly -> monthly -> weekly. A company's annual forecast should equal the sum of 12 monthly forecasts. The same reconciliation framework applies, and you can even combine cross-sectional and temporal hierarchies simultaneously.
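The same summing-matrix machinery covers the temporal case. Here's a quick sketch (the monthly values are made up) that builds a temporal summing matrix for one year, with 12 monthly leaves aggregating into 4 quarterly series and 1 annual series:

```python
import numpy as np

# Temporal hierarchy for one year: 12 monthly leaves aggregate into
# 4 quarterly series and 1 annual series (17 rows x 12 columns)
S_annual = np.ones((1, 12))
S_quarterly = np.kron(np.eye(4), np.ones((1, 3)))  # each quarter sums 3 months
S_temporal = np.vstack([S_annual, S_quarterly, np.eye(12)])

# Hypothetical monthly sales for one year
monthly = np.array([10, 12, 11, 13, 14, 13, 15, 16, 14, 13, 12, 18.0])
y = S_temporal @ monthly
print("Annual:", y[0])          # 161.0
print("Quarterly:", y[1:5])     # [33. 40. 45. 43.]
```

The annual figure equals the sum of the four quarters, which each equal the sum of their three months, by construction.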
The Summing Matrix That Encodes Your Hierarchy
The summing matrix is the mathematical backbone of hierarchical forecasting. It encodes the entire tree structure as a matrix of ones and zeros, mapping bottom-level series to every level of the hierarchy.
For RetailCo with 4 bottom-level stores and 7 total series (National, East, West, E1, E2, W1, W2), the system is expressed as:

$$y_t = S b_t$$

Where:
- $y_t$ is a vector of all 7 series at time $t$ (National, East, West, E1, E2, W1, W2)
- $b_t$ is a vector of the 4 bottom-level series at time $t$ (E1, E2, W1, W2)
- $S$ is a $7 \times 4$ matrix of ones and zeros defining the aggregation rules
In Plain English: The summing matrix is RetailCo's organizational chart turned into math. It tells the algorithm: "To get the East Region number, add Store E1 and Store E2. To get the National number, add all four stores." Every row of $S$ is a recipe for computing one series from the bottom-level ingredients.
*Figure: The summing matrix S maps four bottom-level store series to all seven hierarchical levels for RetailCo.*
Concretely, for RetailCo:

$$S = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Row 1 sums all four stores to get the National total. Rows 2 and 3 sum the regional pairs. Rows 4 through 7 are the identity for each store. Let's verify this with code:
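The month-1 store values below are hypothetical, chosen only to make the arithmetic easy to check:

```python
import numpy as np

# Summing matrix for RetailCo: rows = National, East, West, E1, E2, W1, W2
S = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
])

# Hypothetical month-1 sales for the four stores (E1, E2, W1, W2)
b = np.array([120.5, 98.2, 88.7, 54.9])

y = S @ b  # all seven series at once, coherent by construction

print(f"Summing Matrix S ({S.shape[0]}x{S.shape[1]}):")
print(S)
print(f"National (month 1): {y[0]:.1f}")
print(f"East + West (month 1): {y[1] + y[2]:.1f}")
print(f"Coherent: {np.isclose(y[0], y[1] + y[2])}")
```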
Expected Output:
```
Summing Matrix S (7x4):
[[1 1 1 1]
 [1 1 0 0]
 [0 0 1 1]
 [1 0 0 0]
 [0 1 0 0]
 [0 0 1 0]
 [0 0 0 1]]
National (month 1): 362.3
East + West (month 1): 362.3
Coherent: True
```
Historical data is always coherent because the aggregation is exact arithmetic. The trouble starts when you forecast.
Independent Forecasts Break Coherency
Incoherency happens when you build a separate model for each series without enforcing the additive constraint. Each model minimizes its own error in isolation, and the resulting predictions almost never sum correctly across levels. If you fit an ARIMA to each of RetailCo's 7 series independently:
This gap isn't a bug in your model. Different models see different patterns, use different parameters, and produce predictions that live in their own worlds. When the CFO's budget number disagrees with the sum of what regional directors are planning for, someone has to manually fudge the numbers. That's the problem reconciliation solves.
The goal is to transform incoherent base forecasts $\hat{y}_t$ into coherent reconciled forecasts $\tilde{y}_t$ that satisfy $\tilde{y}_t = S \tilde{b}_t$ for some bottom-level vector $\tilde{b}_t$. The question is how.
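A minimal sketch of the check. The base forecast values below are hypothetical stand-ins for the seven independent ARIMA fits, picked to illustrate the gap:

```python
# Hypothetical one-step base forecasts, standing in for 7 independent ARIMA fits
base = {
    "National": 468.23, "East": 247.56, "West": 230.26,
    "E1": 124.89, "E2": 112.25, "W1": 145.70, "W2": 94.98,
}

store_sum = base["E1"] + base["E2"] + base["W1"] + base["W2"]
east_sum = base["E1"] + base["E2"]

print("=== Incoherency in Base Forecasts ===")
print(f"National forecast: {base['National']:.2f}")
print(f"Sum of store forecasts: {store_sum:.2f}")
print(f"Gap (National): {base['National'] - store_sum:.2f}")
print(f"East forecast: {base['East']:.2f}")
print(f"Sum of E1 + E2: {east_sum:.2f}")
print(f"Gap (East): {base['East'] - east_sum:.2f}")
```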
Expected Output:
```
=== Incoherency in Base Forecasts ===
National forecast: 468.23
Sum of store forecasts: 477.82
Gap (National): -9.59
East forecast: 247.56
Sum of E1 + E2: 237.14
Gap (East): 10.42
```
A gap of $10-$20 might look small for one month. Multiply that across thousands of SKUs, hundreds of stores, and a full fiscal year, and the cumulative incoherency can reach millions in misallocated resources.
Bottom-Up Reconciliation
Bottom-up reconciliation forecasts only the leaf nodes and aggregates upward. You build models for the 4 stores, discard any higher-level forecasts, and compute the regional and national numbers by summing:

$$\tilde{y}_t = S \hat{b}_t$$

Where:
- $\tilde{y}_t$ is the reconciled (coherent) forecast vector for all 7 series
- $S$ is the summing matrix
- $\hat{b}_t$ is the vector of base forecasts for the 4 bottom-level stores only
In Plain English: For RetailCo, you throw away the independent national and regional forecasts entirely. Instead, take each store's prediction, add E1 + E2 to get East, add W1 + W2 to get West, and sum all four to get National. The numbers add up by definition.
Strengths. Nothing is lost. If Store W1 is surging while W2 is declining, bottom-up captures that divergence. Coherency is guaranteed by construction.
Weaknesses. Bottom-level data is noisy. Forecasting weekly sales for a single store is much harder than forecasting regional aggregates, where randomness partially cancels out. You're betting everything on your noisiest series.
Key Insight: Bottom-up works best when your leaf-level series have strong, stable patterns. If they're sparse or highly volatile (think daily sales of a niche product at a single location), the noise compounds as you aggregate.
Top-Down Reconciliation
Top-down reconciliation does the opposite. You forecast only the root (National) and split it downward using historical proportions.

If Store E1 historically accounts for 28% of national sales:

$$\tilde{y}_{E1,t} = p_{E1} \, \hat{y}_{N,t}, \qquad p_{E1} = 0.28$$

Where:
- $\tilde{y}_{E1,t}$ is the reconciled forecast for Store E1
- $p_{E1}$ is the historical proportion of national sales attributable to Store E1
- $\hat{y}_{N,t}$ is the base forecast for the national total
In Plain English: For RetailCo, if Store E1 has historically contributed 28% of national sales, top-down assigns it 28% of whatever the national model predicts. Simple, stable, and smooth. One model does all the heavy lifting.
Strengths. The national-level series is smooth and easy to forecast. You build one model instead of many. The law of large numbers works in your favor.
Weaknesses. Historical proportions assume the past distribution of sales holds in the future. If a new competitor opens near Store E2, its share will drop, but top-down won't notice until months of data catch up.
Common Pitfall: Top-down breaks silently when the hierarchy itself changes. A new store opening, an old store closing, or a seasonal shift in regional mix all invalidate the fixed proportions. If your business is growing unevenly, avoid top-down.
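The proportion computation is one line of NumPy. A sketch with made-up history and an assumed national forecast of 450:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 12 months of history, columns = E1, E2, W1, W2
history = rng.uniform(80, 160, size=(12, 4))

# Each store's average share of the national total over the history window
p = history.sum(axis=0) / history.sum()

national_forecast = 450.0                # assumed output of the national model
store_forecasts = p * national_forecast  # top-down split

print("Proportions:", np.round(p, 3))
print(f"Store forecasts sum: {store_forecasts.sum():.2f}")  # coherent with the national forecast
```

Because the proportions sum to one, the split always adds back up to the national number, no matter how stale the shares are.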
Optimal Reconciliation with MinT
Optimal reconciliation, formalized by Hyndman et al. (2011) and refined into the MinT (Minimum Trace) method by Wickramasuriya, Athanasopoulos, and Hyndman (2019), takes a fundamentally different approach. Instead of choosing between bottom-up and top-down, it uses forecasts from all levels and combines them optimally.
The reconciled forecast is:

$$\tilde{y}_t = S G \hat{y}_t$$

Where:
- $\tilde{y}_t$ is the vector of reconciled (coherent) forecasts for all 7 series
- $S$ is the summing matrix ($7 \times 4$)
- $G$ is the reconciliation matrix ($4 \times 7$) that maps all base forecasts to optimal bottom-level forecasts
- $\hat{y}_t$ is the vector of incoherent base forecasts for all 7 series
The MinT algorithm computes $G$ by minimizing the trace of the forecast error covariance matrix:

$$G = (S^\top W^{-1} S)^{-1} S^\top W^{-1}$$

Where:
- $W$ is the covariance matrix of the base forecast errors
- $S^\top$ is the transpose of the summing matrix
- $W^{-1}$ is the inverse of $W$, which assigns higher weight to more reliable forecasts
In Plain English: MinT is a weighted vote across RetailCo's seven forecasts. It looks at how accurate each model has historically been. If the National-level model has large errors, it gets less influence. If Store E1's forecasts are consistently precise, they get more weight. The algorithm finds the exact adjustment to every forecast that makes the numbers add up and minimizes total prediction error across the entire hierarchy.
This is why MinT often produces forecasts that are more accurate than any individual level's base forecast. It borrows strength across levels. A clean national trend can correct noise in a store forecast, and a strong store-level signal can sharpen a vague national estimate.
*Figure: Comparison of bottom-up, top-down, and optimal MinT reconciliation approaches for hierarchical forecasting.*
| Method | Forecasts Built | Coherency | Handles Noise | Captures Local Trends | Typical Accuracy |
|---|---|---|---|---|---|
| Bottom-Up | Leaf only | By construction | Poor (amplifies) | Excellent | Variable |
| Top-Down | Root only | By construction | Excellent (smooths) | Poor (fixed splits) | Variable |
| MinT | All levels | After reconciliation | Optimal (weighted) | Optimal (weighted) | Usually best |
Implementing All Three Methods with NumPy
Since `hierarchicalforecast` and `scikit-hts` aren't available in browser-based Python environments, let's implement the core math manually. This is instructive: you'll see exactly what each reconciliation method does under the hood.
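Here is one self-contained sketch of all three methods. The synthetic data and the geometric-drift base model are assumptions (the base model is deliberately nonlinear so the independent forecasts genuinely disagree), so the exact numbers will differ from the run shown below:

```python
import numpy as np

rng = np.random.default_rng(42)

# Summing matrix: rows = National, East, West, E1, E2, W1, W2
S = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
], dtype=float)

# Synthetic store-level history: 18 months of trend plus noise
t = np.arange(18)
stores = np.column_stack([
    120 + 0.4 * t + rng.normal(0, 3, 18),   # E1
    95 + 0.2 * t + rng.normal(0, 3, 18),    # E2
    145 + 0.5 * t + rng.normal(0, 4, 18),   # W1
    72 + 0.1 * t + rng.normal(0, 2, 18),    # W2
])
Y = stores @ S.T   # (18, 7): coherent history for all seven series

def base_forecast(y):
    """Geometric-drift forecast; nonlinear, so aggregates won't line up."""
    growth = (y[-1] / y[0]) ** (1.0 / (len(y) - 1))
    return y[-1] * growth

y_hat = np.array([base_forecast(Y[:, i]) for i in range(7)])

# Bottom-up: keep only the four store forecasts, aggregate upward
bu = S @ y_hat[3:]

# Top-down: split the national forecast by historical proportions
p = Y[:, 3:].sum(axis=0) / Y[:, 0].sum()
td = S @ (p * y_hat[0])

# MinT with W = I (the OLS special case): G = (S'S)^{-1} S'
G = np.linalg.solve(S.T @ S, S.T)
mint = S @ (G @ y_hat)

for name, f in [("Base", y_hat), ("Bottom-Up", bu), ("Top-Down", td), ("MinT", mint)]:
    gap = f[0] - f[3:].sum()
    print(f"{name:9s} National={f[0]:7.1f}  Sum(stores)={f[3:].sum():7.1f}  Gap={gap:+.4f}")
```

Swapping in a full error covariance for the identity in the `G` computation gives the general MinT estimator from the formula above.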
Expected Output:
```
=== Month 19 Forecast Comparison ===
Series      Actual    Base   BotUp   TopDn    MinT
----------------------------------------------------
National     439.4   440.5   440.5   440.5   440.5
East         217.3   216.6   216.6   219.7   216.6
West         222.1   223.8   223.8   220.8   223.8
E1           118.9   118.6   118.6   122.5   118.6
E2            98.3    98.0    98.0    97.2    98.0
W1           147.7   149.5   149.5   147.3   149.5
W2            74.4    74.3    74.3    73.4    74.3

=== Coherency Check ===
Base       National=440.5  Sum(stores)=440.5  Gap=-0.0000
Bottom-Up  National=440.5  Sum(stores)=440.5  Gap=0.0000
Top-Down   National=440.5  Sum(stores)=440.5  Gap=0.0000
MinT       National=440.5  Sum(stores)=440.5  Gap=0.0000
```
All three reconciliation methods produce zero-gap coherent forecasts by construction; base forecasts only line up by luck, if at all. MinT's numbers fall between bottom-up and top-down because it weights both sources of information.
Handling the Covariance Matrix W
The practical challenge with MinT is estimating $W$, the forecast error covariance matrix. For RetailCo with 7 series, $W$ is a manageable $7 \times 7$; for a retailer with thousands of series, $W$ could have millions of entries.
Three common approximations, in order of complexity:
- Identity (OLS): Set $W = I$. All series weighted equally. Fast but ignores error differences.
- Diagonal (WLS): Set $W = \mathrm{diag}(\hat{\sigma}_1^2, \ldots, \hat{\sigma}_7^2)$. Each series weighted by its own variance. Ignores cross-correlations but usually a big improvement.
- Shrinkage (MinT-shrink): Estimate the full covariance matrix with Ledoit-Wolf shrinkage. Best accuracy, handles the "more series than observations" problem gracefully.
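A sketch of the three $W$ choices on hypothetical residuals. The fixed shrinkage intensity here stands in for the data-driven Ledoit-Wolf estimate. Whatever $W$ you pick, the resulting $G$ satisfies $GS = I$ (the unbiasedness property); coherency comes from multiplying by $S$:

```python
import numpy as np

rng = np.random.default_rng(7)
n, T = 7, 24   # total series, months of in-sample residuals

# Same RetailCo summing matrix as before
S = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
], dtype=float)

# Hypothetical base-forecast errors, one column per series
E = rng.normal(0, [4.0, 2.5, 2.5, 1.5, 1.5, 2.0, 1.0], size=(T, n))

def mint_G(S, W):
    """G = (S' W^-1 S)^-1 S' W^-1, the MinT reconciliation matrix."""
    Wi = np.linalg.inv(W)
    return np.linalg.solve(S.T @ Wi @ S, S.T @ Wi)

W_ols = np.eye(n)                 # identity: equal weights
W_wls = np.diag(E.var(axis=0))    # diagonal: per-series error variance
sample = np.cov(E, rowvar=False)  # full sample covariance
lam = 0.5                         # fixed intensity; Ledoit-Wolf chooses this from data
W_shr = (1 - lam) * sample + lam * np.diag(np.diag(sample))

for name, W in [("OLS", W_ols), ("WLS", W_wls), ("Shrink", W_shr)]:
    G = mint_G(S, W)
    print(name, "G S = I:", np.allclose(G @ S, np.eye(4)))
```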
Pro Tip: Start with `MinTrace(method='mint_shrink')` in production. The shrinkage estimator handles covariance estimation better than the full sample covariance when you have many series or short history. It's the default recommendation from the original authors.
Using HierarchicalForecast in Production
For production workloads, the HierarchicalForecast library from Nixtla handles the full pipeline: building the summing matrix, generating base forecasts with StatsForecast, and applying reconciliation.
```python
from statsforecast.core import StatsForecast
from statsforecast.models import AutoARIMA
from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.methods import BottomUp, TopDown, MinTrace
from hierarchicalforecast.utils import aggregate

# df: long-format DataFrame with 'Region', 'Store', 'ds', and 'y' columns
# Define the hierarchy
hierarchy_levels = [['Region'], ['Region', 'Store']]
Y_df, S_df, tags = aggregate(df, hierarchy_levels)

# Generate base forecasts for every series
sf = StatsForecast(models=[AutoARIMA(season_length=12)], freq='M')
Y_hat_df = sf.forecast(df=Y_df, h=6)

# Reconcile with all three methods
reconcilers = [
    BottomUp(),
    TopDown(method='forecast_proportions'),
    MinTrace(method='mint_shrink')
]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)
Y_rec_df = hrec.reconcile(
    Y_hat_df=Y_hat_df, Y_df=Y_df, S=S_df, tags=tags
)
```
The output DataFrame contains columns like AutoARIMA/BottomUp, AutoARIMA/TopDown, and AutoARIMA/MinTrace_mint_shrink, letting you compare reconciliation methods side by side before picking one for deployment.
When to Use Hierarchical Reconciliation (and When Not To)
Use it when:
- Multiple stakeholders consume forecasts at different aggregation levels
- Operational decisions (inventory, staffing) must align with financial planning
- You have natural tree structures in your data (geography, product taxonomy, time granularity)
- Base forecast accuracy varies significantly across levels
- You need probabilistic forecasts that are coherent (prediction intervals that add up)
Skip it when:
- You only need forecasts at a single level (no hierarchy to reconcile)
- Your hierarchy is flat with very few series (the overhead isn't worth it)
- All series are truly independent with no aggregation relationship
- You lack sufficient history to estimate the covariance matrix for MinT
Key Insight: Reconciliation isn't just about making numbers consistent. Empirical studies consistently show that MinT reconciled forecasts are more accurate on average than the base forecasts at any single level. You're getting both coherency and a free accuracy boost.
Production Considerations
Computational complexity. MinT reconciliation itself is cheap: it's a matrix multiplication after solving a linear system. The bottleneck is estimating $W$ and computing $G$. With a diagonal $W$, this is roughly $O(n m^2)$, where $m$ is the number of bottom-level series and $n$ is the total series count. With the full covariance and thousands of series, the shrinkage estimator is essential.
Scaling behavior. Large retailers routinely reconcile hierarchies with 50,000+ bottom-level series. The summing matrix is sparse (mostly zeros), so sparse linear algebra libraries handle this efficiently. Nixtla's library already exploits sparsity internally.
Forecast horizon. Reconciliation is applied independently at each forecast step. If you're predicting 12 months ahead using multi-step strategies, reconcile each of the 12 steps separately.
Updating the hierarchy. When the hierarchy changes (new store, closed store), rebuild $S$ and re-estimate $W$. The reconciliation formula doesn't care about history; it only needs the current structure and current error estimates.
Conclusion
Hierarchical time series reconciliation turns a collection of disconnected forecasts into a coherent system where every number adds up. Bottom-up preserves granular signals but amplifies noise. Top-down smooths noise but misses local trends. MinT optimal reconciliation combines both, weighting each level by its reliability, and the result is typically more accurate than any single approach.
The core math is surprisingly compact. The summing matrix $S$ encodes your hierarchy, the covariance matrix $W$ captures forecast reliability, and one matrix equation produces reconciled forecasts. If your business has any kind of aggregation structure, this should be part of your forecasting pipeline.
For the base models feeding into reconciliation, explore ARIMA for stationary patterns, Exponential Smoothing for level and trend, and Prophet for series with complex seasonality and holidays. The choice of base model matters, but reconciliation ensures that whatever models you choose, the final numbers tell a consistent story.
Interview Questions
Q: What is forecast incoherency and why does it matter in practice?
Incoherency means the forecasts at different levels of a hierarchy don't add up. If you sum the store-level predictions, you get a different number than the direct company-level prediction. This creates operational conflict: the supply chain team orders based on one set of numbers while finance budgets from another. Reconciliation eliminates this mismatch and often improves accuracy as a side effect.
Q: How does MinT differ from simple bottom-up or top-down methods?
MinT uses information from every level of the hierarchy instead of discarding either the top or the bottom forecasts. It weights each series inversely by its forecast error variance, so accurate models get more influence. The result is coherent and generally more accurate than either single-direction approach. The main requirement is enough historical data to estimate the error covariance matrix.
Q: Explain the role of the summing matrix S in hierarchical forecasting.
The summing matrix maps bottom-level forecasts to every level of the hierarchy. Each row contains ones and zeros indicating which bottom-level series contribute to that aggregate. Multiplying $S$ by the bottom-level vector produces all series, coherent by construction. It encodes the "organizational chart" of the data as a linear algebra constraint.
Q: When would you choose bottom-up over MinT?
Bottom-up is the right call when you have zero historical forecast errors to estimate from (cold-start scenarios), or when the bottom-level series are well-behaved with strong patterns and low noise. It's also simpler to explain to stakeholders, which matters in regulated industries. If your store-level forecasts are already accurate, the incremental gain from MinT may not justify the added complexity.
Q: How do you handle a new store that just opened in a hierarchical forecasting system?
Top-down breaks because it relies on historical proportions, and a new store has none. Bottom-up works if you can build a forecast for the new store using proxy data or a cold-start model. MinT also handles it by including the new node's (admittedly uncertain) forecast with high variance, which naturally down-weights it in the reconciliation. You rebuild $S$ to include the new node and re-estimate $W$.
Q: What is the difference between hierarchical and grouped time series?
In a strict hierarchy, each node has exactly one parent (a store belongs to one region). In a grouped structure, series can be aggregated along multiple dimensions (geography and product category) where the same leaf may appear under different grouping paths. The reconciliation math is identical; you just build a larger summing matrix that encodes all valid aggregation relationships.
Q: How does MinT scale to hierarchies with tens of thousands of bottom-level series?
The bottleneck is inverting $S^\top W^{-1} S$, which costs $O(m^3)$ where $m$ is the number of bottom-level series. For large hierarchies, approximate $W$ with its diagonal (variance scaling) or use Ledoit-Wolf shrinkage. The summing matrix $S$ is highly sparse, so sparse matrix operations keep memory and computation manageable.
Hands-On Practice
This exercise tackles the 'incoherency problem' in time series forecasting, where global forecasts don't match the sum of their parts. Using the Retail Sales dataset, we will construct a natural hierarchy by aggregating daily sales into Weekly Total, Weekday, and Weekend components. You will generate independent base forecasts for each level, observe the mathematical mismatch, and apply the Bottom-Up approach to enforce coherency, ensuring your numbers add up perfectly across the business structure.
Dataset: Retail Sales (Time Series) 3 years of daily retail sales data with clear trend, weekly/yearly seasonality, and related features. Includes sales, visitors, marketing spend, and temperature. Perfect for ARIMA, Exponential Smoothing, and Time Series Forecasting.
You have successfully demonstrated that independent forecasts rarely sum up correctly and applied the Bottom-Up method to enforce mathematical consistency. Try experimenting with the 'Top-Down' approach by calculating the average historical proportion of Weekday/Weekend sales and distributing the Total Forecast downwards. You can also deepen the hierarchy by splitting the data further (e.g., Total -> Month -> Week) to see how error propagation changes with more levels.