
Multi-Step Time Series Forecasting: Recursive, Direct, and Hybrid Strategies

LDS Team · Let's Data Science · 10 min read

Every real forecasting problem is a multi-step problem. An energy trader doesn't need tomorrow's electricity price; she needs the next 24 hourly prices to optimize her bids. A retailer planning warehouse shipments needs demand projections for the next 14 days, not just the next one. Hospital administrators scheduling nurses need patient volume forecasts across an entire week. The moment your forecast horizon stretches beyond a single step, the strategy you choose for generating that sequence of predictions matters more than which algorithm sits inside it.

Multi-step time series forecasting is the task of predicting multiple future values from a sequence of historical observations. There are four core strategies (recursive, direct, multi-output (MIMO), and DirRec hybrid) and each handles the propagation of uncertainty across the forecast window in fundamentally different ways. We'll build all four strategies on one running example: predicting daily electricity demand (in megawatts) for the next 7 days using 3 years of historical data.

[Figure: Multi-step forecasting strategy decision flowchart]

The multi-step forecasting problem

Multi-step forecasting extends single-step prediction to produce an entire vector of future values across a defined horizon. In single-step forecasting, a model f maps historical observations to one future value:

\hat{y}_{t+1} = f(y_t, y_{t-1}, \dots, y_{t-n})

Where:

  • \hat{y}_{t+1} is the predicted value one step ahead
  • y_t, y_{t-1}, \dots, y_{t-n} are the n most recent observed values
  • f is the learned forecasting function

Multi-step forecasting extends this to a full horizon H. We need the entire vector:

[\hat{y}_{t+1}, \hat{y}_{t+2}, \dots, \hat{y}_{t+H}]

In Plain English: For our electricity example, H = 7 means predicting Monday through Sunday demand given everything up to and including today. Instead of answering "what's tomorrow's demand?", you're answering "what does the entire next week look like?"

The difficulty is that these future values aren't independent. Wednesday's demand depends on Tuesday's. A heatwave that starts Tuesday will still be driving air conditioning load on Thursday. Any forecasting strategy must decide how to handle these inter-step dependencies.

There are four primary strategies:

| Strategy | Models Trained | Core Idea |
| --- | --- | --- |
| Recursive | 1 | Predict one step, feed it back, repeat |
| Direct | H | Train a separate specialist for each horizon step |
| Multi-Output (MIMO) | 1 | One model outputs the entire forecast vector at once |
| DirRec Hybrid | H (chained) | Each specialist sees prior specialists' predictions |

Pro Tip: Before tackling multi-step forecasting, make sure your series is stationary or properly differenced. Unaddressed trends and seasonality compound badly over longer horizons. Our guide on Time Series Forecasting: Mastering Trends, Seasonality, and Stationarity covers the preprocessing essentials.

Recursive strategy: one model, iterated forward

The recursive strategy (also called the iterative or autoregressive strategy) is the most intuitive approach. You train a single one-step-ahead model, then feed each prediction back as input for the next step. It's how most people first think about multi-step forecasting, and it remains the default in many production systems due to its simplicity.

How recursive forecasting works

  1. Train model f to predict y_{t+1} from the last n observed values.
  2. Generate \hat{y}_{t+1}.
  3. Slide the input window forward: drop the oldest value, append \hat{y}_{t+1}.
  4. Use the updated window to predict \hat{y}_{t+2}.
  5. Repeat until you reach horizon H.

For a 7-day electricity forecast, the chain looks like:

\hat{y}_{t+1} = f(y_t, y_{t-1}, \dots, y_{t-n})

\hat{y}_{t+2} = f(\hat{y}_{t+1}, y_t, \dots, y_{t-n+1})

\vdots

\hat{y}_{t+H} = f(\hat{y}_{t+H-1}, \hat{y}_{t+H-2}, \dots)

Where:

  • f is the same model at every step (only one model is trained)
  • \hat{y}_{t+k} is the predicted value at step k, used as input for step k+1
  • By step H, the input window contains mostly predictions rather than real observations

In Plain English: You make a one-day forecast, pretend that forecast is real, and use it to make the next one-day forecast. By day 7, your input window is a house of cards: mostly your own predictions stacked on top of each other.

[Figure: Recursive forecasting error accumulation across 7 steps]

Python implementation
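The sketch below is a minimal illustration rather than the article's original code (which isn't shown here): it trains a single scikit-learn LinearRegression on a synthetic daily-demand series and iterates it forward. Both the data generator and the base model are assumptions, so your numbers will differ from the sample output that follows.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Synthetic daily demand in MW: weekly seasonality plus noise
# (an assumption -- the article's real dataset is not shown).
t = np.arange(3 * 365)
demand = 250 + 10 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 2, t.size)

N_LAGS, HORIZON = 14, 7

# Supervised one-step-ahead training set: last 14 days -> next day.
X = np.array([demand[i:i + N_LAGS] for i in range(len(demand) - N_LAGS - HORIZON)])
y = demand[N_LAGS:len(demand) - HORIZON]
model = LinearRegression().fit(X, y)

# Recursive forecast: predict one step, append the prediction, slide forward.
window = list(demand[-N_LAGS - HORIZON:-HORIZON])  # last fully observed window
preds = []
for _ in range(HORIZON):
    yhat = model.predict(np.array(window[-N_LAGS:]).reshape(1, -1))[0]
    preds.append(yhat)
    window.append(yhat)  # the prediction becomes an input for the next step

actual = demand[-HORIZON:]
rmse = np.sqrt(np.mean((np.array(preds) - actual) ** 2))
print(f"Recursive {HORIZON}-day RMSE: {rmse:.2f}")
```

Note that after the first iteration, `window[-N_LAGS:]` starts mixing real observations with the model's own outputs, which is exactly where error accumulation enters.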

Expected output:

code
Recursive 7-day RMSE: 2.45
  Day 1: predicted=250.4, actual=250.1, error=0.3
  Day 2: predicted=255.0, actual=249.8, error=5.2
  Day 3: predicted=252.0, actual=252.4, error=0.4
  Day 4: predicted=243.3, actual=240.9, error=2.4
  Day 5: predicted=236.1, actual=234.0, error=2.1
  Day 6: predicted=242.1, actual=241.2, error=0.9
  Day 7: predicted=249.4, actual=247.6, error=1.8

With this 7-day window, the recursive forecast stays reasonably close. The key risk is that each prediction feeds into the next as input, so errors can compound, an effect that becomes more pronounced at longer horizons (e.g., 30 or 90 days ahead).

Error accumulation: the recursive strategy's defining weakness

The critical property of recursive forecasting is that prediction errors compound. If your day-1 forecast is off by \epsilon, the day-2 forecast starts from a slightly wrong position:

\hat{y}_{t+2} = f(y_{t+1} + \epsilon, \; y_t, \dots)

By day 7, the input window is contaminated with six rounds of accumulated error. This isn't theoretical hand-waving. It's the dominant failure mode in practice. Recursive forecasts of stationary series often collapse to the mean within 10-20 steps, producing the dreaded "flat line" forecast that's useless for planning.

The bias-variance tradeoff here is instructive. Recursive models tend toward low variance (one model, trained on the full dataset) but increasing bias as the horizon grows. The model was trained on real observations but at inference time receives its own noisy predictions, a distribution shift it wasn't optimized for. Ben Taieb and Hyndman showed in their 2014 ICML paper "Boosting multi-step autoregressive forecasts" that this bias grows roughly proportional to the number of recursive steps, modulated by the Jacobian of the model function.

Key Insight: Recursive forecasting works well for short horizons (1-3 steps) on well-behaved series. Once you need 7+ steps, seriously consider the direct or multi-output alternatives.

Direct strategy: one specialist per horizon step

The direct strategy eliminates error accumulation entirely by training a separate, independent model for each step in the horizon. Need a 7-day electricity demand forecast? Train 7 models. Model f_3 only answers the question "what will demand be in exactly 3 days?" and it never sees the output of f_1 or f_2.

How direct forecasting works

\hat{y}_{t+h} = f_h(y_t, y_{t-1}, \dots, y_{t-n}), \quad h = 1, 2, \dots, H

Where:

  • f_h is a model trained specifically for horizon step h
  • Each model receives the same historical input window
  • No model's output is ever fed as input to another model

In Plain English: Instead of one generalist that iterates forward, you hire seven specialists. The "Friday demand" specialist has been trained exclusively on the relationship between a 14-day history window and the demand exactly 5 days later. It doesn't know or care what happened on Monday through Thursday.

Python implementation

Scikit-learn's MultiOutputRegressor wraps this pattern cleanly. It fits one independent regressor per target column, which is exactly the direct strategy.
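As a sketch of that pattern (the synthetic demand series and the LinearRegression base learner are assumptions, so the numbers below won't match the sample output exactly):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(42)
t = np.arange(3 * 365)
demand = 250 + 10 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 2, t.size)

N_LAGS, HORIZON = 14, 7
n_rows = len(demand) - N_LAGS - HORIZON + 1

# Multi-target design: features are the last 14 days, targets the next 7.
X = np.array([demand[i:i + N_LAGS] for i in range(n_rows)])
Y = np.array([demand[i + N_LAGS:i + N_LAGS + HORIZON] for i in range(n_rows)])

# MultiOutputRegressor fits one independent regressor per horizon step.
direct = MultiOutputRegressor(LinearRegression()).fit(X[:-1], Y[:-1])

preds = direct.predict(X[-1:])[0]  # forecast for the held-out last week
rmse = np.sqrt(np.mean((preds - Y[-1]) ** 2))
print(f"Direct {HORIZON}-day RMSE: {rmse:.2f}")
```

After fitting, `direct.estimators_` holds the 7 independent models, one per horizon step; no model's output ever reaches another.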

Expected output:

code
Direct 7-day RMSE: 3.83
  Day 1: predicted=251.1, actual=251.8, error=0.7
  Day 2: predicted=238.1, actual=246.8, error=8.6
  Day 3: predicted=236.6, actual=236.5, error=0.1
  Day 4: predicted=237.8, actual=237.0, error=0.8
  Day 5: predicted=250.0, actual=252.2, error=2.2
  Day 6: predicted=251.3, actual=250.1, error=1.2
  Day 7: predicted=254.4, actual=249.8, error=4.6

Tradeoffs of the direct approach

Strengths:

  • Zero error accumulation. The day-7 prediction is just as "fresh" as day-1 because both are made directly from real observations.
  • Each model can be tuned independently. Maybe day-1 needs a shallow tree while day-7 benefits from a deeper one.

Weaknesses:

  • Computational cost. Training H separate models means H times the training time and memory. For a 24-hour-ahead forecast at 15-minute resolution (H = 96), you're training 96 models.
  • No dependency modeling. Model f_5 has no idea what f_4 predicted. If f_4's forecast comes out abnormally high, f_5 can't react to that signal because it only ever sees the original historical window.
  • Higher variance for distant horizons. Model f_7 maps the current 14-day window directly to a value 7 days later. That's a harder regression problem with less signal, so it tends to produce noisier predictions.

Common Pitfall: Don't confuse scikit-learn's MultiOutputRegressor (which trains independent models) with a single model that natively supports multiple outputs. They look similar in the API but behave very differently under the hood.

Multi-output (MIMO) strategy: one model, full vector output

The multi-output strategy, often called MIMO (Multiple-Input Multiple-Output), uses a single model that outputs the entire forecast horizon as a vector in one forward pass. Unlike the direct strategy where each target is learned independently, a MIMO model shares internal parameters across all horizon steps, letting it learn correlations between them.

The math

[\hat{y}_{t+1}, \hat{y}_{t+2}, \dots, \hat{y}_{t+H}] = f(y_t, y_{t-1}, \dots, y_{t-n})

Where:

  • f is a single model with H output dimensions
  • All outputs share the same internal representation
  • The model can learn that if Monday demand is high, the rest of the week probably follows

In Plain English: Instead of seven specialists who never talk (direct) or one generalist who keeps guessing (recursive), MIMO is one model that produces a complete 7-day forecast in a single pass. It learns the shape of demand curves, not just individual points.

Where MIMO shines

This strategy is natural for neural networks. An LSTM or Transformer encoder processes the input sequence and a decoder (or a final dense layer with H neurons) outputs the full horizon vector. The shared hidden representations let the model capture temporal structure across the forecast window. For more on LSTM architectures in time series, see our guide on Mastering LSTMs for Time Series.

But MIMO isn't restricted to deep learning. Scikit-learn's KNeighborsRegressor natively supports multi-output targets: it finds the k nearest historical windows and averages their corresponding future vectors. Vector Autoregression (VAR) models are also inherently multi-output.
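A minimal sketch of the KNN variant on the same synthetic demand series (the data generator and k=5 are assumptions; your numbers will differ from the sample output):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(42)
t = np.arange(3 * 365)
demand = 250 + 10 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 2, t.size)

N_LAGS, HORIZON = 14, 7
n_rows = len(demand) - N_LAGS - HORIZON + 1
X = np.array([demand[i:i + N_LAGS] for i in range(n_rows)])
Y = np.array([demand[i + N_LAGS:i + N_LAGS + HORIZON] for i in range(n_rows)])

# KNeighborsRegressor accepts a 2-D target natively: it finds the k most
# similar historical 14-day windows and averages their 7-day futures.
mimo = KNeighborsRegressor(n_neighbors=5).fit(X[:-1], Y[:-1])

preds = mimo.predict(X[-1:])[0]  # one call returns the full 7-day vector
rmse = np.sqrt(np.mean((preds - Y[-1]) ** 2))
print(f"MIMO (KNN) {HORIZON}-day RMSE: {rmse:.2f}")
```

Because the neighbors' futures are averaged as whole vectors, the forecast inherits the shape of real historical weeks, which is where the "coherent curve" property comes from.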

Expected output:

code
MIMO (KNN) 7-day RMSE: 4.23
  Day 1: predicted=249.2, actual=251.8
  Day 2: predicted=240.5, actual=246.8
  Day 3: predicted=235.9, actual=236.5
  Day 4: predicted=238.2, actual=237.0
  Day 5: predicted=246.0, actual=252.2
  Day 6: predicted=252.7, actual=250.1
  Day 7: predicted=255.5, actual=249.8

The key advantage over the direct strategy is coherent forecasts. Because one model produces the entire curve, the predictions tend to look like a plausible demand trajectory rather than seven disconnected guesses. Ben Taieb et al.'s extensive comparison on the NN5 forecasting competition (111 time series, multiple strategies) found that multi-output strategies were among the best-performing approaches, especially when combined with deseasonalization.

The downside: the model must learn a complex mapping to H outputs simultaneously. With limited training data or very long horizons, this can lead to underfitting because the output space is so large.

DirRec hybrid strategy: chained specialists

The DirRec (Direct-Recursive) hybrid combines the two families to get the best of both worlds: the no-error-accumulation property of direct models with the dependency-awareness of recursive chaining.

How DirRec works

  1. Train model f_1 to predict y_{t+1} from the historical window, identical to direct.
  2. Train model f_2 to predict y_{t+2} from the historical window plus \hat{y}_{t+1} (the output of f_1).
  3. Train model f_3 to predict y_{t+3} from the historical window plus \hat{y}_{t+1} and \hat{y}_{t+2}.
  4. Continue through f_H.

\hat{y}_{t+1} = f_1(y_t, y_{t-1}, \dots, y_{t-n})

\hat{y}_{t+2} = f_2(y_t, y_{t-1}, \dots, y_{t-n}, \; \hat{y}_{t+1})

\hat{y}_{t+h} = f_h(y_t, \dots, y_{t-n}, \; \hat{y}_{t+1}, \dots, \hat{y}_{t+h-1})

Where:

  • f_h is a separate model trained specifically for step h
  • Each model receives the original historical window (unchanged) plus all prior predictions
  • The original window serves as a fallback if earlier predictions are noisy

In Plain English: Each model in the chain is a specialist (like direct), but each specialist also gets to see what the previous specialists predicted (like recursive). Model f_5 predicting Friday's electricity demand gets the original 14-day history plus the predictions for Monday through Thursday.

Python implementation with RegressorChain

Scikit-learn's RegressorChain implements exactly this pattern. Each regressor in the chain receives the original features augmented with predictions of all prior regressors.
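A sketch on the same synthetic demand series (data generator and LinearRegression base learner are assumptions; numbers won't match the sample output exactly):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import RegressorChain

rng = np.random.default_rng(42)
t = np.arange(3 * 365)
demand = 250 + 10 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 2, t.size)

N_LAGS, HORIZON = 14, 7
n_rows = len(demand) - N_LAGS - HORIZON + 1
X = np.array([demand[i:i + N_LAGS] for i in range(n_rows)])
Y = np.array([demand[i + N_LAGS:i + N_LAGS + HORIZON] for i in range(n_rows)])

# RegressorChain trains one model per target column; each model's features
# are the original window plus the predictions of all earlier chain members.
# order=[0, 1, ..., 6] chains the horizon steps in temporal order.
dirrec = RegressorChain(LinearRegression(), order=list(range(HORIZON)))
dirrec.fit(X[:-1], Y[:-1])

preds = dirrec.predict(X[-1:])[0]
rmse = np.sqrt(np.mean((preds - Y[-1]) ** 2))
print(f"DirRec (RegressorChain) {HORIZON}-day RMSE: {rmse:.2f}")
```

Passing `order` explicitly matters: the chain feeds predictions forward in that sequence, so temporal order is the natural choice for forecasting.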

Expected output:

code
DirRec (RegressorChain) 7-day RMSE: 3.69
  Day 1: predicted=251.1, actual=251.8
  Day 2: predicted=239.0, actual=246.8
  Day 3: predicted=237.9, actual=236.5
  Day 4: predicted=237.2, actual=237.0
  Day 5: predicted=250.0, actual=252.2
  Day 6: predicted=251.7, actual=250.1
  Day 7: predicted=254.9, actual=249.8

When DirRec helps (and when it doesn't)

DirRec solves the direct strategy's biggest weakness: later models can react to earlier predictions. If f_1 predicts an unusually high Monday, f_2 through f_7 can adjust accordingly.

However, DirRec isn't immune to error propagation. If f_1 produces a bad prediction, every downstream model receives that bad prediction as an input feature. The difference from pure recursive is that each downstream model also has the full original historical window as a fallback, so it isn't forced to rely on the earlier predictions the way a recursive model is. In practice, this means DirRec propagates errors more slowly than recursive but isn't completely free of them like direct.

The computational cost is similar to the direct strategy (H models), plus the overhead of sequential chaining during both training and inference.

Comparing all four strategies head-to-head

Here's a side-by-side comparison across the dimensions that matter most in production:

| Property | Recursive | Direct | Multi-Output (MIMO) | DirRec |
| --- | --- | --- | --- | --- |
| Models trained | 1 | H | 1 | H (chained) |
| Error behavior | Accumulates (growing bias) | Independent (higher variance) | Balanced | Moderate accumulation |
| Inter-step dependencies | Captured via feedback | Ignored | Captured via shared weights | Captured via chaining |
| Training cost | O(N) | O(H · N) | O(N) to O(N · H) | O(H · N) |
| Inference cost | Sequential (H passes) | Parallel (1 pass each) | Single forward pass | Sequential (H passes) |
| Forecast coherence | High (temporal continuity) | Low (disconnected points) | High (shared representation) | Medium-High |
| Best for | Short horizons, stable series | Long horizons, ample data | Neural nets, structured output | Medium horizons, strong dependencies |

Key Insight: The 2025 Stratify framework (Green et al., Data Mining and Knowledge Discovery) showed that in 84% of 1,080 experiments across 18 benchmark datasets, hybrid strategies outperformed any single fixed approach. There's no universal winner; the right strategy depends on your data.

[Figure: Comparison of all four multi-step forecasting strategies showing error and dependency characteristics]

When to use each strategy (and when NOT to)

Choosing the right strategy is a design decision, not a hyperparameter. Here's a decision framework grounded in the electricity demand example:

Recursive: use for short horizons on stable series

Use when:

  • Your horizon is short (1-5 steps)
  • The series has strong autocorrelation and stable patterns
  • You want maximum simplicity and minimal training cost
  • You're building a real-time system where model retraining happens frequently

Do NOT use when:

  • Your horizon exceeds 10 steps, where error accumulation will destroy accuracy
  • The series has regime changes or structural breaks
  • You need each forecast step to be independently calibrated

Direct: use for long horizons with sufficient data

Use when:

  • Your horizon is long (10+ steps) and accuracy at each step matters equally
  • You have enough training data for each model to learn well
  • Computational budget allows H models (training and serving)
  • Steps are relatively independent (weak serial correlation in residuals)

Do NOT use when:

  • Adjacent forecast steps are highly correlated and you need smooth trajectories
  • H is very large (100+ steps) and you can't afford that many models
  • You need the forecast to react to its own earlier predictions

MIMO: use for deep learning and structured outputs

Use when:

  • You're using neural networks (LSTM, Transformer, temporal CNN)
  • The forecast horizon has strong internal structure (weekly patterns, ramp-up/ramp-down shapes)
  • You need coherent, smooth forecast curves
  • Training data is abundant enough for the model to learn the joint distribution

Do NOT use when:

  • H is very large and the model underfits the high-dimensional output
  • You're working with small datasets where shared parameters lead to underfitting
  • You need per-step interpretability (it's harder to explain why step 5 was predicted as X)

DirRec: use for medium horizons with strong dependencies

Use when:

  • Your horizon is moderate (5-30 steps)
  • Consecutive time steps are strongly dependent (demand, temperature, traffic)
  • You want the direct strategy's stability plus some dependency modeling
  • You can afford the sequential training/inference pipeline

Do NOT use when:

  • Early-step predictions are unreliable (their errors will contaminate every downstream model)
  • You need parallel inference for latency-critical applications
  • The series shows weak serial correlation

Common pitfalls in multi-step forecasting

The flat-line forecast

This is the most common failure for beginners using the recursive strategy. After a few steps, the forecast converges to a straight line at the mean of the training data.

Why it happens: Forecasts from stationary models converge to the unconditional mean; that's the long-run behavior stationarity implies. If the model can't "see" enough of the seasonal cycle in its window (a 7-day window on data with a 365-day seasonal cycle), the statistically safest prediction for any distant future step is the long-run average.

The fix: Either increase the window size so the model captures at least one full cycle of the dominant seasonality, or switch to a direct strategy that doesn't iterate. Adding explicit calendar features (day-of-week, month, holiday flags) also helps because they carry seasonal information without requiring a huge window.

Data leakage in direct strategy targets

When building the multi-target matrix for the direct strategy, an off-by-one error can leak future information into the training features. The rule is simple: for model f_h predicting y_{t+h}, the most recent observation allowed in the input is y_t. Not y_{t+1}, not y_{t+h-1}, just y_t.

A quick sanity check: for every row in your training set, verify that the timestamp of the latest lag feature is strictly before the timestamp of the earliest target.
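That check can be automated. The index-based sketch below (the series length of 100 is illustrative, and it assumes the lag/target construction described above) asserts the invariant for every training row:

```python
N_LAGS, HORIZON = 14, 7
n = 100  # toy series length (an illustrative assumption)

checked = 0
for i in range(n - N_LAGS - HORIZON + 1):
    last_feature_idx = i + N_LAGS - 1   # most recent lag used as a feature
    first_target_idx = i + N_LAGS       # earliest value this row must predict
    # The latest feature must be strictly earlier than the earliest target.
    assert last_feature_idx < first_target_idx, f"leakage at row {i}"
    checked += 1

print(f"No leakage detected across {checked} training rows")
```

With timestamped data, the same idea applies: compare the timestamp of the last lag column against the timestamp of the first target column for every row.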

Warning: Off-by-one leakage is especially sneaky because it makes your validation scores look great while producing overconfident, unreliable forecasts in production.

Wrong cross-validation scheme

Standard k-fold cross-validation shuffles data randomly, which destroys the temporal ordering that time series depend on. Even scikit-learn's TimeSeriesSplit can be insufficient for multi-step forecasting if you don't account for the forecast gap.

If your horizon is 7 days, your validation fold must have at least a 7-day gap between the last training observation and the first validation target. Without this gap, your model trains on Monday-Friday data and validates on predictions for the same week, data that's heavily autocorrelated with the training set.
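A minimal configuration sketch (the 200-point index and 5 splits are illustrative choices; the gap of 7 matches our horizon):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

HORIZON = 7

# gap=HORIZON leaves HORIZON samples between each fold's last training
# observation and its first validation sample.
tscv = TimeSeriesSplit(n_splits=5, gap=HORIZON)
print(f"TimeSeriesSplit with gap={HORIZON} configured")
print(f"  n_splits: {tscv.n_splits}")
print(f"  gap: {tscv.gap}")

# Verify on a toy index: each fold's first test index must come more than
# HORIZON positions after its last train index (the gap rows are skipped).
idx = np.arange(200)
for train_idx, test_idx in tscv.split(idx):
    assert test_idx[0] - train_idx[-1] > HORIZON
```

The assertion in the loop is the property you care about: no validation target sits within the forecast horizon of the training data.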

Expected output:

code
TimeSeriesSplit with gap=7 configured
  n_splits: 5
  gap: 7

Ignoring exogenous features for distant horizons

In real electricity forecasting, you don't just feed past demand values. You feed weather forecasts, holiday calendars, day-of-week indicators, and planned maintenance schedules. For direct and multi-output models, these exogenous features for the future period are known in advance (weather forecasts exist for the next 7 days; the calendar is fixed). Use them.

A direct model for Friday that knows "Friday is a public holiday" will drastically outperform one that only sees last week's demand numbers. This is where feature engineering becomes as important as the forecasting strategy itself.

Production considerations

Computational scaling

| Strategy | Training Time | Memory | Inference Latency |
| --- | --- | --- | --- |
| Recursive | O(N) for 1 model | Low (1 model in memory) | O(H) sequential passes |
| Direct | O(H · N) for H models | High (H models in memory) | O(1) per model, parallelizable |
| MIMO | O(N · H) for 1 model | Medium (1 larger model) | O(1) single pass |
| DirRec | O(H · N) for H models | High (H models, sequential) | O(H) sequential passes |

For a production electricity grid forecasting system running at 15-minute resolution with H = 96 (24 hours ahead), the direct strategy means maintaining 96 XGBoost models. That's roughly 96 x 50 MB = 4.8 GB of model artifacts. If you're retraining daily, that's 96 training jobs. MIMO with a single Transformer model might use 500 MB and one training job, a compelling operational advantage.

Model monitoring across the horizon

With multi-step forecasts, you need to track accuracy per horizon step, not just aggregate RMSE. A model that's great at step 1 but terrible at step 7 looks "average" in aggregate metrics, masking a serious problem. Set up dashboards that show RMSE (or MAPE) broken down by horizon step, and set alerts per-step.

Retraining frequency

Recursive models benefit most from frequent retraining because their error accumulation is sensitive to distribution drift. If your data's statistical properties shift over time (and they will; electricity demand patterns changed significantly during COVID), a recursive model trained months ago will accumulate errors much faster than a freshly trained one.

Direct models are more resilient to staleness because each model independently maps features to a specific horizon, but the distant-horizon models (f_7, f_{14}) still degrade faster because their signal-to-noise ratio is lower to begin with.

Conclusion

Multi-step forecasting is a structural design decision, not just a modeling problem. The four strategies (recursive, direct, MIMO, and DirRec) represent fundamentally different answers to how uncertainty should propagate across your forecast window. Recursive is simple and data-efficient but accumulates errors that grow with the horizon. Direct eliminates error propagation at the cost of training H independent models that can't communicate. MIMO produces coherent forecast curves through shared parameters, making it the natural choice for neural network architectures. DirRec splits the difference, giving each specialist access to prior predictions without relying on them entirely.

Start with a recursive baseline: it takes ten minutes to implement and immediately shows you how fast errors accumulate on your specific data. If the forecast degrades beyond your tolerance before reaching the full horizon, graduate to direct or DirRec. For deep learning pipelines, default to MIMO. And regardless of strategy, always validate with a proper time series split that includes a gap equal to your forecast horizon.

For the foundational concepts behind the models used in these strategies, see our guides on Mastering ARIMA: The Mathematical Engine of Time Series Forecasting and Unlocking Exponential Smoothing: From Simple Averages to Holt-Winters. If your multi-step forecasts keep collapsing to a flat line, revisit Time Series Forecasting: Mastering Trends, Seasonality, and Stationarity to make sure your preprocessing is solid.

Frequently Asked Interview Questions

Q: What is the fundamental difference between recursive and direct multi-step forecasting?

Recursive forecasting trains a single one-step-ahead model and iterates it H times, feeding each prediction back as input. Direct forecasting trains H independent models, each specialized for a specific horizon step. The key tradeoff is error accumulation (recursive) versus lack of inter-step dependency modeling (direct). Recursive is cheaper to train but degrades at longer horizons; direct is more expensive but each step's error is independent.

Q: Your recursive forecast degrades to a flat line after 10 steps. What's going wrong and how do you fix it?

The model is reverting to the unconditional mean because it can't capture long-range seasonal patterns within its lag window. Three fixes: (1) increase the window size to cover at least one full seasonal cycle, (2) add explicit calendar features (day-of-week, month, holiday flags) so the model gets seasonal information without needing a huge window, or (3) switch to a direct strategy that doesn't iterate. Often the fastest fix is adding calendar features, which takes five minutes and dramatically reduces mean-reversion.

Q: When would you choose MIMO over the direct strategy?

MIMO is better when consecutive forecast steps are strongly correlated and you need coherent, smooth forecast curves (think weekly demand patterns where Wednesday depends on Tuesday). MIMO shares parameters across all output steps, so it learns these cross-step relationships. Direct models treat each step independently, which can produce jagged, implausible trajectories. MIMO also has lower operational overhead (one model vs. H models). Choose direct when steps are relatively independent, or when interpretability per step matters.

Q: How should you set up cross-validation for multi-step forecasting?

Use TimeSeriesSplit with a gap parameter equal to your forecast horizon H. Without the gap, your validation fold overlaps with the forecast window, creating temporal leakage that inflates validation scores. For example, with H = 7, set gap=7 so there are at least 7 days between the last training observation and the first validation target.

Q: A production system needs 24-hour-ahead electricity forecasts at 15-minute resolution. Which strategy would you recommend?

That's H = 96 steps. Recursive is likely to accumulate significant errors over 96 iterations. Direct would require 96 models, feasible but operationally heavy. I'd start with MIMO using a sequence-to-sequence neural network (LSTM or Transformer), which naturally outputs the full 96-step vector in one pass and can learn the intra-day demand shape. Include exogenous features like weather forecasts, time-of-day embeddings, and holiday indicators. If a neural network is overkill for the data volume, DirRec with gradient-boosted trees is a strong alternative for the first 24-48 steps.

Q: What's the DirRec strategy and when does it beat both recursive and direct?

DirRec trains H separate models like direct, but each model also receives the predictions from all earlier models in the chain as extra input features. This gives it the dependency-awareness of recursive forecasting without full reliance on predicted values. It beats pure direct when consecutive steps are strongly correlated (the extra signal helps) and beats pure recursive when the horizon is long enough for error accumulation to matter. It's most effective for medium horizons (5-30 steps) where dependencies are strong.

Q: You have a multi-step forecast for inventory planning. How do you evaluate whether the forecast is good enough for the business?

Don't just compute aggregate RMSE across all H steps. Break it down by horizon step. A forecast that's great at step 1 but terrible at step 7 looks "average" in aggregate metrics but is useless for week-ahead inventory planning. Compute per-step RMSE and MAPE, set business-relevant thresholds per step (step 1 might tolerate 2% MAPE, step 7 might accept 5%), and track whether each step meets its target. Also compute prediction intervals, since point forecasts alone aren't enough for safety stock calculations.

Hands-On Practice

Multi-step time series forecasting is a critical skill for real-world applications where planning horizons extend beyond a single day. We'll move beyond simple next-day predictions and implement the two dominant strategies for predicting sequences: the Recursive Strategy (iterative) and the Direct Strategy (independent models). Using a realistic retail sales dataset, you will build forecasting engines that can predict sales 14 days into the future, learning to balance the trade-offs between error accumulation and model complexity.

Dataset: Retail Sales (Time Series) 3 years of daily retail sales data with clear trend, weekly/yearly seasonality, and related features. Includes sales, visitors, marketing spend, and temperature. Perfect for ARIMA, Exponential Smoothing, and Time Series Forecasting.

Having implemented both the Recursive and Direct forecasting strategies, you will likely observe that the Recursive strategy follows the trend but may drift over time as errors compound, while the Direct strategy often captures specific future points better but requires maintaining multiple models. Experiment by changing the HORIZON variable to 30 days to see how drastically recursive error accumulation degrades performance compared to the direct method.

Explore all career paths