DS 4100 Data Collection, Integration, and Analysis

Today we’re talking more about predictive analytics

Prediction Confidence

Forecast range

A forecast should be given as a range. The range to provide is the 95% Confidence Interval, i.e., the range into which there is a 95% probability that the actual value will fall. Forecasting models must be continually evaluated to assure that they still provide accurate forecast estimates. The tracking signal (TS) is a measure of the quality of the forecasts:

Interpretation of the tracking signal:

* Positive: under prediction (Y > F)
* Negative: over prediction (Y < F) Tracking signals should not exceed ±4 MADs.

Cyclical Adjusted Models

Often there are seasonal (cyclical) variations in growth, demand, or costs. Averaging techniques fail to take fluctuations into account. Those can be accounted for with a seasonality adjustment to the forecast or a multiple regression model with season as a factor. A common model is to adjust the forecast by multiplying with a seasonality index. This is a multiplicative model. The seasonality index measures how much above or below each “season” is relative to an average season. A season is simply a cycle in the business, not an actual season like Winter.

1) For each time period, calculate the average demand per season.
2) For each time period, divide the actual seasonal demand by the average seasonal demand. This ratio is the seasonality index for that year.
3) Compute the average seasonality index for each season.
4) Calculate a forecast for the entire next time period and then divide that by the number of seasons to get an average.
5) Multiply the average by the seasonality index for that season.

Use multiple regression to account for cycles (seasons) as well as trend: * Turn the cycle (season) component into a dummy variable * Build the regression model * Evaluate the fit with Adjusted R2 and MAD This is an additive model rather than a multiplicative model.

Instead of a value, we can calculate a likely range for the forecast. The 95% CI is the range into which the actual forecast will fall with a 95% likelihood:

95%CI = F_t+1 ± 1.96 * SE

SE is the standard error: On the regression output, or Calculable from the MAD

Multiple Regression Models

Often a dependent variable that is to be predicted is based on more than one independent predictor variable. Multiple regression helps to capture multiple variables and often result in a more valuable forecast. Some examples:


For each pair of dependent and independent variables, ask:

Evaluate p-values of variables:

Steps for developing a regression model: