Prediction Confidence

Forecast range

A forecast should be given as a range. The range to provide is the 95% Confidence Interval, i.e., the range into which there is a 95% probability that the actual value will fall. Forecasting models must be continually evaluated to assure that they still provide accurate forecast estimates. The tracking signal (TS) is a measure of the quality of the forecasts:

Interpretation of the tracking signal:

Positive: under prediction (Y > F)
Negative: over prediction (Y < F)

Tracking signals should not exceed ±4 MADs.

Cyclical Adjusted Models

Often there are seasonal (cyclical) variations in growth, demand, or costs. Averaging techniques fail to take fluctuations into account. Those can be accounted for with a seasonality adjustment to the forecast or a multiple regression model with season as a factor. A common model is to adjust the forecast by multiplying with a seasonality index. This is a multiplicative model. The seasonality index measures how much above or below each "season" is relative to an average season. A season is simply a cycle in the business, not an actual season like Winter.

For each time period, calculate the average demand per season.
For each time period, divide the actual seasonal demand by the average seasonal demand. This ratio is the seasonality index for that year.
Compute the average seasonality index for each season.
Calculate a forecast for the entire next time period and then divide that by the number of seasons to get an average.
Multiply the average by the seasonality index for that season.

Use multiple regression to account for cycles (seasons) as well as trend: _ Turn the cycle (season) component into a dummy variable _ Build the regression model * Evaluate the fit with Adjusted R2 and MAD This is an additive model rather than a multiplicative model.

Instead of a value, we can calculate a likely range for the forecast. The 95% CI is the range into which the actual forecast will fall with a 95% likelihood:

95%CI = F_t+1 ± 1.96 * SE

SE is the standard error: On the regression output, or Calculable from the MAD

Multiple Regression Models

Often a dependent variable that is to be predicted is based on more than one independent predictor variable. Multiple regression helps to capture multiple variables and often result in a more valuable forecast. Some examples:

models that establish compensation
predicting pension income
explain variations in home sales prices
establish pay guidelines for new hires
forecast student performance and likelihood of success

Steps:

plot the variables
identify outliers
is relationship linear?
create regression model
evaluate p-values
calculate forecast
define confidence interval

For each pair of dependent and independent variables, ask:

Is there reasonably strong correlation?
Is the correlation positive or negative?
Are there outliers and should they be excluded?
Is the relationship linear, i.e., is it along a line?
Does the scatter plot reflect what you expect?
Is the data reasonably normally distributed?

Evaluate p-values of variables:

Each independent variable must be a factor in the model and have a statistically significant impact on the prediction.
If a variable’s p-value is less than 0.05 then it is not statistically significant and must be removed from the model to avoid overfitting.
The threshold of 0.05 is somewhat arbitrary but generally accepted in the analytics community.
However, the non significance of the variable “Years of Experience” could be due to the presence of outliers, so the model should be re-built without outliers.

Steps for developing a regression model:

specify the variable that is to be predicted
generate the independent variables that are thought to influence the dependent variable
collect data and inspect the data for outliers
visually inspect the relationships between the independent and dependent variables to ensure there’s correlation
create a multiple regression model from the coefficients
evaluate the model by interpreting its Adjusted R2, and the p values of the regression variables
remove any non-significant variables and create a new model
use the model to calculate a forecast with a confidence interval