Your forecasting model should include features which capture all the important qualitative properties of the data: patterns of variation in level and trend, effects of inflation and seasonality, etc. Moreover, the assumptions which underlie your chosen model should agree with your intuition about how the series is likely to behave in the future. When fitting a forecasting model, you have some of the following choices:
Deflation?
Log transformation?
Seasonal adjustment?
Independent variables?
Smoothing, averaging, or random walk?
ARIMA?
Winters seasonal smoothing?
These options are briefly described below, and are discussed in more depth in other notes in the course outline. See the accompanying Forecasting Flow Chart for a pictorial view of the model-specification process, and refer back to the Statgraphics Model Specification panel to see how the model features are selected in the software.
Deflation? If the series shows inflationary growth, then deflation will help to account for the growth pattern and reduce heteroscedasticity in the residuals. You can either (i) deflate the past data and reinflate the long-term forecasts at a constant assumed rate, or (ii) deflate the past data by a price index such as the CPI, and then "manually" reinflate the long-term forecasts using a forecast of the price index. Option (i) is the easiest approach in Statgraphics: just enter an assumed inflation rate on the Model Specification panel, and all the details are handled automatically. If you choose this option, it is usually best to set the inflation rate equal to your best estimate of the current rate, particularly if you are going to forecast more than one period ahead. If instead you choose option (ii), you must first save the deflated forecasts and confidence limits to your data spreadsheet using the "Save results" button on the Analysis Window Toolbar, then generate and save a forecast for the price index, and finally multiply the appropriate columns together. (To do the multiplication in Statgraphics, highlight an unused column on the spreadsheet, select Edit/Generate_Data from the menu, and enter the product of the desired two variables--e.g., SALESFCST*CPIFCST--in the "Expression" field.)
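Outside of Statgraphics, both options reduce to a division by the price index before fitting and a multiplication afterward. The following is a minimal Python sketch under stated assumptions: the data are made up, the index is assumed to have a base value of 100, and the function names (deflate, reinflate, reinflate_constant_rate) are illustrative, not Statgraphics functions.

```python
def deflate(nominal, price_index, base=100.0):
    """Express nominal values in constant (base-period) dollars."""
    return [x * base / p for x, p in zip(nominal, price_index)]


def reinflate(real_forecasts, index_forecasts, base=100.0):
    """Option (ii): multiply deflated forecasts by forecasts of the index."""
    return [f * p / base for f, p in zip(real_forecasts, index_forecasts)]


def reinflate_constant_rate(real_forecasts, last_index, rate, base=100.0):
    """Option (i): grow the last observed index at a constant assumed rate."""
    return [
        f * last_index * (1.0 + rate) ** h / base
        for h, f in enumerate(real_forecasts, start=1)
    ]


# Hypothetical data: sales that grow only because prices grow 10% per period.
sales = [100.0, 110.0, 121.0]
cpi = [100.0, 110.0, 121.0]

real_sales = deflate(sales, cpi)       # flat in constant dollars
real_forecasts = [100.0, 100.0]        # stand-in for a model fitted to real_sales
nominal_forecasts = reinflate_constant_rate(real_forecasts, cpi[-1], rate=0.10)
```

The same multiplication would be applied to the saved confidence limits, column by column.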
Logarithm transformation? If the series shows compound growth and/or a multiplicative seasonal pattern, a logarithm transformation may be helpful in addition to or in lieu of deflation. Logging the data will not flatten an inflationary growth pattern, but it will straighten it out so that it can be fitted by a linear model (e.g., a random walk or ARIMA model with constant growth, or a linear exponential smoothing model). Also, logging will convert multiplicative seasonal patterns to additive patterns, so that if you perform seasonal adjustment after logging, you should use the additive type. Logging deals with inflation in an implicit manner; if you want inflation to be modeled explicitly--i.e., if you want the inflation rate to be a visible parameter of the model or if you want to view plots of deflated data--then you should deflate rather than log.
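Both effects of logging can be checked on a constructed example. The sketch below assumes a made-up quarterly series with 5% compound growth per quarter and hypothetical multiplicative seasonal indices; none of the numbers come from real data.

```python
import math

growth = 0.05                          # hypothetical compound growth per quarter
seasonal = [0.8, 1.0, 1.3, 0.9]        # hypothetical multiplicative indices
series = [100.0 * (1.0 + growth) ** t * seasonal[t % 4] for t in range(12)]

logged = [math.log(x) for x in series]

# In original units the seasonal swing widens as the level rises...
raw_swing = [max(series[i:i + 4]) - min(series[i:i + 4]) for i in (0, 4, 8)]
# ...but in log units it is the same size every year (an additive pattern).
log_swing = [max(logged[i:i + 4]) - min(logged[i:i + 4]) for i in (0, 4, 8)]

# Year-over-year changes in the logged series are constant: a straight line.
yearly_change = [logged[t] - logged[t - 4] for t in range(4, 12)]


def unlog(log_forecasts):
    """Convert forecasts made in log units back to the original units."""
    return [math.exp(f) for f in log_forecasts]
```

Forecasts and confidence limits computed in log units are converted back to the original units by exponentiating, as in unlog.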
Seasonal adjustment? If the series has a strong seasonal pattern which is believed to be constant from year to year, seasonal adjustment may be an appropriate way to estimate and extrapolate the pattern. The advantage of seasonal adjustment is that it models the seasonal pattern explicitly, giving you the option of studying the seasonal indices and the seasonally adjusted data. The disadvantage is that it requires the estimation of a large number of additional parameters (particularly for monthly data), and it provides no theoretical rationale for the calculation of "correct" confidence intervals. Out-of-sample validation is especially important to reduce the risk of over-fitting the past data through seasonal adjustment. If the data are strongly seasonal but you do not choose seasonal adjustment, the alternatives are to either (i) use a seasonal ARIMA model, which implicitly forecasts the seasonal pattern using seasonal lags and differences, or (ii) use the Winters seasonal exponential smoothing model, which estimates time-varying seasonal indices.
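To make the idea of constant seasonal indices concrete, here is a simplified sketch of multiplicative seasonal adjustment in the spirit of the ratio-to-moving-average method. It is an illustration under assumptions, not the exact procedure Statgraphics uses: it assumes an even seasonal period (e.g., 4 or 12), a series long enough that every season has at least one ratio, and made-up data.

```python
def centered_moving_average(y, period):
    """Centered moving average spanning one full season (even period)."""
    half = period // 2
    cma = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half:t + half + 1]
        # The two end points get half weight so each season counts once.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        cma[t] = total / period
    return cma


def seasonal_indices(y, period):
    """Average the ratios of data to moving average, season by season."""
    cma = centered_moving_average(y, period)
    ratios = [[] for _ in range(period)]
    for t, (value, average) in enumerate(zip(y, cma)):
        if average is not None:
            ratios[t % period].append(value / average)
    raw = [sum(r) / len(r) for r in ratios]
    scale = period / sum(raw)          # normalize so the indices average to 1
    return [r * scale for r in raw]


def seasonally_adjust(y, period):
    """Divide each observation by the index for its season."""
    indices = seasonal_indices(y, period)
    return [value / indices[t % period] for t, value in enumerate(y)]


# Hypothetical quarterly data: a flat level of 100 with a fixed pattern.
pattern = [0.8, 1.0, 1.3, 0.9]
y = [100.0 * pattern[t % 4] for t in range(12)]
indices = seasonal_indices(y, period=4)
adjusted = seasonally_adjust(y, period=4)
```

With quarterly data this estimates 4 indices; with monthly data it estimates 12, which is the "large number of additional parameters" referred to above.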
"Independent" variables? If there are other time series which you believe to have explanatory power with respect to your series of interest (e.g., leading economic indicators or policy variables such as price, advertising, promotions, etc.) you may wish to consider regression as your model type. Whether or not you choose regression, you still need to consider the possibilities mentioned above for transforming your variables (deflation, log, seasonal adjustment--and perhaps also differencing) so as to exploit the time dimension and/or linearize the relationships. Even if you do not choose regression at this point, you may wish to consider adding regressors later to a time-series model (e.g., an ARIMA model) if the residuals turn out to have significant cross-correlations with other variables.
Smoothing, averaging, or random walk? If you have chosen to seasonally adjust the data--or if the data are not seasonal to begin with--then you may wish to use a simple averaging or smoothing model to fit the nonseasonal pattern which remains in the data at this point. A simple moving average or simple exponential smoothing model merely computes a local average of data at the end of the series, on the assumption that this is the best estimate of the current mean value around which the data are fluctuating. (These models assume that the mean of the series is varying slowly and randomly without persistent trends.) Simple exponential smoothing is normally preferred to a simple moving average, because its exponentially weighted average does a more sensible job of discounting the older data, because its smoothing parameter (alpha) is continuous and can be readily optimized, and because it has an underlying theoretical basis for computing confidence intervals.
If smoothing or averaging does not seem to be helpful--i.e., if the best predictor of the next value of the time series is simply its previous value--then a random walk model is indicated. This is the case, for example, if the optimal number of terms in the simple moving average turns out to be 1, or if the optimal value of alpha in simple exponential smoothing turns out to be 0.9999.
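The connection between averaging, smoothing, and the random walk can be sketched in a few lines of Python. The function names, the grid search over alpha, the choice to start the smoothed level at the first observation, and the sample data are all assumptions made for illustration; they are not a description of how Statgraphics optimizes alpha.

```python
def moving_average_forecast(y, m):
    """Forecast the next value as the average of the last m observations."""
    return sum(y[-m:]) / m


def ses_forecast(y, alpha):
    """Simple exponential smoothing; the level starts at y[0] (an assumption)."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1.0 - alpha) * level
    return level


def ses_sse(y, alpha):
    """Sum of squared one-step-ahead forecast errors for a given alpha."""
    level, sse = y[0], 0.0
    for obs in y[1:]:
        sse += (obs - level) ** 2
        level = alpha * obs + (1.0 - alpha) * level
    return sse


def best_alpha(y, steps=100):
    """Grid search over alpha = 1/steps, 2/steps, ..., 1."""
    grid = [i / steps for i in range(1, steps + 1)]
    return min(grid, key=lambda a: ses_sse(y, a))


y = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]   # hypothetical, steadily drifting
```

For this series both moving_average_forecast(y, 1) and ses_forecast(y, 1.0) simply return the last observation, which is the random walk forecast, and the grid search settles on alpha = 1 because any averaging makes the forecasts lag further behind the data.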
Brown's linear exponential smoothing can be used to fit a series with slowly time-varying linear trends, but be cautious about extrapolating such trends very far into the future. (The rapidly-widening confidence intervals for this model testify to its uncertainty about the distant future.) Holt's linear smoothing also estimates time-varying trends, but uses separate parameters alpha and beta for smoothing the level and trend, respectively. Thus, it allows you to assume different rates of change for the level and trend, which may provide a better fit to the data in some cases. Brown's quadratic exponential smoothing model attempts to estimate time-varying quadratic trends, and should virtually never be used. (This would correspond to an ARIMA model with three orders of nonseasonal differencing.)
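As an illustration of the two-parameter idea, here is a minimal sketch of Holt's linear smoothing recursion. The starting values (level equal to the first observation, trend equal to the first difference) are a simple assumption for the example, and the data are made up.

```python
def holt_forecast(y, alpha, beta, horizon):
    """Holt's linear exponential smoothing with separate level/trend smoothing.

    alpha smooths the level and beta smooths the trend. Requires len(y) >= 2.
    """
    level, trend = y[0], y[1] - y[0]       # simple starting values (assumption)
    for obs in y[1:]:
        previous_level = level
        level = alpha * obs + (1.0 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1.0 - beta) * trend
    # Forecasts extend the final local trend in a straight line.
    return [level + h * trend for h in range(1, horizon + 1)]


y = [10.0, 12.0, 14.0, 16.0]               # hypothetical, exactly linear
forecasts = holt_forecast(y, alpha=0.5, beta=0.3, horizon=3)
```

Because the forecasts follow whatever local trend was estimated at the end of the series, a trend estimate that is slightly off is magnified at long horizons, which is the reason for the caution about extrapolating far into the future.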
Linear, quadratic, or exponential trend line models are other options for extrapolating a deseasonalized series, but they rarely outperform random walk, smoothing, or ARIMA models on business data.
ARIMA? If you do not choose seasonal adjustment (or if the data are non-seasonal), you may wish to use the ARIMA framework for specifying the remaining features of your model. ARIMA models are a very general class of models that includes random walk, random trend, and exponential smoothing models as special cases. The conventional wisdom is that a series is a good candidate for an ARIMA model if (i) it can be stationarized by a combination of differencing and other mathematical transformations such as logging, and (ii) you have a substantial amount of data to work with: at least 4 full seasons. (If the series cannot be adequately stationarized by differencing--e.g., if it is very irregular or seems to be qualitatively changing its behavior over time--or if you have fewer than 4 seasons of data, then you might be better off with a model that uses seasonal adjustment and some kind of simple averaging or smoothing.)
The first step in fitting an ARIMA model is to determine the appropriate order of differencing needed to stationarize the series and remove the gross features of seasonality. This is equivalent to determining which "naive" random-walk or random-trend model gives the best fit--i.e., which combination of differencing yields the lowest RMSE and best residual diagnostics. Do not attempt to use more than 2 total orders of differencing (non-seasonal and seasonal combined), and do not use more than 1 seasonal difference.
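The RMSE part of this comparison can be sketched as follows. The sketch assumes that each naive model includes a constant, so that its in-sample RMSE is just the standard deviation of the differenced series; it covers only a subset of the allowed combinations, uses made-up data, and says nothing about residual diagnostics, which still have to be inspected separately.

```python
import math


def difference(y, lag=1):
    """Return y[t] - y[t - lag] for every t where both values exist."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]


def rmse_with_constant(x):
    """RMSE of a naive model whose only parameter is the mean of x."""
    mean = sum(x) / len(x)
    return math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))


def compare_differencing(y, period):
    """In-sample RMSE for several combinations of differencing."""
    candidates = {
        "none": y,
        "nonseasonal": difference(y, 1),
        "seasonal": difference(y, period),
        "nonseasonal + seasonal": difference(difference(y, period), 1),
    }
    return {name: rmse_with_constant(x) for name, x in candidates.items()}


# Hypothetical quarterly series: linear trend plus a fixed additive pattern.
pattern = [0.0, 5.0, -3.0, 1.0]
y = [2.0 * t + pattern[t % 4] for t in range(16)]
results = compare_differencing(y, period=4)
```

In this constructed example one seasonal difference already leaves nothing unexplained, and adding a nonseasonal difference on top of it cannot improve the fit, so the lower order of differencing would be preferred.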
The second step is to determine whether to include a constant term in the model: usually you do include a constant term if the total order of differencing is 1 or less, otherwise you don't. In a model with one order of differencing, the constant term represents the average trend in the forecasts. In a model with two orders of differencing, the trend in the forecasts is determined by the local trend observed at the end of the time series, and the constant term represents the trend-in-the-trend, i.e., the curvature of the long-term forecasts. Normally it is dangerous to extrapolate trends-in-trends, so you suppress the constant term in this case.
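The role of the constant is easiest to see in the simplest case, a random walk with a constant term (one nonseasonal difference and no AR or MA terms). There the constant is estimated by the average period-to-period change, and the forecasts extend from the last observation along a straight line with that slope. A minimal sketch with made-up data and an illustrative function name:

```python
def random_walk_with_constant_forecast(y, horizon):
    """Forecasts from a model with one difference and a constant term."""
    changes = [b - a for a, b in zip(y, y[1:])]
    constant = sum(changes) / len(changes)     # the average trend
    return [y[-1] + h * constant for h in range(1, horizon + 1)]


y = [10.0, 13.0, 12.0, 16.0]                   # hypothetical data
forecasts = random_walk_with_constant_forecast(y, horizon=3)
```

Dropping the constant from this model would make every forecast equal to the last observation, i.e., a flat line.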
The third step is to adjust the AR, MA, SAR, and SMA parameters to eliminate any autocorrelation that remains in the residuals of the naive model (i.e., any correlation that remains after mere differencing). These parameters determine the number of lags of the differenced series and/or lags of the forecast errors that are included in the forecasting equation. If there is no significant autocorrelation in the residuals at this point, then STOP, you're done: the best model is a naive model! If there is significant autocorrelation at lags 1 or 2, you should try setting MA=1 if one of the following applies: (i) there is a non-seasonal difference in the model, (ii) the lag 1 autocorrelation is negative, and/or (iii) the residual autocorrelation plot is cleaner-looking (fewer, more isolated spikes) than the residual partial autocorrelation plot. Otherwise, if there is no non-seasonal difference in the model and/or the lag 1 autocorrelation is positive and/or the residual partial autocorrelation plot looks cleaner, then try AR=1. (Sometimes these rules conflict with each other, in which case it probably doesn't make much difference which parameter you use. Try them both and compare.) If there is autocorrelation at lag 2 that is not removed by setting MA=1 or AR=1, you can then try MA=2 or AR=2, or occasionally AR=1 and MA=1. (Note: an ARIMA(0,1,1) model without constant is identical to a simple exponential smoothing model. An ARIMA(0,1,1) model with constant is a simple exponential smoothing model with a constant linear trend term included. An ARIMA(0,2,1) or (0,2,2) model without constant is a linear exponential smoothing model.)
The same kind of rules apply to the SAR and SMA parameters with respect to autocorrelation at the seasonal period (e.g., lag 12 for monthly data). Try SMA=1 if there is already a seasonal difference in the model and/or the seasonal autocorrelation is negative and/or the residual autocorrelation plot looks cleaner in the vicinity of the seasonal lag; otherwise try SAR=1. However, the sum of the SMA and SAR parameters should never be greater than 1: do not ever try SMA=2, SAR=2, or both SMA=1 and SAR=1.
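The note above that an ARIMA(0,1,1) model without constant is identical to a simple exponential smoothing model can be verified numerically. The sketch below assumes the Box-Jenkins sign convention, in which the MA(1) coefficient theta is subtracted in the forecasting equation, so that theta = 1 - alpha (some software uses the opposite sign). Both recursions are started from the same initial forecast, which is an assumption of the example, and the data are made up.

```python
def ses_one_step(y, alpha):
    """One-step-ahead SES forecasts for periods 1..len(y)."""
    forecasts = [y[0]]                     # initial forecast (assumption)
    for obs in y:
        previous = forecasts[-1]
        forecasts.append(previous + alpha * (obs - previous))
    return forecasts[1:]


def arima011_one_step(y, theta):
    """One-step-ahead forecasts: next = last value - theta * last error."""
    forecasts = [y[0]]                     # same initial forecast
    for obs in y:
        error = obs - forecasts[-1]
        forecasts.append(obs - theta * error)
    return forecasts[1:]


y = [20.0, 23.0, 19.0, 25.0, 24.0, 28.0]   # hypothetical data
alpha = 0.3
ses = ses_one_step(y, alpha)
arima = arima011_one_step(y, theta=1.0 - alpha)
```

The two lists agree to within rounding error, so fitting MA=1 after one nonseasonal difference amounts to estimating the smoothing constant alpha.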
Winters Seasonal Exponential Smoothing? Winters Seasonal Smoothing is an extension of exponential smoothing that simultaneously estimates time-varying level, trend, and seasonal factors using recursive equations. (Thus, if you use this model, you would not first seasonally adjust the data.) The Winters seasonal factors can be either multiplicative or additive: normally you should choose the multiplicative option unless you have logged the data. Although the Winters model is clever and reasonably intuitive, it can be tricky to apply in practice: it has three smoothing parameters--alpha, beta, and gamma--for separately smoothing the level, trend, and seasonal factors, which must be estimated simultaneously. Determination of starting values for the seasonal indices can be done by applying the ratio-to-moving-average method of seasonal adjustment to part or all of the series and/or by backforecasting. The estimation algorithm that Statgraphics uses for these parameters sometimes fails to converge and/or yields values which give bizarre-looking forecasts and confidence intervals, so I would recommend caution when using this model.
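The recursive equations can be sketched as follows for the multiplicative case. This is an illustration under assumptions, not the Statgraphics algorithm: the smoothing parameters are supplied rather than estimated, and the starting values are taken crudely from the first two seasons instead of from the ratio-to-moving-average method or backforecasting. The series must contain at least two full seasons, and the data are made up.

```python
def winters_multiplicative(y, period, alpha, beta, gamma, horizon):
    """Winters smoothing with multiplicative seasonal factors."""
    first, second = y[:period], y[period:2 * period]
    # Crude starting values from the first two seasons (assumption).
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / period ** 2
    seasonal = [value / level for value in first]

    for t in range(period, len(y)):
        old_factor = seasonal[t % period]
        previous_level = level
        # alpha smooths the level, using the deseasonalized observation.
        level = alpha * y[t] / old_factor + (1.0 - alpha) * (previous_level + trend)
        # beta smooths the trend.
        trend = beta * (level - previous_level) + (1.0 - beta) * trend
        # gamma smooths the seasonal factor for this season.
        seasonal[t % period] = gamma * y[t] / level + (1.0 - gamma) * old_factor

    n = len(y)
    return [
        (level + h * trend) * seasonal[(n + h - 1) % period]
        for h in range(1, horizon + 1)
    ]


# Hypothetical quarterly data: a flat level of 100 with a fixed pattern.
pattern = [0.8, 1.0, 1.3, 0.9]
y = [100.0 * pattern[t % 4] for t in range(12)]
forecasts = winters_multiplicative(y, 4, alpha=0.3, beta=0.1, gamma=0.2, horizon=4)
```

On this perfectly regular series the forecasts reproduce the seasonal pattern. On real data the three parameters interact, which is one reason simultaneous estimation can be delicate.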