FORECAST PRO Example #1: Expert-aided analysis of data from assignment #3
The variables entered in the initial tableau were S (the dependent variable), ADV[-1], CADV[-1],
CC[-1], Y[-1], P[-1], and _CONST.
(Note: ADV[-1] is ADV lagged by one period, etc.) Eight values were
held out for validation.
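The bracket notation and "an ordinary multiple regression" on lagged variables can be made concrete with a short sketch. The Python below is a minimal sketch and not Forecast Pro's code. It lags each regressor, drops the leading periods that have no lagged value, and fits least squares with a constant by solving the normal equations. The names `lag`, `ols` and `fit_lagged` are hypothetical, and the data in the usage lines are made up purely to check that known coefficients are recovered.

```python
from typing import List, Optional, Sequence


def lag(series: Sequence[float], k: int) -> List[Optional[float]]:
    """Shift a series k periods: out[t] = series[t - k], or None where no earlier value exists."""
    return [series[t - k] if t - k >= 0 else None for t in range(len(series))]


def ols(X: List[List[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yt for row, yt in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def fit_lagged(y: Sequence[float], regressors: List[Sequence[float]], k: int = 1) -> List[float]:
    """Regress y[t] on a constant (_CONST) and every regressor lagged k periods."""
    lagged = [lag(x, k) for x in regressors]
    rows, target = [], []
    for t in range(len(y)):
        values = [col[t] for col in lagged]
        if any(v is None for v in values):
            continue  # the first k periods have no lagged values and are dropped
        rows.append([1.0] + values)
        target.append(y[t])
    return ols(rows, target)


# Made-up data generated exactly as y[t] = 1 + 2*x1[t-1] - 0.5*x2[t-1].
x1 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
x2 = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8]
y = [0.0] + [1 + 2 * x1[t - 1] - 0.5 * x2[t - 1] for t in range(1, 10)]
print([round(b, 6) for b in fit_lagged(y, [x1, x2])])  # → [1.0, 2.0, -0.5]
```

Solving the normal equations directly is the simplest approach to show. A QR- or SVD-based least-squares routine is numerically safer when regressors are nearly collinear.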
Forecast Pro for Windows Standard Edition Version 2.00
Sun Oct 06 10:54:27 1996
Expert data exploration of dependent variable S
---------------------------------------------------------------------
Length 29 Minimum 4.396 Maximum 6.676
Mean 5.481 Standard deviation 0.580
Classical decomposition (multiplicative)
Trend-cycle: 24.00% Seasonal: 5.14% Irregular: 70.86%
There are 5 strongly significant regressors.
_CONST
ADV[-1]
CADV[-1]
CC[-1]
P[-1]
Series is trended and seasonal.
Seasonal? (I don't know where this comment came from--the estimated
seasonal variance component is only 5%!)
Recommended model: Dynamic Regression
"Dynamic regression" simply means a time series regression model--i.e., a model that may end up including lagged variables and/or lagged error terms. Let's start by running an ordinary multiple regression with all the variables, obtaining the following standard output:
Forecast Model for S
Regression(6 regressors, 0 lagged errors)
Term Coefficient Std. Error t-Statistic Significance
---------------------------------------------------------------------
_CONST 3.450625 0.990532 3.483609 0.997993
ADV[-1] 0.009580 0.003431 2.791892 0.989639
CADV[-1] -0.000602 0.000755 -0.797894 0.566907 +++
CC[-1] 0.003256 0.001341 2.427545 0.976562
P[-1] 0.033864 0.015039 2.251762 0.965812
Y[-1] -0.002405 0.001313 -1.831939 0.920057 +++
Marked regressors are insignificant.
Note: the "significance" values reported here are one minus the usual P-values, so a value less than 0.95 means NOT significant at the 5% level. (Confusing!)
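This convention can be checked against the first regression. Assume the reported significance is the two-sided probability P(|T| < |t|) for a Student's t variable with n - k = 29 - 6 = 23 degrees of freedom. The sketch below integrates the t density numerically (Simpson's rule, standard library only) and gives values close to those in the table. The helper names are hypothetical, and the degrees-of-freedom assumption is inferred from the output rather than taken from Forecast Pro documentation.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def significance(t_stat: float, df: int, steps: int = 2000) -> float:
    """P(|T| < |t_stat|) = 1 - two-sided p-value, by Simpson's rule on [0, |t_stat|]."""
    b = abs(t_stat)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Simpson gives P(0 < T < |t|); double it because the density is symmetric about 0.
    return 2 * total * h / 3


# First regression: n = 29 observations, k = 6 parameters, so n - k = 23 degrees of freedom.
for name, t in [("_CONST", 3.483609), ("ADV[-1]", 2.791892), ("CADV[-1]", -0.797894)]:
    sig = significance(t, 29 - 6)
    print(f"{name:9s} significance {sig:.6f}   two-sided p-value {1 - sig:.6f}")
```

Subtracting the result from one recovers the conventional two-sided p-value, for example about 0.002 for _CONST.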
Standard Diagnostics
-------------------------------------------------------------
Sample size 29 Number of parameters 6
Mean 5.481 Standard deviation 0.5907
R-square 0.5208 Adjusted R-square 0.4166
Durbin-Watson 1.901 Ljung-Box(13)=16.37 P=0.7705
Forecast error 0.4512 BIC 0.5692 (Best so far)
MAPE 0.05933 RMSE 0.4018
MAD 0.3208
The sample "mean" and "standard deviation" are statistics of the dependent variable (here, S). The "forecast error" is the standard error of the estimate (RMSE in units fitted). The Bayesian Information Criterion (BIC) is the forecast error magnified by a penalty factor for the number of parameters estimated--theoretically it is the best bottom-line figure for comparisons between models of the same general type with different numbers of parameters. MAPE, MAD (=MAE), and RMSE are calculated in original units, if these differ from the units fitted. (Here they are the same units because no nonlinear transformation was used.) RMSE apparently does not include an adjustment for the number of parameters fitted, which would explain why it differs from "forecast error."
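The numbers in the Standard Diagnostics tables of this example are consistent with simple formulas linking these statistics. The sketch below assumes four relations, where n is the sample size, k the number of parameters and SD the reported standard deviation of S:

- forecast error = RMSE·sqrt(n/(n - k))
- BIC = RMSE·n^(k/(2n))
- R-square = 1 - n·RMSE²/((n - 1)·SD²)
- the usual adjusted R-square

These formulas are inferred from the reported figures, not quoted from Forecast Pro's documentation, and the function names are hypothetical. The last line also suggests that the 0.580 in the expert data exploration is 0.5907 rescaled to a divisor of n instead of n - 1.

```python
import math

N = 29          # sample size (same in all four models)
SD_S = 0.5907   # reported standard deviation of S (divisor n - 1)

# name: (RMSE, k, reported forecast error, reported BIC, reported R-square, reported adjusted R-square)
MODELS = {
    "initial, 6 parameters": (0.4018, 6, 0.4512, 0.5692, 0.5208, 0.4166),
    "+Y[-2], 7 parameters": (0.262, 7, 0.3008, 0.3934, 0.7962, 0.7407),
    "-P[-1], 6 parameters": (0.263, 6, 0.2953, 0.3726, 0.7947, 0.75),
    "-CADV[-1], 5 parameters": (0.27, 5, 0.2968, 0.3609, 0.7836, 0.7475),
}


def forecast_error(rmse: float, n: int, k: int) -> float:
    """Standard error of the estimate: RMSE rescaled from divisor n to divisor n - k."""
    return rmse * math.sqrt(n / (n - k))


def bic(rmse: float, n: int, k: int) -> float:
    """Inferred form: RMSE times the penalty factor n**(k / (2n))."""
    return rmse * n ** (k / (2 * n))


def r_square(rmse: float, n: int, sd_y: float) -> float:
    """1 - SSE/SST, with SSE = n * RMSE**2 and SST = (n - 1) * sd_y**2."""
    return 1 - n * rmse ** 2 / ((n - 1) * sd_y ** 2)


def adjusted_r_square(r2: float, n: int, k: int) -> float:
    return 1 - (1 - r2) * (n - 1) / (n - k)


for name, (rmse, k, fe, b, r2, adj) in MODELS.items():
    r = r_square(rmse, N, SD_S)
    print(f"{name}: forecast error {forecast_error(rmse, N, k):.4f} ({fe}), "
          f"BIC {bic(rmse, N, k):.4f} ({b}), R-square {r:.4f} ({r2}), "
          f"adjusted {adjusted_r_square(r, N, k):.4f} ({adj})")

print(f"SD of S with divisor n: {SD_S * math.sqrt((N - 1) / N):.3f}")
```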
Rolling simulation results
H N MAD Cumulative Average MAPE Cumulative Average
---------------------------------------------------------------------
1 8 0.446 0.446 0.080 0.080
2 7 0.364 0.408 0.064 0.073
3 6 0.301 0.377 0.052 0.067
4 5 0.334 0.369 0.058 0.065
5 4 0.322 0.363 0.053 0.063
6 3 0.411 0.367 0.068 0.064
7 2 0.398 0.369 0.069 0.064
8 1 0.536 0.374 0.089 0.065
Wait a minute: this regression model cannot really forecast more than one period ahead, because its regressors are all lagged by one period, so their values are not yet known beyond that horizon! "Rolling simulation is not meaningful for regression models," according to the software developer. I believe that in this case the first row of statistics represents the validation period for the model, and the other rows just double-count some of the same errors.
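The table itself is consistent with this reading. The sketch below uses hypothetical helper names and MAD values copied from the table above. It reproduces each Cumulative Average, to within rounding, as the N-weighted mean of the MAD column. Under the double-counting hypothesis, the H-step row averages the absolute errors at hold-out periods H through 8. Differencing the row totals then recovers one implied absolute error per hold-out period. All of them come out non-negative, which is consistent with the hypothesis, though it does not prove it.

```python
# (H, N, MAD) rows copied from the first model's rolling simulation table.
ROWS = [(1, 8, 0.446), (2, 7, 0.364), (3, 6, 0.301), (4, 5, 0.334),
        (5, 4, 0.322), (6, 3, 0.411), (7, 2, 0.398), (8, 1, 0.536)]


def cumulative_averages(rows):
    """Running N-weighted mean of the per-horizon MAD values (the Cumulative Average column)."""
    total, count, out = 0.0, 0, []
    for _, n, mad in rows:
        total += n * mad
        count += n
        out.append(total / count)
    return out


def implied_point_errors(rows):
    """Hypothesis: row H averages the absolute errors at hold-out periods H..8, so
    (row-H total) - (row-(H+1) total) is the absolute error at hold-out period H."""
    totals = [n * mad for _, n, mad in rows] + [0.0]
    return [totals[i] - totals[i + 1] for i in range(len(rows))]


print("cumulative:", [round(a, 3) for a in cumulative_averages(ROWS)])
print("implied |errors|:", [round(e, 3) for e in implied_point_errors(ROWS)])
```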
The following diagnostic test for lagged variables not currently in the model is really useful, however:
Variable specification test battery
------------------------------------------------------------------------
_CONST[-1] ChiSq( 1)=0.53 Percentile=0.5334
ADV[-2] 1.69 0.8062
CADV[-2] 0.53 0.5342
CC[-2] 0.53 0.5334
P[-2] 0.56 0.5439
Y[-2] 12.65 0.9996 **
_TREND 12.15 0.9995 **
Try adding Y[-2] to model.
Good suggestion--will do! Now here are some additional useful
tests to see if ARIMA corrections would be helpful in the model--i.e.,
lags of the dependent variable and/or lags of the errors. Here,
S[-1] is the dependent variable lagged by one period, and _AUTO[-1]
refers to the errors lagged by one period, etc.
Dynamics test battery
------------------------------------------------------------------------
S[- 1] ChiSq( 1)=0.83 Percentile=0.6373
S[- 2] 0.98 0.6777
S[- 3] 0.83 0.6382
S[- 4] 2.01 0.8433
S[- 8] 2.85 0.9084
_AUTO[- 1] ChiSq( 1)=0.59 Percentile=0.5573
_AUTO[- 2] 1.03 0.6904
_AUTO[- 3] 2.84 0.9082
_AUTO[- 4] 1.96 0.8382
_AUTO[- 8] 4.21 0.9599 *
Dynamics tests successful.
OK, nothing else surprising turned up. Note, however, that no tests were performed to determine the relative stationarity of the variables--for example, the fact that P ought to be differenced to be comparable to the other variables does not register.
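The Percentile column in both batteries appears to be the cumulative chi-square probability of the statistic with the stated degrees of freedom, i.e. one minus the usual p-value. The * and ** marks apparently flag values above 0.95 and 0.99. The sketch below computes the chi-square CDF from the power series for the regularized lower incomplete gamma function (standard library only; function names hypothetical). It reproduces several rows to within the rounding of the printed statistics. The same function also reproduces the P shown next to the first model's Ljung-Box statistic. That suggests P there is also one minus the usual p-value, so a P above 0.95 would point to residual autocorrelation.

```python
import math


def chi2_percentile(stat: float, df: int) -> float:
    """Chi-square CDF, computed as the regularized lower incomplete gamma P(df/2, stat/2)."""
    a, x = df / 2, stat / 2
    if x <= 0:
        return 0.0
    # Power series: P(a, x) = x**a * exp(-x) / Gamma(a) * sum_n x**n / (a (a+1) ... (a+n))
    term = total = 1.0 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def flag(percentile: float) -> str:
    """Apparent marking convention in this output (inferred): * above 0.95, ** above 0.99."""
    if percentile > 0.99:
        return "**"
    if percentile > 0.95:
        return "*"
    return ""


# (label, ChiSq statistic, degrees of freedom, reported Percentile or P)
ROWS = [
    ("_CONST[-1]", 0.53, 1, 0.5334),
    ("ADV[-2]", 1.69, 1, 0.8062),
    ("Y[-2]", 12.65, 1, 0.9996),
    ("_AUTO[- 8]", 4.21, 1, 0.9599),
    ("Ljung-Box(13)", 16.37, 13, 0.7705),
]
for label, stat, df, reported in ROWS:
    pct = chi2_percentile(stat, df)
    print(f"{label:14s} {pct:.4f} (reported {reported}) {flag(pct)}")
```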
Let's now try adding Y[-2], as recommended above, before removing any insignificant variables:
Forecast Model for S
Regression(7 regressors, 0 lagged errors)
Term Coefficient Std. Error t-Statistic Significance
---------------------------------------------------------------------
_CONST 3.763999 0.662905 5.678036 0.999990
ADV[-1] 0.008084 0.002304 3.508493 0.998017
CADV[-1] -0.000561 0.000503 -1.114696 0.722990 +++
CC[-1] 0.001823 0.000932 1.956479 0.936779 +++
P[-1] -0.005059 0.012307 -0.411072 0.315003 +++
Y[-1] -0.001514 0.000891 -1.700453 0.896859 +++
Y[-2] 0.005299 0.000972 5.453561 0.999982
Marked regressors are insignificant.
Standard Diagnostics
----------------------------------------------------------------
Sample size 29 Number of parameters 7
Mean 5.481 Standard deviation 0.5907
R-square 0.7962 Adjusted R-square 0.7407
Durbin-Watson 2.306 ** Ljung-Box(12)=32.29 P=0.9988
Forecast error 0.3008 BIC 0.3934 (Best so far)
MAPE 0.03753 RMSE 0.262
MAD 0.198
Rolling simulation results
H N MAD Cumulative Average MAPE Cumulative Average
---------------------------------------------------------------------
1 8 0.124 0.124 0.022 0.022
2 7 0.122 0.123 0.022 0.022
3 6 0.094 0.115 0.016 0.020
4 5 0.100 0.112 0.017 0.020
5 4 0.096 0.110 0.016 0.019
6 3 0.111 0.110 0.019 0.019
7 2 0.095 0.109 0.017 0.019
8 1 0.081 0.108 0.014 0.019
Now we'll try removing the insignificant variables one at a time--i.e., manually perform backward stepwise regression from this point:
Forecast Model for S
Regression(6 regressors, 0 lagged errors)
Term Coefficient Std. Error t-Statistic Significance
---------------------------------------------------------------------
_CONST 3.560179 0.431978 8.241575 1.000000
ADV[-1] 0.008105 0.002262 3.583611 0.998428
CADV[-1] -0.000550 0.000493 -1.113756 0.723112 +++
CC[-1] 0.001932 0.000878 2.200628 0.961929
Y[-1] -0.001683 0.000776 -2.169706 0.959392
Y[-2] 0.005068 0.000777 6.520354 0.999999
Marked regressors are insignificant.
Standard Diagnostics
----------------------------------------------------------------
Sample size 29 Number of parameters 6
Mean 5.481 Standard deviation 0.5907
R-square 0.7947 Adjusted R-square 0.75
Durbin-Watson 2.281 ** Ljung-Box(13)=29.76 P=0.9949
Forecast error 0.2953 BIC 0.3726 (Best so far)
MAPE 0.03892 RMSE 0.263
MAD 0.2052
Rolling simulation results
H N MAD Cumulative Average MAPE Cumulative Average
---------------------------------------------------------------------
1 8 0.145 0.145 0.026 0.026
2 7 0.136 0.141 0.025 0.026
3 6 0.101 0.129 0.018 0.023
4 5 0.102 0.124 0.018 0.022
5 4 0.087 0.119 0.015 0.021
6 3 0.094 0.117 0.016 0.021
7 2 0.085 0.115 0.016 0.021
8 1 0.010 0.112 0.002 0.020
Forecast Model for S
Regression(5 regressors, 0 lagged errors)
Term Coefficient Std. Error t-Statistic Significance
---------------------------------------------------------------------
_CONST 3.449871 0.422572 8.163984 1.000000
ADV[-1] 0.008161 0.002272 3.591697 0.998532
CC[-1] 0.001916 0.000882 2.171779 0.960020
Y[-1] -0.001833 0.000768 -2.387589 0.974816
Y[-2] 0.005118 0.000780 6.563664 0.999999
Standard Diagnostics
---------------------------------------------------------------
Sample size 29 Number of parameters 5
Mean 5.481 Standard deviation 0.5907
R-square 0.7836 Adjusted R-square 0.7475
Durbin-Watson 2.196 * Ljung-Box(14)=28.72 P=0.9886
Forecast error 0.2968 BIC 0.3609 (Best so far)
MAPE 0.04002 RMSE 0.27
MAD 0.2117
Rolling simulation results
H N MAD Cumulative Average MAPE Cumulative Average
---------------------------------------------------------------------
1 8 0.146 0.146 0.026 0.026
2 7 0.148 0.147 0.026 0.026
3 6 0.135 0.143 0.024 0.025
4 5 0.125 0.140 0.022 0.025
5 4 0.138 0.140 0.024 0.025
6 3 0.173 0.143 0.030 0.025
7 2 0.177 0.145 0.033 0.025
8 1 0.104 0.143 0.017 0.025
Variable specification test battery
------------------------------------------------------------------------
CADV[-1] ChiSq( 1)=1.18 Percentile=0.7220
P[-1] 0.12 0.2711
_CONST[-1] 0.05 0.1787
ADV[-2] 0.58 0.5527
CC[-2] 0.06 0.1856
Y[-3] 0.38 0.4650
_TREND 1.62 0.7968
Variable specification tests successful.
Dynamics test battery
------------------------------------------------------------------------
S[- 1] ChiSq( 1)=0.24 Percentile=0.3754
S[- 2] 0.34 0.4379
S[- 3] 0.67 0.5855
S[- 4] 0.20 0.3490
S[- 8] 1.09 0.7036
_AUTO[- 1] ChiSq( 1)=0.52 Percentile=0.5293
_AUTO[- 2] 5.45 0.9805 *
_AUTO[- 3] 0.61 0.5645
_AUTO[- 4] 1.87 0.8283
_AUTO[- 8] 1.14 0.7146
Dynamics tests successful.
Done! Notice that this is the model that we obtained by automatic backward stepwise regression in Statgraphics, starting with 2 lags of all variables. The nice thing about Forecast Pro's analysis is that it tested for lags of all variables and lags of the errors which we didn't use in the original model (although it didn't test for some other things, like the usefulness of differencing any of the variables).
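The manual procedure used above can be written as a small loop: refit, drop the regressor with the lowest significance, and stop once every remaining term reaches 0.95. In the sketch below, a lookup of the significance values recorded in the three regressions above stands in for actually re-estimating the model. `backward_eliminate`, `RECORDED` and `START` are hypothetical names.

```python
# Significance values recorded in the three regressions above, keyed by regressor set.
RECORDED = {
    frozenset({"_CONST", "ADV[-1]", "CADV[-1]", "CC[-1]", "P[-1]", "Y[-1]", "Y[-2]"}): {
        "_CONST": 0.999990, "ADV[-1]": 0.998017, "CADV[-1]": 0.722990, "CC[-1]": 0.936779,
        "P[-1]": 0.315003, "Y[-1]": 0.896859, "Y[-2]": 0.999982},
    frozenset({"_CONST", "ADV[-1]", "CADV[-1]", "CC[-1]", "Y[-1]", "Y[-2]"}): {
        "_CONST": 1.000000, "ADV[-1]": 0.998428, "CADV[-1]": 0.723112, "CC[-1]": 0.961929,
        "Y[-1]": 0.959392, "Y[-2]": 0.999999},
    frozenset({"_CONST", "ADV[-1]", "CC[-1]", "Y[-1]", "Y[-2]"}): {
        "_CONST": 1.000000, "ADV[-1]": 0.998532, "CC[-1]": 0.960020,
        "Y[-1]": 0.974816, "Y[-2]": 0.999999},
}


def backward_eliminate(regressors, fit, threshold=0.95):
    """Drop the least significant regressor, refit, and repeat until all reach threshold.

    `fit` maps a set of regressors to {name: significance}; returns (final set, dropped names).
    """
    current, dropped = frozenset(regressors), []
    while True:
        sig = fit(current)
        worst = min(current, key=lambda name: sig[name])
        if sig[worst] >= threshold:
            return current, dropped
        dropped.append(worst)
        current = current - {worst}


START = {"_CONST", "ADV[-1]", "CADV[-1]", "CC[-1]", "P[-1]", "Y[-1]", "Y[-2]"}
final, dropped = backward_eliminate(START, RECORDED.__getitem__)
print("dropped:", dropped)  # → dropped: ['P[-1]', 'CADV[-1]']
print("kept:", sorted(final))
```

Dropping one regressor at a time matters here. CC[-1] and Y[-1] were both below 0.95 in the seven-regressor model but became significant once P[-1] was removed.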
For comparison, here are the results of fitting and validating the same model in the GLM procedure in Statgraphics. The estimated coefficients are of course the same. Also, the validation period statistics seem to agree with those in the first row of the "rolling simulation results" table, as we had guessed earlier. (MAE=MAD=0.146, MAPE=2.6%)
--------------------------------------------------------------------------------------------
Parameter Estimate Standard Error Lower Limit Upper Limit V.I.F.
--------------------------------------------------------------------------------------------
CONSTANT 3.44987 0.422572 2.57772 4.32202
LAG(Y,1) -0.0018332 0.000767803 -0.00341787 -0.000248527 1.08898
LAG(Y,2) 0.00511806 0.000779756 0.00350872 0.0067274 1.08911
LAG(CC,1) 0.00191563 0.000882057 0.0000951528 0.00373611 1.15488
LAG(ADV,1) 0.00816141 0.0022723 0.0034716 0.0128512 1.21309
--------------------------------------------------------------------------------------------
R-Squared = 78.3605 percent
R-Squared (adjusted for d.f.) = 74.7539 percent
Standard Error of Est. = 0.29679
Mean absolute error = 0.211697
Durbin-Watson statistic = 2.19583
Residual Analysis
---------------------------------
Estimation Validation
n 29 8
MSE 0.088084 0.0261489
MAE 0.211697 0.146084
MAPE 4.00221 2.60524
ME 3.98149E-16 -0.0704765
MPE -0.261927 -1.41513
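The two programs' summary figures can be reconciled with a few lines of arithmetic. Assume Statgraphics' estimation-period MSE divides the sum of squared errors by n - k (here 29 - 5 = 24). Its square root is then the Standard Error of Est. (Forecast Pro's "forecast error"), and rescaling to a divisor of n gives Forecast Pro's RMSE. The limits in the coefficient table are consistent with 95% confidence limits, estimate ± t·SE, using the 97.5th percentile of Student's t with 24 degrees of freedom. That value, about 2.0639, is a standard table value hard-coded here. The variable and function names are hypothetical.

```python
import math

N, K = 29, 5            # estimation sample size and number of estimated parameters
MSE_EST = 0.088084      # Statgraphics "MSE", estimation column
T_975_24 = 2.0639       # 97.5th percentile of Student's t with N - K = 24 d.f. (table value)

# Assumption: MSE_EST = SSE / (N - K), so its square root is the standard error of the estimate.
se_est = math.sqrt(MSE_EST)
# Rescaling SSE to a divisor of N gives an RMSE without the degrees-of-freedom adjustment.
rmse = math.sqrt(MSE_EST * (N - K) / N)
print(f"Standard Error of Est. {se_est:.5f}   RMSE {rmse:.4f}")

# (estimate, standard error) copied from the Statgraphics coefficient table
COEFFICIENTS = {
    "CONSTANT": (3.44987, 0.422572),
    "LAG(Y,1)": (-0.0018332, 0.000767803),
    "LAG(Y,2)": (0.00511806, 0.000779756),
    "LAG(CC,1)": (0.00191563, 0.000882057),
    "LAG(ADV,1)": (0.00816141, 0.0022723),
}


def confidence_limits(estimate, std_error, t_crit=T_975_24):
    """Two-sided limits: estimate +/- t_crit * std_error."""
    return estimate - t_crit * std_error, estimate + t_crit * std_error


for name, (est, se) in COEFFICIENTS.items():
    lower, upper = confidence_limits(est, se)
    print(f"{name:11s} {lower:.6g} {upper:.6g}")
```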