
Revised Chapter 2 in Specifying and Diagnostically Testing Econometric
Models (Edition 3)
© by Houston H. Stokes revised 16 December 2012. All rights reserved.
Preliminary Draft Chapter 2: Regression Analysis With Appropriate Specification Tests

2.0 Introduction
2.1 Standard Regression Model With a Constant
Table 2.1 Model Specification Tests
2.2 Other Model Specification Tests
2.3 Generalized Least Squares
2.4 Weighted Regression Model
2.5 Autoregressive Disturbances Model
2.6 Estimation Control Options
2.7 Overview of Options Involving Computed Residuals
2.8 Residual Summary Statistics
2.9 BLUS Specification Test Options
2.10 BLUS In-Core and Out-of-Core Options
2.11 The Residual Analysis (RA) Option
2.12 BAYES Option
2.13 Constrained Least Squares and Mixed Estimation
2.14 Errors in Variables
2.15 Best Regression and Stepwise Estimation
2.16 Bagging
2.17 Examples
Table 2.2 BLUS Specification Tests on Data From Theil (1971)
Table 2.3 Real Money Balances in the Production Function
Fig. 2.1 C-D Production Function Y, E(Y), and Residuals
Fig. 2.2 Two- and Three-Dimensional Plots of Data
Table 2.4 Simulated Test Data for BLUS Analysis
Table 2.5 Generating BLUS Test Data
Table 2.6 B34S Matrix Program to Perform BLUS Residual Analysis
Table 2.7 Dynamic Model Generation and Tests
Table 2.8 Code to Perform a Granger Causality Test
Table 2.9 Code to Illustrate a Partially Linear Regression Model
Table 2.10 Implementation of an Influence Diagnostic Test for the Production Function Data
Table 2.11 Davidson-MacKinnon Leverage Analysis
Table 2.12 Heteroskedasticity Tests
Table 2.13 LTS Analysis of the Sinai-Stokes (1972) Production Function Data
Table 2.14 Recursive Trimmed Squares
Table 2.15 Least Trimmed Squares Code
Table 2.16 Program LTS_REC for Recursive Least Trimmed Squares
Table 2.17 Subroutine LTS_REC2 for Custom Calls
Table 2.18 OLS-NW Test-GLS-MARSPLINE-NW Test
2.18 Conclusion

Regression Analysis With Appropriate Specification Tests

2.0 Introduction

The options discussed in this chapter concern testing for appropriate
functional specification in ordinary least squares (OLS), generalized least squares (GLS), minimum absolute deviation (L1), MINIMAX, and weighted least squares (WLS) models. The appropriate syntax for running these options is contained in the regression, reg, and robust commands of Stokes (2000b). It is assumed that users have entered their data in B34S and are ready to estimate their models.[1] After a detailed discussion of the statistics available, a number of examples are shown.[2]

2.1 Standard Regression Model With a Constant
Assume a set of T observations on K variables denoted

$(y_t, x_{t1}, \ldots, x_{t,K-1}), \quad t = 1, \ldots, T.$   (2.1-1)

It is supposed that these variables are related by a set of stochastic equations that can be written in matrix notation as

$Y = (X, i)\beta + \sigma\varepsilon,$   (2.1-2)

where Y is the T x 1 vector of observations on the dependent variable; X is the T x (K-1) matrix of independent variables, corresponding to the slope coefficients; i is a T x 1 vector of 1's, corresponding to the constant $\beta_K$; $\beta$ is the K x 1 vector of fixed, but unknown, regression coefficients; $\sigma$ is the standard deviation of the disturbance term; and $\varepsilon$ is a T x 1 vector of independent, identically distributed random variables with mean $E(\varepsilon_t) = 0$ and variance $E(\varepsilon_t^2) = 1$. The population residual vector is the T x 1 vector $e = \sigma\varepsilon$.

The least-squares estimate of $\beta$ may be computed in either of two equivalent ways, the difference being in the handling of the constant term. First, we might actually consider the matrix (X, i) and compute the regression coefficients

$\hat\beta = [(X, i)'(X, i)]^{-1}(X, i)'Y,$   (2.1-3)

where (') denotes the operation of matrix transposition. This approach treats the constant like all other coefficients. Second, we could use the deviation of the data from their means, get the slope coefficients, and then use the equation relating the means to compute the constant. To do this, let

$X^{*} = X - i\bar{x}'$ and $y^{*} = Y - i\bar{y},$   (2.1-4)

where $\bar{x}$ is the (K-1) x 1 vector of means of the independent variables and $\bar{y}$ is the mean of the dependent variable; then

$\hat\beta^{*} = (X^{*\prime}X^{*})^{-1}X^{*\prime}y^{*}$   (2.1-5)

and

$\hat\beta_K = \bar{y} - \bar{x}'\hat\beta^{*},$   (2.1-6)

where $\hat\beta^{*}$ holds the K-1 slope coefficients. This approach takes advantage of the special structure of the last regressor, i, and requires the inversion of a (K-1) x (K-1) instead of a K x K matrix. Each approach has its advantages, but rather than working both out completely (Johnston 1963, 134-135), we will use here only the first one. For convenience we will now redefine X to include i, so we can rewrite the basic model as

$Y = X\beta + e,$   (2.1-7)

where $E(e) = 0$ and $E(ee') = \sigma^2 I$, and write the least squares estimator of $\beta$ as

$\hat\beta = (X'X)^{-1}X'Y.$   (2.1-8)
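As a numerical check on the equivalence of (2.1-3) and (2.1-5)-(2.1-6), the following Python sketch may be helpful. It is illustrative only (the examples in this book use B34S), and the sample size, coefficient values, and seed are assumptions made here.

import numpy as np

rng = np.random.default_rng(7)
T, K = 100, 3                        # T observations, K coefficients (incl. constant)
X = rng.normal(size=(T, K - 1))      # T x (K-1) matrix of independent variables
i = np.ones((T, 1))                  # T x 1 vector of 1's for the constant
beta = np.array([1.5, -0.8, 4.0])    # assumed true values; constant is last
Y = np.hstack([X, i]) @ beta + 2.0 * rng.normal(size=T)   # (2.1-2) with sigma = 2

# First approach, (2.1-3): the constant is treated like any other coefficient.
Xi = np.hstack([X, i])
bhat = np.linalg.solve(Xi.T @ Xi, Xi.T @ Y)

# Second approach, (2.1-4)-(2.1-6): demean and invert only a (K-1) x (K-1) matrix.
Xs = X - X.mean(axis=0)              # X* = X - i xbar'
ys = Y - Y.mean()                    # y* = Y - i ybar
slopes = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)       # (2.1-5)
const = Y.mean() - X.mean(axis=0) @ slopes           # (2.1-6)

assert np.allclose(bhat, np.append(slopes, const))   # the two routes agree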
Given $\hat\beta$, we can compute the estimates of the sample disturbances, or the residuals,

$\hat{e} = Y - X\hat\beta,$   (2.1-9)

and from them an unbiased estimate of $\sigma^2$, the residual variance,

$s^2 = \hat{e}'\hat{e}/(T - K).$   (2.1-10)

The square root of $s^2$ is the standard error of estimate, $S_{y \cdot x}$, or the root mean square error. If X is independent of e (Goldberger 1964, 164, 167, 267-268), the covariance matrix of the regression coefficients is estimated by

$\widehat{\mathrm{var}}(\hat\beta) = s^2 (X'X)^{-1}.$   (2.1-11)

The square roots of the diagonal elements of the covariance matrix are the standard errors of the coefficients, $SE(\hat\beta_i)$. The t-ratios of the coefficients, $t_i$, are defined as

$t_i = \hat\beta_i / SE(\hat\beta_i).$   (2.1-12)

The partial correlation between y and $x_i$, given all the other variables, is given the same algebraic sign as $\hat\beta_i$ and is defined as

$r_i^2 = t_i^2 / (t_i^2 + T - K),$   (2.1-13)

where we assume there are K right-hand-side variables. The equivalence of this formula to the standard definition of partial correlation is given in Gustafson (1961, appendix 1, 365). The elasticities estimated at the mean are

$\eta_i = \hat\beta_i \bar{x}_i / \bar{y}.$   (2.1-14)

The squared coefficient of multiple correlation, $R^2$, is

$R^2 = 1 - \hat{e}'\hat{e}/[(T-1)s_y^2],$   (2.1-15)
where $s_y^2 = \sum_{t=1}^{T}(y_t - \bar{y})^2/(T-1)$ is the variance of the dependent variable. In a model with only one right-hand-side variable x and a constant, $R^2$ is the squared correlation coefficient between y and x. Theil (1971, 179) recommends the adjusted multiple correlation coefficient

$\bar{R}^2 = 1 - \frac{T-1}{T-K}(1 - R^2),$   (2.1-16)

which is preferred to equation (2.1-15) since it is adjusted for the number of independent variables in the regression. Equation (2.1-16) can be written as $\bar{R}^2 = 1 - s^2/s_y^2$. The F-statistic to test the hypothesis that $\beta_1 = \cdots = \beta_{K-1} = 0$ is

$F = \frac{R^2/(K-1)}{(1 - R^2)/(T-K)}$   (2.1-17)

and has (K-1, T-K) degrees of freedom.
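The statistics in (2.1-9) through (2.1-17) are straightforward to reproduce from the matrix formulas. The sketch below is a minimal Python illustration (not B34S output); the simulated data and all names are assumptions made here, with the constant in the last column of Xi.

import numpy as np

rng = np.random.default_rng(7)
T, K = 100, 3
Xi = np.column_stack([rng.normal(size=(T, K - 1)), np.ones(T)])  # constant last
Y = Xi @ np.array([1.5, -0.8, 4.0]) + 2.0 * rng.normal(size=T)   # assumed DGP

bhat = np.linalg.solve(Xi.T @ Xi, Xi.T @ Y)
ehat = Y - Xi @ bhat                                   # residuals, (2.1-9)
s2 = ehat @ ehat / (T - K)                             # residual variance, (2.1-10)
cov = s2 * np.linalg.inv(Xi.T @ Xi)                    # covariance matrix, (2.1-11)
se = np.sqrt(np.diag(cov))                             # coefficient standard errors
t = bhat / se                                          # t-ratios, (2.1-12)
partial_r = np.sign(bhat) * np.sqrt(t**2 / (t**2 + T - K))   # (2.1-13)
elast = bhat * Xi.mean(axis=0) / Y.mean()              # elasticities at means, (2.1-14)
R2 = 1.0 - (ehat @ ehat) / np.sum((Y - Y.mean())**2)   # (2.1-15)
R2bar = 1.0 - (T - 1) / (T - K) * (1.0 - R2)           # Theil's adjustment, (2.1-16)
F = (R2 / (K - 1)) / ((1.0 - R2) / (T - K))            # (2.1-17), df (K-1, T-K)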
Additional diagnostic statistics provided include -2 ln(maximum of the likelihood function), which is defined as

$-2\ln(L) = T\ln(2\pi) + T\ln(s^2) + \hat{e}'\hat{e}/s^2,$   (2.1-18)

Akaike's (1973) Information Criterion (AIC), defined as

$\mathrm{AIC} = -2\ln(L) + 2K,$   (2.1-19)

and the Schwarz (1978) Information Criterion (SIC), which is defined as

$\mathrm{SIC} = -2\ln(L) + K\ln(T).$   (2.1-20)

Note that these "classic" formulas include a degree-of-freedom correction for the estimation of $\sigma^2$. The SIC is often called the Bayesian Information Criterion. Using these formulas, the smaller the AIC and SIC, the better the model, since this means that notwithstanding the added penalty for degrees of freedom, $\hat{e}'\hat{e}$ declined sufficiently. Note that the AIC and SIC add a penalty to (2.1-18) as more variables are estimated. Many software systems, such as MODLER, use different definitions and report the log likelihood function (LLF)

$\mathrm{LLF} = -\frac{T}{2}\left[1 + \ln(2\pi) + \ln(\hat{e}'\hat{e}/T)\right]$   (2.1-21)

and the log forms of the AIC and SIC, defined as

$\ln(\mathrm{AIC}) = \ln(\hat{e}'\hat{e}/T) + 2K/T$   (2.1-22)

and

$\ln(\mathrm{SIC}) = \ln(\hat{e}'\hat{e}/T) + K\ln(T)/T.$   (2.1-23)

Other software systems, such as RATS and SAS, use further variants and scalings of these formulas. In view of these differences in approach, it is suggested that all reports listing a SIC or AIC carefully indicate the formula used.
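Because these definitions vary across packages, it helps to compute the classic and log forms side by side. The minimal Python sketch below is a convenience written for this discussion: the function name is an assumption introduced here, and the formulas are exactly (2.1-18) through (2.1-23) as stated above, not any particular package's definitions.

import numpy as np

def info_criteria(sse, T, K):
    """Classic and log-form criteria; sse = ehat'ehat, K includes the constant."""
    s2 = sse / (T - K)                                          # (2.1-10)
    m2lnL = T * np.log(2 * np.pi) + T * np.log(s2) + sse / s2   # (2.1-18)
    return {
        "AIC":   m2lnL + 2 * K,                                 # (2.1-19)
        "SIC":   m2lnL + K * np.log(T),                         # (2.1-20)
        "LLF":   -(T / 2) * (1 + np.log(2 * np.pi) + np.log(sse / T)),  # (2.1-21)
        "lnAIC": np.log(sse / T) + 2 * K / T,                   # (2.1-22)
        "lnSIC": np.log(sse / T) + K * np.log(T) / T,           # (2.1-23)
    }

# The classic and log forms are not monotone transforms of one another,
# so model rankings can differ; reports should state which formula was used.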
When estimating a finite distributed lag model $y_t = \alpha + \sum_{j=0}^{q}\beta_j x_{t-j} + e_t$ with lag q, Greene (2000, 717) suggests a slight change in the AIC formula to $\mathrm{AIC}(q) = \ln(\hat{e}'\hat{e}/T) + 2q/T$ and a Schwarz criterion defined as $\mathrm{SC}(q) = \ln(\hat{e}'\hat{e}/T) + q\ln(T)/T$, although there are other possible variants to be considered. Note that for the above example $K = q + 2$.
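A common use of these criteria is choosing the lag length q: estimate the model for each candidate q over a common sample and pick the minimizer. The Python sketch below is illustrative only; the data-generating process, seed, and maximum lag are assumptions made here, and lnSIC from (2.1-23) is used as the selection rule.

import numpy as np

rng = np.random.default_rng(11)
n, qmax = 200, 6
x = rng.normal(size=n)
y = 1.0 + 0.9 * x + 0.3 * rng.normal(size=n)    # assumed DGP with lags 0..2 of x
y[1:] += 0.5 * x[:-1]
y[2:] += 0.2 * x[:-2]

crit = {}
for q in range(qmax + 1):
    # Regressors: constant plus x_t, x_{t-1}, ..., x_{t-q}; the first qmax
    # observations are dropped so every candidate model uses the same sample.
    cols = [x[qmax - j : n - j] for j in range(q + 1)]
    Z = np.column_stack([np.ones(n - qmax)] + cols)
    yy = y[qmax:]
    b = np.linalg.lstsq(Z, yy, rcond=None)[0]
    sse = np.sum((yy - Z @ b) ** 2)
    T, K = len(yy), Z.shape[1]                  # K = q + 2: constant + q + 1 lags
    crit[q] = np.log(sse / T) + K * np.log(T) / T   # lnSIC, (2.1-23)

best_q = min(crit, key=crit.get)                # typically recovers q = 2 here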
It can be shown that the OLS estimators $\hat\beta$ are in fact maximum likelihood estimators. Let $f(y_t)$ be the probability density of the left-hand variable. The maximum likelihood method of estimation attempts to select values for the estimated coefficients and $\sigma^2$ such that the likelihood L,

$L = \prod_{t=1}^{T} f(y_t),$   (2.1-24)

is maximized. Kmenta (1971, 213) cites the change of variable theorem: "If a random variable X has probability density f(x), and if a variable Z is a function of X such that there is a one-to-one correspondence between X and Z, then the probability density of Z is $g(z) = f(x)\,|dx/dz|$." Using this theorem and the
model specified in equation (2