9

The Generalized Regression Model and Heteroscedasticity

9.1 INTRODUCTION

In this and the next several chapters, we will extend the multiple regression model to disturbances that violate Assumption A.4 of the classical regression model. The generalized linear regression model is
$$
\begin{aligned}
y &= X\beta + \varepsilon, \\
E[\varepsilon \mid X] &= 0, \\
E[\varepsilon\varepsilon' \mid X] &= \sigma^2\Omega = \Sigma,
\end{aligned}
\qquad (9\text{-}1)
$$

where $\Omega$ is a positive definite matrix. (The covariance matrix is written in the form $\sigma^2\Omega$ at several points so that we can obtain the classical model, $\sigma^2 I$, as a convenient special case.)
The two leading cases we will consider in detail are heteroscedasticity and autocorrelation. Disturbances are heteroscedastic when they have different variances. Heteroscedasticity arises in numerous applications, in both cross-section and time-series data. Volatile high-frequency time-series data, such as daily observations in financial markets, are heteroscedastic. Heteroscedasticity also appears in cross-section data where the scale of the dependent variable and the explanatory power of the model tend to vary across observations.
Microeconomic data such as expenditure surveys are typical. Even after
accounting for firm sizes, we expect to observe greater variation in the
profits of large firms than in those of small ones. The variance of profits
might also depend on product diversification, research and development
expenditure, and industry characteristics and therefore might also vary
across firms of similar sizes. When analyzing family spending patterns, we find greater variation in expenditure on certain commodity groups among high-income families than among low-income families, owing to the greater discretion that higher incomes allow.
The disturbances are still assumed to be uncorrelated across observations, so $\sigma^2\Omega$ would be

$$
\sigma^2\Omega = \sigma^2
\begin{bmatrix}
\omega_1 & 0 & \cdots & 0 \\
0 & \omega_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \omega_n
\end{bmatrix}
=
\begin{bmatrix}
\sigma_1^2 & 0 & \cdots & 0 \\
0 & \sigma_2^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_n^2
\end{bmatrix}.
$$
(The first situation mentioned, involving financial data, is more complex than this and is examined in detail in Chapter 20.)
Autocorrelation is usually found in time-series data. Economic time
series often display a "memory" in that variation around the regression
function is not independent from one period to the next. The seasonally
adjusted price and quantity series published by government agencies are
examples. Time-series data are usually homoscedastic, so $\sigma^2\Omega$ might be

$$
\sigma^2\Omega = \sigma^2
\begin{bmatrix}
1 & \rho_1 & \cdots & \rho_{n-1} \\
\rho_1 & 1 & \cdots & \rho_{n-2} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{n-1} & \rho_{n-2} & \cdots & 1
\end{bmatrix}.
$$
The values that appear off the diagonal depend on the model used for the
disturbance. In most cases, consistent with the notion of a fading memory,
the values decline as we move away from the diagonal.
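A standard concrete case of this fading-memory pattern is the first-order autoregressive disturbance, $\varepsilon_t = \rho\varepsilon_{t-1} + u_t$ with $|\rho| < 1$, of the sort developed in Chapter 20. The implied correlations decline geometrically with the distance between observations:

$$
\sigma^2\Omega = \frac{\sigma_u^2}{1-\rho^2}
\begin{bmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\
\rho & 1 & \rho & \cdots & \rho^{n-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1
\end{bmatrix},
$$

so that the correlation at separation $s$ is $\rho^s$, which goes to zero as $s$ grows.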
A number of other cases considered later will fit in this framework. Panel data sets, consisting of cross sections observed at several points in time, may exhibit both heteroscedasticity and autocorrelation. In the random effects model, $y_{it} = x_{it}'\beta + u_i + \varepsilon_{it}$, with $E[\varepsilon_{it} \mid x_{it}] = E[u_i \mid x_{it}] = 0$, the implication is that, for the $T$ observations in group $i$,

$$
\sigma^2\Omega_i =
\begin{bmatrix}
\sigma_\varepsilon^2 + \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\
\sigma_u^2 & \sigma_\varepsilon^2 + \sigma_u^2 & \cdots & \sigma_u^2 \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_u^2 & \sigma_u^2 & \cdots & \sigma_\varepsilon^2 + \sigma_u^2
\end{bmatrix}
= \sigma_\varepsilon^2 I_T + \sigma_u^2\, ii'.
$$

The specification exhibits autocorrelation. We shall consider it in Chapter 11. Models of spatial autocorrelation, examined in Chapter 11, and
multiple equation regression models considered in Chapter 10 are also forms
of the generalized regression model.
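As a minimal sketch of this covariance structure (the variance components are hypothetical, chosen only for illustration), the random effects block for one group can be built directly from the formula above:

```python
import numpy as np

# Hypothetical variance components, for illustration only.
sigma_u2 = 0.5   # Var[u_i], the group effect
sigma_e2 = 1.0   # Var[eps_it], the idiosyncratic disturbance
T = 4            # time-series observations in group i

# Sigma_i = sigma_e^2 * I_T + sigma_u^2 * i i', with i a T x 1 column of ones.
i_vec = np.ones((T, 1))
Sigma_i = sigma_e2 * np.eye(T) + sigma_u2 * (i_vec @ i_vec.T)

print(Sigma_i)   # equicorrelated block: 1.5 on the diagonal, 0.5 elsewhere
```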
This chapter presents some general results for this extended model. We will focus on the model of heteroscedasticity in this chapter and in Chapter 14. A general model of autocorrelation appears in Chapter 20. Chapters 10 and 11 examine in detail other specific types of generalized regression models.
Our earlier results for the classical model will have to be modified. We
will take the following approach on general results and in the specific
cases of heteroscedasticity and serial correlation:
1. We first consider the consequences for the least squares estimator of the more general form of the regression model. This will include assessing the effect of ignoring the complication of the generalized model and of devising an appropriate estimation strategy, still based on least squares.
2. We will then examine alternative estimation approaches that can make better use of the characteristics of the model. Minimal assumptions about $\Omega$ are made at this point.
3. We then narrow the assumptions and begin to look for methods of detecting the failure of the classical model; that is, we formulate procedures for testing the specification of the classical model against the generalized regression.
4. The final step in the analysis is to formulate parametric models that make specific assumptions about $\Omega$. Estimators in this setting are some form of generalized least squares or maximum likelihood, which is developed in Chapter 14.
The model is examined in general terms in this chapter. Major applications
to panel data and multiple equation systems are considered in Chapters 11
and 10, respectively.
9.2 ROBUST LEAST SQUARES ESTIMATION AND INFERENCE
The generalized regression model in (9-1) drops Assumption A.4. If $\Omega \neq I$, then the disturbances may be heteroscedastic or autocorrelated or both. The least squares estimator is

$$
b = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'\varepsilon. \qquad (9\text{-}2)
$$

The covariance matrix of the estimator based on (9-1)-(9-2) would be

$$
\mathrm{Var}[b \mid X]
= (X'X)^{-1}X'(\sigma^2\Omega)X(X'X)^{-1}
= \frac{\sigma^2}{n}\left(\frac{X'X}{n}\right)^{-1}\left(\frac{X'\Omega X}{n}\right)\left(\frac{X'X}{n}\right)^{-1}. \qquad (9\text{-}3)
$$
Based on (9-3), we see that $s^2(X'X)^{-1}$ would not be the appropriate estimator of the asymptotic covariance matrix of the least squares estimator, $b$. In Section 4.5, we considered a strategy for estimating the appropriate covariance matrix, without making explicit assumptions about the form of $\Omega$, for two cases, heteroscedasticity and "clustering" (which resembles the random effects model suggested in the Introduction). We will add some detail to that discussion for the heteroscedasticity case. Clustering is revisited in Chapter 11.
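To see the consequence in (9-3) numerically, the following minimal simulation sketch (the heteroscedasticity pattern is hypothetical, chosen only for illustration) compares the correct sandwich covariance of $b$ with the naive classical estimator $s^2(X'X)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 0.5, -0.5])

# Hypothetical heteroscedasticity: Var[eps_i] grows with the square of x_i1.
sig2_i = 1.0 + X[:, 1] ** 2
Sigma = np.diag(sig2_i)                       # sigma^2 * Omega in (9-1)

XtX_inv = np.linalg.inv(X.T @ X)

# Correct covariance of b from (9-3): (X'X)^{-1} X' Sigma X (X'X)^{-1}.
true_cov = XtX_inv @ X.T @ Sigma @ X @ XtX_inv

# Classical estimator s^2 (X'X)^{-1} computed on one simulated sample.
y = X @ beta + rng.normal(size=n) * np.sqrt(sig2_i)
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
naive_cov = (e @ e / (n - K)) * XtX_inv

print(np.sqrt(np.diag(true_cov)))    # correct standard errors
print(np.sqrt(np.diag(naive_cov)))   # classical standard errors, misleading here
```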
The matrix $(X'X/n)$ is readily computable using the sample data. The complication is the center matrix that involves the unknown $\sigma^2\Omega$. For estimation purposes, $\sigma^2$ is not a separate unknown parameter. We can arbitrarily scale the unknown $\Omega$, say, by $\lambda$, and $\sigma^2$ by $1/\lambda$ and obtain the same product. We will remove the indeterminacy by assuming that $\mathrm{trace}(\Omega) = n$, as it is when $\Omega = I$. Let $\Sigma = \sigma^2\Omega$. It might seem that to estimate $(1/n)X'\Sigma X$, an estimator of $\Sigma$, which contains $n(n+1)/2$ unknown parameters, is required. But fortunately (because with only $n$ observations, this would be hopeless), this observation is not quite right. What is required is an estimator of the $K(K+1)/2$ unknown elements in the center matrix

$$
Q^* = \mathrm{plim}\,\frac{1}{n}X'\Sigma X = \mathrm{plim}\,\frac{1}{n}\sum_{i=1}^{n}\sigma_i^2\,x_i x_i'.
$$

The point is that $Q^*$ is a matrix of sums of squares and cross products that involves $\sigma_i^2$ and the rows of $X$. The least squares estimator $b$ is a consistent estimator of $\beta$, which implies that the least squares residuals $e_i$ are "pointwise" consistent estimators of their population counterparts $\varepsilon_i$. The general approach, then, will be to use $X$ and $e$ to devise an estimator of $Q^*$ for the heteroscedasticity case, $\sigma_{ij} = 0$ when $i \neq j$.
We seek an estimator of $Q^*$. White (1980, 2001) shows that under very general conditions, the estimator

$$
S_0 = \frac{1}{n}\sum_{i=1}^{n} e_i^2\,x_i x_i' \qquad (9\text{-}4)
$$

has $\mathrm{plim}\,S_0 = Q^*$.[1] The end result is that the White heteroscedasticity consistent estimator

$$
\mathrm{Est.Asy.Var}[b]
= \frac{1}{n}\left(\frac{X'X}{n}\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} e_i^2\,x_i x_i'\right)\left(\frac{X'X}{n}\right)^{-1}
= n(X'X)^{-1}S_0(X'X)^{-1} \qquad (9\text{-}5)
$$
can be used to estimate the asymptotic covariance matrix of b. This result
implies that without actually specifying the type of heteroscedasticity, we
can still make appropriate inferences based on the least squares estimator.
This implication is especially useful if we are unsure of the precise
nature of the heteroscedasticity (which is probably most of the time).
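The estimator in (9-4)-(9-5) is simple to compute. The following is a minimal sketch in code (the data generating process is hypothetical, purely for illustration); the finite-sample corrections discussed in the next paragraph amount to rescaling $e_i^2$ before forming $S_0$:

```python
import numpy as np

def white_cov(X, e):
    """White heteroscedasticity-consistent covariance matrix of b, as in (9-5)."""
    n, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    # S0 = (1/n) sum_i e_i^2 x_i x_i', equation (9-4).
    S0 = (X.T * e**2) @ X / n
    # Est.Asy.Var[b] = n (X'X)^{-1} S0 (X'X)^{-1}, equation (9-5).
    return n * XtX_inv @ S0 @ XtX_inv

# Usage on simulated data (hypothetical model, for illustration only).
rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = rng.normal(size=n) * (1.0 + np.abs(X[:, 1]))  # heteroscedastic disturbances
y = X @ np.array([1.0, 0.5]) + eps

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
se_white = np.sqrt(np.diag(white_cov(X, e)))  # robust standard errors for b
```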
A number of studies have sought to improve on the White estimator for least squares.[2] The asymptotic properties of the estimator are unambiguous, but its usefulness in small samples is open to question. The possible problems stem from the general result that the squared residuals tend to underestimate the squares of the true disturbances. [That is why we use $e'e/(n-K)$ rather than $e'e/n$ in computing $s^2$.] The end result is that in small samples, at least as suggested by some Monte Carlo studies [e.g., MacKinnon and White (1985)], the White estimator is a bit too optimistic; the matrix is a bit too small, so asymptotic $t$ ratios are a little too large. Davidson and MacKinnon (1993, p. 554) suggest a number of fixes, which include (1) scaling up the end result by a factor $n/(n-K)$ and (2) using the squared residual scaled by its true variance, $e_i^2/m_{ii}$, instead of $e_i^2$, where $m_{ii} = 1 - x_i'(X'X)^{-1}x_i$.[3] (See Exercise 9.6.b.) On the basis of their study, Davidson and MacKinnon strongly advocate one or the other correction. Their admonition "One should never use [the White estimator] because [(2)] always performs better" seems a bit strong, but the point is well taken. The use of sharp asymptotic results in small samples can be problematic. The last two rows of Table 9.1 show the recomputed standard errors with these two modifications.

Example 9.1  Heteroscedastic Regression and the White Estimator
The data in Appendix Table F7.3 give monthly credit card expenditure for
13,444 individuals. A subsample of 100 observations used here is given in
Appendix Table F9.1. The estimates ar