
4
THE LEAST SQUARES ESTIMATOR
4.1 INTRODUCTION
Chapter 3 treated fitting the linear regression to the data by least
squares as a purely algebraic exercise. In this chapter, we will examine in
detail least squares as an estimator of the parameters of the linear
regression model (defined in Table 4.1). There are other candidates for
estimating β. For example, we might use the coefficients that minimize the
sum of absolute values of the residuals. We begin in Section 4.2 by
returning to the question raised but not answered in Footnote 1, Chapter 3:
"why should we use least squares?" We will then analyze the estimator in
detail. The question of which estimator to choose is based on the
statistical properties of the candidates, such as unbiasedness,
consistency, efficiency, and their sampling distributions. Section 4.3
considers finite-sample properties such as unbiasedness. The finite-sample
properties of the least squares estimator are independent of the sample
size. The linear model is one of relatively few settings in which definite
statements can be made about the exact finite-sample properties of an
estimator. In most cases, the only known properties are those that apply to
large samples. We can only approximate finite-sample behavior by using what
we know about large-sample properties. In Section 4.4, we will examine the
large-sample, or asymptotic, properties of the least squares estimator of
the regression model.[1] Section 4.5 considers robust inference. The
problem considered here is how to carry out inference when (real) data may
not satisfy the assumptions of the basic linear model. Section 4.6 develops
a method for inference based on functions of model parameters, rather than
the estimates themselves.
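To make the comparison of candidate estimators concrete, the brief sketch
below fits the same simulated sample both by least squares and by
minimizing the sum of absolute residuals (least absolute deviations). The
data-generating values and the numerical optimizer are purely illustrative;
they are not part of the formal development of this chapter.

# Illustrative comparison of two candidate estimators of beta:
# least squares (minimize the sum of squared residuals) and
# least absolute deviations (minimize the sum of absolute residuals).
# All data-generating values below are hypothetical.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # constant + one regressor
beta_true = np.array([1.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

# Least squares has the closed form b = (X'X)^{-1} X'y.
b_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Least absolute deviations requires numerical minimization of sum |y - Xb|.
lad_obj = lambda b: np.abs(y - X @ b).sum()
b_lad = minimize(lad_obj, x0=b_ols, method="Nelder-Mead").x

print("least squares estimate:", b_ols)
print("least absolute deviations estimate:", b_lad)

Both criteria produce sensible estimates here; the chapter's task is to lay
out the statistical grounds for preferring one to the other.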
Discussions of the properties of an estimator are largely concerned
with point estimation, that is, with how to use the sample information as
effectively as possible to produce the best single estimate of the model
parameters. Interval estimation, considered in Section 4.7, is concerned
with computing estimates that make explicit the uncertainty inherent in
using randomly sampled data to estimate population quantities. We will
consider some applications of interval estimation of parameters and some
functions of parameters in Section 4.7. One of the most familiar
applications of interval estimation is in using the model to predict the
dependent variable and to provide a plausible range of uncertainty for that
prediction. Section 4.8 considers prediction and forecasting using the
estimated regression model.
The analysis assumes that the data in hand correspond to the
assumptions of the model. In Section 4.9, we consider several practical
problems that arise in analyzing nonexperimental data. Assumption A2, full
rank of X, is taken as a given. As we noted in Section 2.3.2, when this
assumption is not met, the model is not estimable, regardless of the sample
size. Multicollinearity, the near failure of this assumption in real-world
data, is examined in Sections 4.9.1 and 4.9.2. Missing data have the
potential to derail the entire analysis. The benign case in which missing
values are simply unexplainable random gaps in the data set is considered
in Section 4.9.3. The more complicated case of nonrandomly missing data is
discussed in Chapter 18. Finally, the problems of badly measured and
outlying observations are examined in Sections 4.9.4 and 4.9.5.
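As a preview of the interval estimation results developed in Section 4.7,
the following sketch computes a conventional 95% confidence interval for a
slope coefficient from simulated data. The formula b_k plus or minus a t
critical value times the estimated standard error anticipates later
results, and all numerical values are hypothetical.

# Illustrative 95% confidence interval for the slope coefficient,
# using b_1 +/- t_{0.975, n-K} * se(b_1). Simulated, hypothetical data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, K = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([2.0, -1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (n - K)                      # unbiased estimate of sigma^2
se_b1 = np.sqrt(s2 * XtX_inv[1, 1])       # standard error of the slope

t_crit = stats.t.ppf(0.975, df=n - K)
ci = (b[1] - t_crit * se_b1, b[1] + t_crit * se_b1)
print("slope estimate:", b[1], "95% confidence interval:", ci)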
This chapter describes the properties of estimators. The
assumptions in Table 4.1 will provide the framework for the analysis. (The
assumptions are discussed in greater detail in Chapter 3.) For the present,
it is useful to assume that the data are a cross section of independent,
identically distributed random draws from the joint distribution of (yi,xi)
with A1-A3, which define E[yi|xi]. Later in the text (and in Section 4.5),
we will consider more general cases. The leading exceptions, which all bear
some similarity, are "stratified samples," "cluster samples," "panel data,"
and "spatially correlated data." In these cases, groups of related
individual observations constitute the observational units. The time series
case in Chapters 20 and 21 will deal with data sets in which potentially
all observations are correlated. These cases will be treated later when
they are developed in more detail. Under random (cross section) sampling,
with little loss of generality, we can easily obtain very general
statistical results such as consistency and asymptotic normality. Later,
such as in Chapter 11, we will be able to accommodate the more general
cases fairly easily.
Table 4.1 Assumptions of the Classical Linear Regression Model
A1. Linearity: yi = xi1β1 + xi2β2 + ... + xiKβK + εi. For the sample, y = Xβ + ε.
A2. Full rank: The n × K sample data matrix, X, has full column rank for every n > K.
A3. Exogeneity of the independent variables: E[εi | xj1, xj2, ..., xjK] = 0, i, j = 1, ..., n. There is no correlation between the disturbances and the independent variables. E[ε|X] = 0.
A4. Homoscedasticity and nonautocorrelation: Each disturbance, εi, has the same finite variance; E[εi²|X] = σ². Every disturbance εi is uncorrelated with every other disturbance, εj, conditioned on X; E[εiεj|X] = 0 for i ≠ j. E[εε′|X] = σ²I.
A5. Stochastic or nonstochastic data: (xi1, xi2, ..., xiK), i = 1, ..., n, may be any mixture of constants and random variables.
A6. Normal distribution: The disturbances, εi, are normally distributed; ε|X ~ N[0, σ²I].
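To make the assumptions concrete, the brief simulation below generates a
sample that satisfies A1-A6 (linear conditional mean, full-rank X,
exogenous, homoscedastic, uncorrelated normal disturbances) and computes
the least squares estimate. With a large sample the estimate lies close to
the true β, previewing the consistency result of Section 4.4. The
coefficient values and sample size are, of course, only illustrative.

# A sample generated to satisfy assumptions A1-A6, followed by the least
# squares estimate b = (X'X)^{-1} X'y. All numerical values are illustrative.
import numpy as np

rng = np.random.default_rng(42)
n = 1000
beta = np.array([1.0, 0.5, -2.0])

# A2, A5: X mixes a constant with random draws and has full column rank.
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.uniform(size=n)])

# A3, A4, A6: disturbances are drawn independently of X with mean zero,
# a common variance sigma^2, no correlation across observations, and a
# normal distribution.
sigma = 1.5
eps = rng.normal(scale=sigma, size=n)

# A1: the sample is generated by the linear model y = X beta + eps.
y = X @ beta + eps

b = np.linalg.solve(X.T @ X, X.T @ y)
print("true beta:       ", beta)
print("least squares b: ", b)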
4.2 MOTIVATING LEAST SQUARES
Ease of computation is one reason occasionally offered to explain why
least squares is so popular. But, with modern software, ease of computation
is a minor (usually trivial) virtue. There are several other theoretical
justifications for this technique. First, least squares is a natural
approach to estimation that makes explicit use of the structure of the
model as laid out in the assumptions. Second, even if the true model is not
a linear regression, the regression line fit by least squares is an optimal
linear predictor for the explained variable. Thus, it enjoys a sort of
robustness that other estimators do not. Finally, under the very specific
assumptions of the classical model, by one reasonable criterion, least
squares will be the most efficient use of the data.
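The second point, that the least squares line remains an optimal linear
predictor even when the true conditional mean is not linear, can be
illustrated numerically. In the sketch below, the conditional mean is
deliberately nonlinear; the least squares line nonetheless attains a
smaller mean squared prediction error than any competing line, here checked
against a perturbed alternative. The specific functional form is purely
hypothetical.

# Least squares as a linear predictor when the true conditional mean is
# nonlinear (here E[y|x] = exp(x/2)); a hypothetical numerical illustration.
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
x = rng.normal(size=n)
y = np.exp(x / 2) + rng.normal(scale=0.5, size=n)   # nonlinear mean plus noise

X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ y)               # least squares line

def mse(coefs):
    """Mean squared prediction error of the line coefs[0] + coefs[1]*x."""
    return np.mean((y - X @ coefs) ** 2)

# Any other line does no better than the least squares line on this sample.
print("MSE of least squares line:", mse(b))
print("MSE of a perturbed line:  ", mse(b + np.array([0.1, -0.1])))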
We will co