chapter 6 - Danielle Carusi Machado
CHAPTER 6
SOLUTIONS TO PROBLEMS
6.1 The generality is not necessary. The t statistic on $roe^2$ is only
about -.30, which shows that $roe^2$ is very statistically insignificant.
Plus, having the squared term has only a minor effect on the slope even for
large values of roe. (The approximate slope is .0215 - .00016 roe, and
even when roe = 25, about one standard deviation above the average roe in
the sample, the slope is .0175, as compared with .0215 at roe = 0.)

6.2 By definition of the OLS regression of $c_0 y_i$ on $c_1 x_{i1}, \ldots, c_k x_{ik}$,
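$i = 1, \ldots, n$, the $\tilde{\beta}_j$ solve

$$\sum_{i=1}^{n} \left[ c_0 y_i - \tilde{\beta}_0 - \tilde{\beta}_1 (c_1 x_{i1}) - \cdots - \tilde{\beta}_k (c_k x_{ik}) \right] = 0$$

$$\sum_{i=1}^{n} (c_j x_{ij}) \left[ c_0 y_i - \tilde{\beta}_0 - \tilde{\beta}_1 (c_1 x_{i1}) - \cdots - \tilde{\beta}_k (c_k x_{ik}) \right] = 0, \quad j = 1, \ldots, k.$$

[We obtain these from equations (3.13), where we plug in the scaled
dependent and independent variables.] We now show that if $\tilde{\beta}_0 = c_0 \hat{\beta}_0$
and $\tilde{\beta}_j = (c_0/c_j) \hat{\beta}_j$, j = 1,...,k, then these k + 1 first order conditions
are satisfied, which proves the result because we know that the OLS
estimates are the unique solutions to the FOCs (once we rule out perfect
collinearity in the independent variables). Plugging in these guesses for
the $\tilde{\beta}_j$ gives the expressions

$$\sum_{i=1}^{n} \left[ c_0 y_i - c_0 \hat{\beta}_0 - (c_0/c_1) \hat{\beta}_1 (c_1 x_{i1}) - \cdots - (c_0/c_k) \hat{\beta}_k (c_k x_{ik}) \right]$$

and

$$\sum_{i=1}^{n} (c_j x_{ij}) \left[ c_0 y_i - c_0 \hat{\beta}_0 - (c_0/c_1) \hat{\beta}_1 (c_1 x_{i1}) - \cdots - (c_0/c_k) \hat{\beta}_k (c_k x_{ik}) \right]$$

for j = 1,2,...,k. Simple cancellation shows we can write these expressions
as

$$\sum_{i=1}^{n} \left( c_0 y_i - c_0 \hat{\beta}_0 - c_0 \hat{\beta}_1 x_{i1} - \cdots - c_0 \hat{\beta}_k x_{ik} \right)$$

and

$$\sum_{i=1}^{n} (c_j x_{ij}) \left( c_0 y_i - c_0 \hat{\beta}_0 - c_0 \hat{\beta}_1 x_{i1} - \cdots - c_0 \hat{\beta}_k x_{ik} \right), \quad j = 1, 2, \ldots, k,$$

or, factoring out constants,

$$c_0 \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \cdots - \hat{\beta}_k x_{ik} \right)$$

and

$$c_0 c_j \sum_{i=1}^{n} x_{ij} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \cdots - \hat{\beta}_k x_{ik} \right), \quad j = 1, 2, \ldots, k.$$

But the terms multiplying $c_0$ and $c_0 c_j$ are identically zero by the first
order conditions for the $\hat{\beta}_j$ since, by definition, they are obtained from
the regression of $y_i$ on $x_{i1}, \ldots, x_{ik}$, i = 1,2,...,n. So we have shown that
$\tilde{\beta}_0 = c_0 \hat{\beta}_0$ and $\tilde{\beta}_j = (c_0/c_j) \hat{\beta}_j$, j = 1, ..., k, solve the
requisite first order conditions.

6.3 (i) The turnaround point is given by $\hat{\beta}_1/(2|\hat{\beta}_2|)$, or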
.0003/(.000000014) ≈ 21,428.57; remember, this is sales in millions of
dollars.

(ii) Probably. Its t statistic is about -1.89, which is significant
against the one-sided alternative H1: $\beta_2 < 0$ at the 5% level (cv ≈
-1.70 with df = 29). In fact, the p-value is about .036.

(iii) Because sales gets divided by 1,000 to obtain salesbil, the
corresponding coefficient gets multiplied by 1,000: (1,000)(.00030) = .30.
The standard error gets multiplied by the same factor. As stated in the
hint, $salesbil^2 = sales^2/1{,}000{,}000$, and so the coefficient on the quadratic
gets multiplied by one million: (1,000,000)(.0000000070) = .0070; its
standard error also gets multiplied by one million. Nothing happens to the
intercept (because rdintens has not been rescaled) or to the $R^2$:

$$\widehat{rdintens} = \underset{(0.429)}{2.613} + \underset{(.14)}{.30}\, salesbil - \underset{(.0037)}{.0070}\, salesbil^2$$

n = 32, $R^2$ = .1484.

(iv) The equation in part (iii) is easier to read because it contains
fewer zeros to the right of the decimal. Of course the interpretation of
the two equations is identical once the different scales are accounted for.

6.4 (i) Holding all other factors fixed we have

$$\Delta \log(wage) = \beta_1 \Delta educ + \beta_2 (\Delta educ)\, pareduc = (\beta_1 + \beta_2\, pareduc) \Delta educ.$$

Dividing both sides by $\Delta educ$ gives the result. The sign of $\beta_2$ is not
obvious, although $\beta_2 > 0$ if we think a child gets more out of another
year of education the more highly educated are the child's parents.

(ii) We use the values pareduc = 32 and pareduc = 24 to interpret the
coefficient on educ·pareduc. The difference in the estimated return to
education is .00078(32 - 24) = .0062, or about .62 percentage points.

(iii) When we add pareduc by itself, the coefficient on the
interaction term is negative. The t statistic on educ·pareduc is about
-1.33, which is not significant at the 10% level against a two-sided
alternative. Note that the coefficient on pareduc is significant at the 5%
level against a two-sided alternative. This provides a good example of how
omitting a level effect (pareduc in this case) can lead to biased
estimation of the interaction effect.

6.5 This would make little sense. Scores on math and science exams
are measures of outputs of the educational process, and we would like to
know how various educational inputs and school characteristics affect math
and science scores. For example, if the staff-to-pupil ratio has an effect
on both exam scores, why would we want to hold performance on the science
test fixed while studying the effects of staff on the math pass rate? This
would be an example of controlling for too many factors in a regression
equation. The variable sci11 could be a dependent variable in an identical
regression equation.

6.6 The extended model has df = 680 - 9 = 671, and we are testing two
restrictions. Therefore, F = [(.232 - .229)/(1 - .232)](671/2) ≈ 1.31,
which is well below the 10% critical value in the F distribution with 2 and
∞ df: cv = 2.30. Thus, $atndrte^2$ and ACT·atndrte are jointly
insignificant. Because adding these terms complicates the model without
statistical justification, we would not include them in the final model.

6.7 The second equation is clearly preferred, as its adjusted R-squared
is notably larger than that in the other two equations. The second
equation contains the same number of estimated parameters as the first, and
one fewer than the third. The second equation is also easier to
interpret than the third.
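
As a quick numerical check of the F statistic in Problem 6.6, the R-squared form of the test can be sketched in Python. The function name `f_stat_r2` is hypothetical, and the critical value 2.30 is copied from the solution text rather than computed.

```python
def f_stat_r2(r2_ur: float, r2_r: float, df_ur: int, q: int) -> float:
    """R-squared form of the F statistic for q exclusion restrictions.

    r2_ur: R-squared of the unrestricted (extended) model
    r2_r:  R-squared of the restricted model
    df_ur: degrees of freedom of the unrestricted model
    q:     number of restrictions being tested
    """
    return ((r2_ur - r2_r) / (1.0 - r2_ur)) * (df_ur / q)


# Numbers quoted in Problem 6.6.
df_ur = 680 - 9          # n minus the number of estimated parameters
F = f_stat_r2(r2_ur=0.232, r2_r=0.229, df_ur=df_ur, q=2)

CV_10 = 2.30             # 10% critical value with 2 and infinite df, from the text
reject = F > CV_10       # False here: the two added terms are jointly insignificant

print(f"F = {F:.2f}, reject at 10%: {reject}")
```

Because the function takes only the two R-squareds, the degrees of freedom, and the number of restrictions, the same helper could be reused for any exclusion test where both models share the same dependent variable.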
SOLUTIONS TO COMPUTER EXERCISES

6.8 (i) The causal (or ceteris paribus) effect of dist on price means
that $\beta_1 \geq 0$: all other relevant factors equal, it is better to have a
home farther away from the incinerator. The estimated equation is

$$\widehat{\log(price)} = \underset{(0.65)}{8.05} + \underset{(.066)}{.365}\, \log(dist)$$

n = 142, $R^2$ = .180, $\bar{R}^2$ = .174,

which means a 1% increase in distance from the incinerator is associated
with a predicted price that is about .37% higher.

(ii) When the variables log(inst), log(area), log(land), rooms,
baths, and age are added to the regression, the coefficient on log(dist)
becomes about .055 (se ≈ .058). The effect is much smaller now, and
statistically insignificant. This is because we have explicitly controlled
for several other factors that determine the quality of a home (such as its
size and number of baths) and its location (distance to the interstate).
This is consistent with the hypothesis that the incinerator was located
near less desirable homes to begin with.

(iii) When $[\log(inst)]^2$ is added to the regression in part (ii), we
obtain (with the results only partially reported)

$$\widehat{\log(price)} = \underset{(2.65)}{-3.32} + \underset{(.062)}{.185}\, \log(dist) + \underset{(0.501)}{2.073}\, \log(inst) - \underset{(.0282)}{.1193}\, [\log(inst)]^2 + \ldots$$

n = 142, $R^2$ = .778, $\bar{R}^2$ = .764.

The coefficient on log(dist) is now very statistically significant, with a
t statistic of about three. The coefficients on log(inst) and $[\log(inst)]^2$
are both very statistically significant, each with t statistics above four
in absolute value. Just adding $[\log(inst)]^2$ has had a very big effect on
the coefficient important for policy purposes. This means that distance
from the incinerator and distance from the interstate are correlated in
some nonlinear way that also affects housing price.
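
The t statistics quoted for part (iii) can be reproduced, approximately, by dividing each reported coefficient by its standard error. The following is a minimal Python sketch; the dictionary name and keys are illustrative and not from the original, and the inputs are the rounded values reported in the equation, so results based on unrounded estimates may differ slightly.

```python
# (coefficient, standard error) pairs as reported in part (iii).
reported = {
    "log(dist)": (0.185, 0.062),
    "log(inst)": (2.073, 0.501),
    "[log(inst)]^2": (-0.1193, 0.0282),
}

# t statistic for H0: coefficient = 0 is simply coefficient / standard error.
t_stats = {name: coef / se for name, (coef, se) in reported.items()}

for name, t in t_stats.items():
    print(f"{name:>14s}: t = {t:6.2f}")
```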
We can find the value of log(inst) where the effect on log(price)
actually becomes negative: 2.073/[2(.1193)] ≈ 8.69. When we
exponentiate this we obtain about 5,943 feet from the interstate.
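
The turnaround calculation can be sketched in Python as follows. The helper name `turnaround` is hypothetical, and it assumes a positive linear coefficient paired with a negative quadratic coefficient, as in part (iii). The sketch exponentiates both the rounded value 8.69 used in the text and the unrounded ratio, since rounding before exponentiating shifts the distance somewhat.

```python
import math


def turnaround(b_linear: float, b_quadratic: float) -> float:
    """Value of x at which b_linear*x + b_quadratic*x**2 peaks.

    Assumes b_linear > 0 and b_quadratic < 0 (an inverted U shape).
    """
    return b_linear / (2.0 * abs(b_quadratic))


# Coefficients on log(inst) and [log(inst)]^2 from part (iii).
log_inst_star = turnaround(2.073, -0.1193)

feet_rounded = math.exp(round(log_inst_star, 2))  # exponentiate the rounded 8.69
feet_unrounded = math.exp(log_inst_star)          # exponentiate the exact ratio

FEET_PER_MILE = 5280
print(f"turnaround in log(inst): {log_inst_star:.2f}")
print(f"distance (rounded input):   {feet_rounded:,.0f} feet")
print(f"distance (unrounded input): {feet_unrounded:,.0f} feet")
print(f"miles (rounded input): {feet_rounded / FEET_PER_MILE:.2f}")
```

Either way the turnaround distance comes out at a little more than one mile.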
Therefore, it is best to have your home away from the interstate for
distances less than just over a mile. After that, moving farther away from
the interstate lowers predicted house price.

(iv) The coefficient on $[\log(dist)]^2$, when it is added to the model
estimated in part (iii), is about -.0365, but its t statistic is only about
-.33. Therefore, it is not necessary to add this complication.

6.9 (i) The estimated equation is

$$\widehat{\log(wage)} = \underset{(.106)}{.128} + \underset{(.0075)}{.0904}\, educ + \underset{(.0052)}{.0410}\, exper - \underset{(.000116)}{.000714}\, exper^2$$

n = 526, $R^2$ = .300, $\bar{R}^2$ = .296.

(ii) The t statistic on $exper^2$ is about -6.16, which has a p-value of
essentially zero. So $exper^2$ is significant at the 1% level (and much smaller
significance levels).

(iii) To estimate the return to the fifth year of experience, we
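start at exper = 4 and increase exper by one, so $\Delta exper = 1$:

$$\%\Delta \widehat{wage} \approx 100[.0410 - 2(.000714)4] \approx 3.53\%.$$

Similarly, for the 20th year of experience,

$$\%\Delta \widehat{wage} \approx 100[.0410 - 2(.000714)19] \approx 1.39\%.$$

(iv) The turnaround point is about .041/[2(.000714)] ≈ 28.7 years
of experience. In the sample, there are 121 people with at least 29 years
of experience. This is a fairly sizeable fraction of the sample.

6.10 (i) Holding exper (and the elements in u) fixed, we have

$$\Delta \log(wage) = \beta_1 \Delta educ + \beta_3 (\Delta educ)\, exper = (\beta_1 + \beta_3\, exper) \Delta educ,$$

or

$$\frac{\Delta \log(wage)}{\Delta educ} = \beta_1 + \beta_3\, exper.$$

This is the approximate proportionate change in wage given one more year of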
education.

(ii) H0: $\beta_3 = 0$. If we think that education and experience
interact positively, so that people with more experience are more
productive when given another year of education, then $\beta_3 > 0$ is the
appropriate alternative.

(iii) The estimated equation is

$$\widehat{\log(wage)} = \underset{(0.24)}{5.95} + \underset{(.0174)}{.0440}\, educ - \underset{(.0200)}{.0215}\, exper + \underset{(.00153)}{.00320}\, educ \cdot exper$$

n = 935, $R^2$ = .135, $\bar{R}^2$ = .132.

The t statistic on the interaction term is about 2.13, which gives a p-value
below .02 against H1: $\beta_3 > 0$. Therefore, we reject H0: $\beta_3 = 0$
against H1: $\beta_3 > 0$ at the 2% level.

(iv) We rewrite the equation as

$$\log(wage) = \beta_0 + \theta_1\, educ + \beta_2\, exper + \beta_3\, educ(exper - 10) + u,$$

and run the regression log(wage) on educ, exper, and educ(exper - 10). We
want the coefficient on educ. We obtain $\hat{\theta}_1$ ≈ .0761 and
se($\hat{\theta}_1$) ≈ .0066. The 95% CI for $\theta_1$ is about .063 to .089.

6.11 (i) The estimated equation is

$\widehat{sat}$ = 997.98 + 19.81 hsize - 2.13 $hsize^2$
(6.20) (3.99)