Chapter 13: Multiple Regression
Chapter Objectives
When you finish this chapter you should be able to
use a fitted multiple regression equation to make predictions.
interpret the R2 and perform an F test for overall significance.
test individual predictors for significance.
interpret confidence intervals for regression coefficients.
distinguish between confidence and prediction intervals.
identify unusual residuals and outliers by using standardized residuals.
interpret residual tests for leverage.
analyze the residuals to check for violations of regression assumptions.
explain the role of data conditioning and data transformations.
Quiz Yourself
True/False Questions
Multiple regression is the process of using several independent variables to predict a number of dependent variables.
For each independent variable xi in the multiple regression equation, the corresponding coefficient bi is referred to as a partial regression coefficient.
When an additional explanatory variable is introduced into a multiple regression model, the coefficient of multiple determination adjusted for degrees of freedom can never decrease.
One of the consequences of multicollinearity in multiple regression is biased estimates of the slope coefficients.
From the coefficient of multiple determination, we cannot detect the strength of the relationship between the dependent variable y and any individual independent variable.
When a dummy variable is included in a multiple regression model, the interpretation of the estimated slope coefficient does not make any sense anymore.
An interaction term in a multiple regression model involving two independent variables, x1 and x2, may be used when the relationship between x1 and y changes for differing values of x2.
The Durbin-Watson d statistic is used to check the assumption of normality.
The assumption of equal standard deviations about the regression line is called residual analysis.
Multiple Choice Questions
Which of the following statements about multiple regression is TRUE?
A. A multiple regression is called “multiple” because it has several data points.
B. The total sum of squares in a regression model will never exceed the regression sum of squares.
C. If we have taken into account all relevant explanatory factors, the residuals from a multiple regression should be random.
D. The coefficient of multiple determination is calculated by taking the ratio of the regression sum of squares over the total sum of squares and subtracting that value from 1.
Every now and then, Kunlakarn wishes she had used her MBA in Finance and gotten a job rather than going to graduate school. Currently, she wishes to predict the salary she would likely be offered in a starting position. The following is partial Excel output from a regression of salary package (in $K) on Finance Major (Finance = 1, other major = 0), GPA (0 to 4), and Gender (female = 1, male = 0). Use this information to answer the questions that follow.
SUMMARY OUTPUT

Regression Statistics
Multiple R             0.63
R Square               0.39
Adjusted R Square      0.30
Standard Error         9.49
Observations           24

ANOVA
             df       SS         MS       F      Significance F
Regression    3                 388.91    4.32        0.02
Residual                         90.12
Total        23     2969.21

             Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept         81.00          4.95         16.35    0.000      70.67       91.33
Finance           11.19          4.48          2.50    0.001       1.86
GPA                1.10          1.13                  0.344      -1.27        3.47
Gender            -9.29          3.88         -2.40    0.026     -17.38       -1.21

(Blank cells reflect the partial output.)
What could Kunlakarn expect as the value of a salary package offer? Her GPA is 3.8.
A. $105,660
B. $102,580
C. $96,370
D. $87,080
The observed starting salary paid to a male who majored in marketing and had a GPA of 3.63 was $87,500. What is the error from the line of best fit for this observation?
A. $2507 B. -$6783 C. $3500 D. -$5790
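Both of these calculations come straight from the fitted equation in the output above; a minimal Python sketch for checking the arithmetic (the function name is illustrative, not from the output):

```python
def predict_salary(finance, gpa, gender):
    """Evaluate the fitted equation from the Excel output (salary in $K):
    Salary = 81 + 11.19*Finance + 1.10*GPA - 9.29*Gender"""
    return 81 + 11.19 * finance + 1.10 * gpa - 9.29 * gender

# Kunlakarn: Finance major (Finance = 1), GPA 3.8, female (Gender = 1)
offer = predict_salary(1, 3.8, 1)                 # 87.08, i.e. $87,080

# Male marketing major: Finance = 0, Gender = 0, GPA 3.63; observed 87.5 ($K)
residual = 87.5 - predict_salary(0, 3.63, 0)      # observed - fitted = 2.507
```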
The R2 for the regression is 0.39. What does this mean?
A. The ratio of personal characteristics to Salary is 0.39.
B. The ratio of Salary to personal characteristics is 0.39.
C. The ratio of the variation in Salary due to variation in personal characteristics is 0.39.
D. The ratio of the variation in personal characteristics due to variation in Salary is 0.39.
In this analysis, what does the multiple R tell us?
A. The strength of the linear relationship between Salary and the independent variables.
B. The covariance between the dependent and independent variables.
C. The overall significance of the independent variables.
D. The square root of R2.
The least squares regression line minimizes a certain value. Referring to the output above, what is that minimized value?
A. 90.12 B. 388.91 C. 1166.74 D. 1802.47
Interpret the coefficient on “Gender.”
A. Ceteris paribus, a male is paid $9,290 less on average than a female.
B. Ceteris paribus, a female is paid $9,290 less on average than a male.
C. Ceteris paribus, if a woman changes her gender, she will receive $9,290 less.
D. Ceteris paribus, for each additional dollar of salary, a female’s salary moves 9.29 units closer to a male’s salary.
Could the salary offered to finance majors be $20,000 higher than that offered to other majors?
A. Yes, and we are 95% sure of it.
B. No, and we are 95% sure of it.
C. No, the salary difference is $11,190.
D. Maybe. $20,000 is a believable possibility.
Kunlakarn knows that GPA can never be negative, so she feels that a one-tailed test of the coefficient on GPA is in order. What is the p-value associated with this test?
A. 0.172
B. 0.344
C. 0.688
D. The t statistic is required to calculate this.
As the p-value associated with the coefficient on Finance is 0.001, the null hypothesis of the hypothesis test would be
A. rejected at both α = .01 and α = .001.
B. not rejected at either α = .01 or α = .001.
C. rejected at α = .01 but not at α = .001.
D. rejected at α = .001 but not at α = .01.
Which of the following statements regarding multicollinearity is not true?
A. It is a condition that exists when the independent variables are highly correlated with the dependent variable.
B. It does not affect the F-test of the analysis of variance.
C. It exists in virtually all multiple regression models.
D. It is also called collinearity and intercorrelation.
If a group of independent variables are not significant individually but are significant as a group at a specified level of significance, this is most likely due to
A. autocorrelation
B. multicollinearity
C. the absence of binary variables
D. the presence of binary variables
In explaining the income earned by college graduates, which of the following independent variables is best represented by a dummy variable?
A. Age
B. College major
C. Grade point average
D. Number of years since graduating from high school
If the Durbin-Watson statistic has a value close to 0, which assumption is violated?
A. Independence of errors
B. Normality of the errors
C. Homoscedasticity
D. None of the above.
If the Durbin-Watson statistic, DW, has values greater than 2, this indicates
A. a positive first-order autocorrelation
B. a negative first-order autocorrelation
C. no first-order autocorrelation at all
D. None of the above.
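To make the last two questions concrete, the Durbin-Watson statistic can be computed directly from a series of residuals; a minimal sketch with made-up residuals:

```python
def durbin_watson(e):
    """DW = sum of squared successive differences of the residuals,
    divided by the sum of squared residuals. Ranges from 0 to 4;
    a value near 2 suggests no first-order autocorrelation."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(r ** 2 for r in e)
    return num / den

# Trending residuals: successive differences are small, so DW is near 0
# (positive autocorrelation -> the independence-of-errors assumption fails).
dw_pos = durbin_watson([1.0, 2.0, 3.0, 4.0])    # 3/30 = 0.1

# Alternating residuals: differences are large, so DW is above 2
# (negative autocorrelation).
dw_neg = durbin_watson([1.0, -1.0, 1.0, -1.0])  # 12/4 = 3.0
```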
Solved Problems from Text
13.2 a. Y = 1225 + 11.52*FloorSpace - 6.935*CompetingAds - 0.1496*Price
b. The coefficient of FloorSpace says that each additional square foot of floor space adds about 11.52 thousand dollars to sales.
The coefficient of CompetingAds says that each additional $1,000 of competitors' advertising reduces sales by about 6.935 thousand dollars.
The coefficient of Price says that each additional $1 of advertised price reduces sales by about 0.1496 thousand dollars.
c. No. The intercept has no practical meaning here: if all of these variables were zero, no bikes would be sold (no one advertises a bike at a price of zero).
d. Sales = $48.6 thousand
13.4 a. DF are 3, 26
b. F.05 = 2.61
c. F = 398802/14590 = 27.334. Yes, the overall regression is significant.
H0: All coefficients are zero (β1 = β2 = β3 = 0)
H1: At least one coefficient is non-zero
d. R2 = 1196410/1575741 = .759    R2adj = 1 - (1 - .759)(29/26) = .731
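The arithmetic in 13.4 is easy to verify directly from the ANOVA quantities; a short sketch (n = 30 follows from the error degrees of freedom, since 26 = n − k − 1 with k = 3):

```python
MSR, MSE = 398_802, 14_590        # mean squares: regression and error
SSR, SST = 1_196_410, 1_575_741   # sums of squares: regression and total
n, k = 30, 3                      # observations and predictors (d.f. = 3, 26)

F = MSR / MSE                                  # 27.33 -> overall regression significant
r2 = SSR / SST                                 # 0.759
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # 0.731
```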
13.6 a.
Predictor       Coef       SE        t-value   p-value
Intercept       1225.4     397.3      3.084    0.0035
FloorSpace      11.522     1.330      8.663    0.0000
CompetingAds    -6.935     3.905     -1.776    0.0825
Price           -0.14955   0.08927   -1.675    0.1008

Critical value of t: 2.779
b. t.005 = 2.779 (d.f. = 26). Only FloorSpace differs significantly from zero (p-value < .01 and |t| > 2.779).
c. See above table
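Each t statistic in the table is simply the coefficient divided by its standard error; a quick sketch that reproduces part (a):

```python
# Coefficients and standard errors from the table in 13.6(a)
estimates = {
    "Intercept":    (1225.4,   397.3),
    "FloorSpace":   (11.522,   1.330),
    "CompetingAds": (-6.935,   3.905),
    "Price":        (-0.14955, 0.08927),
}
t_crit = 2.779   # t.005 for d.f. = 26 (two-tailed test at alpha = .01)

t_stats = {name: coef / se for name, (coef, se) in estimates.items()}
significant = {name for name, t in t_stats.items() if abs(t) > t_crit}
# Among the predictors, only FloorSpace exceeds the critical value
# (the intercept does too, but it is not a predictor).
```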
13.8 Using the t value, the half-width of the interval is t.025 × SE = 2.441, so width = 2 × 2.441 = 4.882.
Using the quick rule, ±2SE = ±2(1.17) = ±2.34, so width = 2 × 2.34 = 4.68.
The quick rule gives similar but slightly narrower results.
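The two widths in 13.8 can be reproduced assuming SE = 1.17 (taken from the quick-rule line) and t.025 ≈ 2.086, the value implied by the stated half-width 2.441; both numbers are inferences, not given explicitly in the problem:

```python
se = 1.17        # standard error, inferred from the quick-rule calculation
t_025 = 2.086    # implied t.025 (2.441 / 1.17); an assumption, not given

half_t = t_025 * se           # about 2.441
half_quick = 2 * se           # 2.34
width_t = 2 * half_t          # about 4.882
width_quick = 2 * half_quick  # 4.68 -- similar but narrower
```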
Quiz Yourself Answers

True/False
1. F    2. T    3. F    4. F    5. T
6. F    7. T    8. F    9. F

Multiple Choice
1. C    2. D    3. A    4. C    5. D
6. D    7. B    8. D    9. A    10. C
11. A   12. B   13. B   14. A   15. B