
Ch12 Bivariate Regression.docx

Uploaded: 6 years ago
Contributor: DevonMaloy
Category: Statistics and Probability
Type: Other
Filename:   Ch12 Bivariate Regression.docx (84.04 kB)
Page Count: 7
Transcript
Chapter 12: Bivariate Regression

Chapter Objectives

When you finish this chapter you should be able to:

- calculate and test a correlation coefficient for significance.
- explain the OLS method and use the formulas for the slope and intercept.
- fit a simple regression on an Excel scatter plot.
- perform regression by using Excel and another package such as MegaStat.
- interpret confidence intervals for regression coefficients.
- test hypotheses about the slope and intercept by using t tests.
- find and interpret the coefficient of determination \(R^2\) and standard error \(s_{yx}\).
- interpret the ANOVA table and use the F test for a regression.
- distinguish between confidence and prediction intervals.
- identify unusual residuals and high-leverage observations.
- test the residuals for non-normality and heteroscedasticity.
- explain the role of data conditioning and data transformations.

Quiz Yourself

True/False Questions

1. An inverse relationship between an independent variable x and a dependent variable y means that as x increases, y decreases, and vice versa.
2. The regression line \(\hat{y} = 2 + 3x\) has been fitted to the data points (4,11), (2,7), and (1,5). The sum of squares for error will be 10.0.
3. In a simple linear regression model, testing whether the slope of the population regression line could be zero is the same as testing whether or not the population coefficient of correlation equals zero.
4. If the coefficient of correlation is -0.81, then the percentage of the variation in y that is explained by the regression line is 81%.
5. Except for the values r = -1, 0, and 1, we cannot be specific in our interpretation of the coefficient of correlation r. However, when we square it we produce a more meaningful statistic.
6. Given that SSE = 84 and SSR = 358.12, the coefficient of correlation must be 0.90.
7. The coefficient of determination is the coefficient of correlation squared. That is, \(R^2 = r^2\).
8. The value of the sum of squares for regression SSR can never be larger than the value of total sum of squares SST.
9. In regression analysis, if the coefficient of determination is 1.0, then the coefficient of correlation must be 1.0.

Multiple Choice Questions

1. A simple linear regression generated a correlation coefficient of 0.01. This tells us that
A. SSR is almost zero.
B. SSE is almost zero.
C. the two variables barely relate to each other.
D. we shall reject the null at less than a 5% significance level.

2. What randomness exists in the linear regression model?
A. The randomness from the explanatory variables, the X's.
B. The randomness from what is unexplained, the error.
C. The randomness of the dependent variable, the Y's.
D. None of the above.

3. A regression analysis between sales (in $1000) and advertising (in $100) resulted in the following least squares line: Sales' = 75 + 6*(Advertising). This implies that if advertising is $800, sales will be
A. $4,875
B. $12,300
C. $123,000
D. $487,500

4. Two models were proposed for a simple regression of tree height on bark thickness, Model A: Height' = 7.8*Bark + 37 and Model B: Height' = 8*Bark + 35. Using the information and calculations below, which model is best?

Model A: Height' = 7.8*Bark + 37

| Tree ID | Height (feet) | Bark Thickness (millimeters) | Predicted Height | Error | Squared Error |
|---|---|---|---|---|---|
| 1 | 150 | 15 | | | |
| 2 | 175 | 18 | 177.4 | -2.4 | 5.76 |
| 3 | 225 | 21 | 200.8 | 24.2 | 585.64 |
| 4 | 200 | 23 | 216.4 | -16.4 | 268.96 |

Model B: Height' = 8*Bark + 35

| Tree ID | Height (feet) | Bark Thickness (millimeters) | Predicted Height | Error | Squared Error |
|---|---|---|---|---|---|
| 1 | 150 | 15 | 155 | -5 | 25 |
| 2 | 175 | 18 | 179 | -4 | 16 |
| 3 | 225 | 21 | 203 | 22 | 484 |
| 4 | 200 | 23 | | | |

A. Model A
B. Model B
C. The models are identical.
D. It is impossible to determine the best model.

5. The least squares line is the line guaranteed to
A. be the one line of all possible lines around which the smallest square can be drawn.
B. be the line of all possible lines that connects the most observations with the fewest turns.
C. be the line of all possible lines that has the smallest squared sum of the distance between observations and predictions.
D. be the line of all possible lines that has the smallest sum of squared distance between observations and predictions.

Wasiq plans on selling his home and wishes to come up with a simple way to determine the asking price he will advertise. The following is a partial Excel output for a regression of Price (in $K) on number of rooms in the house (Rooms). Use this output to answer the next eight questions.

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| R Square | |
| Adjusted R Square | |
| Standard Error | 9.97 |
| Observations | |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | | | | |
| Residual | 18 | 1790.79 | | | |
| Total | 19 | 7653.73 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 43.34 | | 3.66 | 0.002 | 18.47 | 68.21 |
| Rooms | 9.97 | 1.3 | 7.68 | 4.39E-07 | 7.24 | |

6. What percent of the variability in the price of a home is explained by the variability in the number of rooms?
A. 23.4%
B. 58.6%
C. 76.6%
D. 99.4%

7. The predicted value for the fourth observation in the data set used for this regression is $163.01. The observed value is $146.50. What is the residual?
A. -16.51
B. 16.51
C. 306.6
D. 308.51

8. How many homes were in Wasiq's sample?
A. 1
B. 18
C. 19
D. 20

9. Wasiq will only use this simple method to determine the asking price for his home if the number of rooms is an important predictor for the price of a home. Given the output from his regression, what should Wasiq do?
A. Reject the null and don't use the method.
B. Reject the null and use the method.
C. Fail to reject the null and don't use the method.
D. Fail to reject the null and use the method.

10. What sign should be on the correlation coefficient for this regression?
A. Positive, because the correlation coefficient is squared.
B. Negative, because the standard error is small.
C. Positive, because the coefficient on Rooms is positive.
D. Negative, because the standard error is large.
11. What is the margin of error for a 95% confidence interval for the coefficient on Rooms from this regression?
A. 12.7
B. 7.24
C. 5.46
D. 2.73

12. Wasiq's home has 7 rooms. According to the output, what should he advertise as the asking price for his home?
A. $313,350
B. $113,130
C. $69,790
D. $53,310

13. According to the regression output, how much more could Wasiq ask if his home had two more rooms?
A. $133,070
B. $19,940
C. $9,970
D. $43,300

14. Which of the following are ethical regression analysis behaviors?
A. Remove variables from the model because the regression shows them to be non-significant.
B. Remove observations from the data set which are accurate but are outliers.
C. Use the significance test results of a preliminary regression to develop a model, then re-run the regression to get reportable results.
D. Correct observations in the data set which are inaccurate outliers.

Solved Problems from Text

12.2
a. The scatter plot shows a positive correlation between hours worked and weekly pay.
b.

| Hours Worked (X) | Weekly Pay (Y) | \((x_i - \bar{x})^2\) | \((y_i - \bar{y})^2\) | \((x_i - \bar{x})(y_i - \bar{y})\) |
|---|---|---|---|---|
| 10 | 93 | 100 | 7056 | 840 |
| 15 | 171 | 25 | 36 | 30 |
| 20 | 204 | 0 | 729 | 0 |
| 20 | 156 | 0 | 441 | 0 |
| 35 | 261 | 225 | 7056 | 1260 |
| 20 | 177 | 350 | 15318 | 2130 |
| \(\bar{x}\) | \(\bar{y}\) | SSxx | SSyy | SSxy |

r = CORREL(array1,array2) = 0.919908324
c. t.025 = TINV(0.05,3) = 3.182446305
d. \(t = r\sqrt{\dfrac{n-2}{1-r^2}} = 0.9199\sqrt{\dfrac{5-2}{1-0.9199^2}} = 4.063\). Since 4.063 > 3.182, we reject the null hypothesis of zero correlation.
e. p-value = TDIST(4.063,3,2) = 0.026883859

12.8
a. An increase of $1 in the price reduces expected sales by 37.5 units.
b. Sales = 842 - (20)*37.5 = 92
c. From a practical point of view, no. A zero price is unrealistic.

12.10
a. Increasing the average revenue by 1 million dollars raises the net income by $30,700.
b. If revenue is zero, then net income is 2277 million dollars, which suggests that the firm has net income when revenue is zero. This does not seem to be meaningful.
c. Net income = 2277 + 0.0307*(1000) = 2307.7 million dollars

12.16
a.

| Hours Worked (X) | Weekly Pay (Y) | \((x_i - \bar{x})^2\) | \((y_i - \bar{y})^2\) | \((x_i - \bar{x})(y_i - \bar{y})\) |
|---|---|---|---|---|
| 10 | 93 | 100 | 7056 | 840 |
| 15 | 171 | 25 | 36 | 30 |
| 20 | 204 | 0 | 729 | 0 |
| 20 | 156 | 0 | 441 | 0 |
| 35 | 261 | 225 | 7056 | 1260 |
| 20 | 177 | 350 | 15318 | 2130 |
| \(\bar{x}\) | \(\bar{y}\) | SSxx | SSyy | SSxy |

b. \(b_1 = SS_{xy}/SS_{xx} = 2130/350 = 6.086\), \(b_0 = \bar{y} - b_1\bar{x} = 177 - 6.0857(20) = 55.286\), so \(\hat{y} = 55.286 + 6.086X\)
c.

| Hours Worked (\(x_i\)) | Weekly Pay (\(y_i\)) | Estimated Pay (\(\hat{y}_i\)) | \(y_i - \hat{y}_i\) | \((y_i - \hat{y}_i)^2\) | \((\hat{y}_i - \bar{y})^2\) | \((y_i - \bar{y})^2\) |
|---|---|---|---|---|---|---|
| 10 | 93 | 116.146 | -23.146 | 535.7373 | 3703.209 | 7056 |
| 15 | 171 | 146.576 | 24.424 | 596.5318 | 925.6198 | 36 |
| 20 | 204 | 177.006 | 26.994 | 728.676 | 3.6E-05 | 729 |
| 20 | 156 | 177.006 | -21.006 | 441.252 | 3.6E-05 | 441 |
| 35 | 261 | 268.296 | -7.296 | 53.23162 | 8334.96 | 7056 |
| 20 | 177 | 177.006 | -0.006 | 3.6E-05 | 3.6E-05 | 0 |
| 20 | 177 | | | 2355.429 | 12963.79 | 15318 |
| \(\bar{x}\) | \(\bar{y}\) | | | SSE | SSR | SST |

d.
e.

12.22
a. Y = 7.6425 + 0.9467*X
b. The 95% confidence interval is 0.9467 ± 2.145(0.0936) or (0.7460, 1.1473).
c. \(H_0: \beta_1 \le 0\) versus \(H_1: \beta_1 > 0\). tcr = TINV(0.10,14) = 1.761. Reject the null hypothesis if t > 1.761. t = 10.118, so we reject the null hypothesis.
d. p-value = TDIST(10.11836,14,1) = 4.03644E-08 ≈ 0.000, so we reject the null hypothesis. The slope is positive.

12.24
a. Y = 614.930 - 109.112*X
b. Intercept: t = 614.930/51.2343 = 12.002. Slope: t = -109.112/51.3623 = -2.124.
c. df = 18, t.025 = 2.101. (Excel: =TINV(0.05,18) = 2.1009)
d. Intercept: p-value = TDIST(12.002,18,2) = 5.03299E-10. Slope: p-value = TDIST(ABS(-2.124),18,2) = 0.047785001
e. \((-2.124)^2 = 4.51\)
f. This model has a poor fit. The F statistic is barely significant at a level of .05 and \(R^2 = 0.20\). Only 20% of the variation in units sold can be explained by average price.

Quiz Yourself Answers

True/False: 1 T, 2 T, 3 T, 4 F, 5 T, 6 F, 7 T, 8 T, 9 F

Multiple Choice: 1 A, 2 B, 3 C, 4 A, 5 D, 6 C, 7 A, 8 D, 9 B, 10 C, 11 D, 12 B, 13 B, 14 D
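
The hand calculations in Solved Problems 12.2 and 12.16 (sums of squares, correlation, slope, intercept, t statistic, and the SSE/SSR/SST split) can be checked with a short script. The following is a minimal sketch in plain Python, not part of the original chapter; the function name `ols_summary` and the dictionary keys are illustrative choices, and the only data used are the five hours/pay pairs from the problem.

```python
import math

# Data from Solved Problems 12.2 and 12.16: hours worked (x), weekly pay (y).
hours = [10, 15, 20, 20, 35]
pay = [93, 171, 204, 156, 261]


def ols_summary(x, y):
    """Return the bivariate regression quantities used in the solved problems.

    Hypothetical helper for illustration; the names are not from the text.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sums of squares and cross products about the means.
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Sample correlation and its t statistic with n - 2 degrees of freedom.
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

    # OLS slope and intercept.
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar

    # Decompose total variation into explained (SSR) and unexplained (SSE).
    fitted = [b0 + b1 * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)

    return {
        "ss_xx": ss_xx, "ss_yy": ss_yy, "ss_xy": ss_xy,
        "r": r, "t": t_stat, "b0": b0, "b1": b1,
        "sse": sse, "ssr": ssr, "sst": ss_yy,
        "r_squared": ssr / ss_yy,
    }


result = ols_summary(hours, pay)
for name, value in result.items():
    print(f"{name:>10}: {value:.4f}")
```

Because this sketch keeps the slope and intercept unrounded, its SSR comes out slightly below the 12963.79 shown in the 12.16 table, whose fitted values appear to use coefficients rounded to three decimals. With unrounded coefficients, SSE + SSR equals SST exactly (up to floating-point error).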
