Chapter 13
Linear Regression and Correlation
True/False
1. If a scatter diagram shows very little scatter about a straight line drawn through the plots, it indicates a rather weak correlation.
Answer: False
2. A scatter diagram is a chart that portrays the correlation between a dependent variable and an independent variable.
Answer: True
3. An economist is interested in predicting the unemployment rate based on gross domestic product. Since the economist is interested in predicting unemployment, the independent variable is gross domestic product.
Answer: True
4. There are two variables in correlation analysis referred to as the dependent and determination variables.
Answer: False
5. Correlation analysis is a group of statistical techniques used to measure the strength of the relationship (correlation) between two variables.
Answer: True
6. The purpose of correlation analysis is to find how strong the relationship is between two variables.
Answer: True
7. Originated by Karl Pearson about 1900, the coefficient of correlation describes the strength of the relationship between two interval-scaled or ratio-scaled variables.
Answer: True
8. The coefficient of correlation, r, is often referred to as Spearman's rho.
Answer: False
9. The coefficient of correlation r is often referred to as the product-moment correlation coefficient.
Answer: True
10. A correlation coefficient equal to –1 or +1 indicates perfect correlation.
Answer: True
11. The strength of the correlation between two variables depends on the sign of the coefficient of correlation.
Answer: False
12. A coefficient of correlation, r, close to 0 (say, 0.08) shows that the relationship between two variables is quite weak.
Answer: True
13. Correlation coefficients of –0.91 and +0.91 represent relationships between two variables that have equal strength but different directions.
Answer: True
14. A coefficient of correlation of –0.96 indicates a very weak negative correlation.
Answer: False
15. The coefficient of determination is the proportion of the total variation in the dependent variable Y that is explained or accounted for by its relationship with the independent variable X.
Answer: True
16. The coefficient of determination is found by taking the square root of the coefficient of correlation.
Answer: False
17. If the coefficient of correlation is –0.90, the coefficient of determination is –0.81.
Answer: False
18. If the coefficient of correlation is –0.50, the coefficient of determination is +0.25.
Answer: True
19. If the coefficient of correlation is 0.68, the coefficient of determination is 0.4624.
Answer: True
20. The correlation coefficient is the proportion of total variation in Y that is explained by X.
Answer: False
21. The coefficient of determination is the proportion of total variation in Y that is not explained by X.
Answer: False
22. The coefficient of determination is the proportion of total variation in Y that is explained by X.
Answer: True
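The arithmetic behind questions 16 through 19 can be checked in a few lines. This is only a sketch; the function name `coefficient_of_determination` is made up for illustration and does not come from the chapter.

```python
def coefficient_of_determination(r):
    """Square the correlation coefficient r (hypothetical helper name)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    return r * r


# Values of r taken from questions 17, 18, and 19.
for r in (-0.90, -0.50, 0.68):
    r_squared = coefficient_of_determination(r)
    print(f"r = {r:+.2f}  ->  r^2 = {r_squared:.4f}")
```

Because the result is a square, the sign of r is lost, which is why a negative coefficient of determination cannot occur.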
23. Pearson's product-moment correlation coefficient, r, requires that the data be interval or ratio scaled, such as incomes and weights.
Answer: True
24. The standard error of estimate measures the accuracy of our prediction.
Answer: True
25. Pearson's coefficient of correlation can be used if the data is nominally scaled.
Answer: False
26. The coefficient of determination can only be positive.
Answer: True
27. If the coefficient of determination is expressed as a percent, its value is between 0% and 100%.
Answer: True
28. A t test is used to test the significance of the coefficient of correlation.
Answer: True
29. To test the significance of Pearson's r, we use the standard normal z distribution.
Answer: False
30. When testing the strength of the relationship between two variables, the null hypothesis is H0: ρ = 0.
Answer: True
31. When testing the strength of the relationship between two variables, the alternate hypothesis is H1: ρ ≠ 0.
Answer: True
32. The basic question in testing the significance of ρ (rho) is to make a statistical inference about the true relationship between two variables.
Answer: True
33. One assumption underlying linear regression is that the Y values are statistically dependent. This means that in selecting a sample, the Y values chosen, for a particular X value, depend on the Y values for any other X value.
Answer: False
34. The technique used to measure the strength of the relationship between two variables using the coefficient of correlation and the coefficient of determination is called regression analysis.
Answer: False
35. A regression equation may be determined using a mathematical method called the least squares principle.
Answer: True
36. A regression equation found using the least squares principle is the best-fitting line because the sum of the squares of the vertical deviations between the actual and estimated values is minimized.
Answer: True
37. The least squares technique minimizes the sum of the squares of the vertical distances between the actual Y values and the predicted values of Y.
Answer: True
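Questions 35 through 37 describe one and the same criterion. As a sketch in standard notation (the index i, the sample size n, and the sample means \bar{X} and \bar{Y} are symbols assumed here, not taken from the questions), the least squares principle chooses a and b to minimize the sum of squared vertical deviations, which leads to the usual closed forms for the slope and intercept:

```latex
\min_{a,\,b} \; \sum_{i=1}^{n} \bigl( Y_i - \hat{Y}_i \bigr)^2 ,
\qquad \hat{Y}_i = a + b X_i

b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
         {\sum_{i=1}^{n} (X_i - \bar{X})^2} ,
\qquad
a = \bar{Y} - b\,\bar{X}
```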
38. The values of a and b in the regression equation are called the regression coefficients.
Answer: True
39. One assumption underlying linear regression is that for each value of X there is a group of Y values that is normally distributed.
Answer: True
40. In order to visualize the regression equation line, we can draw a scatter diagram.
Answer: True
41. A regression equation is a mathematical equation that defines the relationship between two variables.
Answer: True
42. The equation for a straight line going through the plots on a scatter diagram is called a regression equation. It is alternately called an estimating equation and a predicting equation.
Answer: True
43. The regression equation is used to estimate a value of the dependent variable Y based on a selected value of the independent variable X.
Answer: True
44. In regression analysis, the predicted value of Ŷ rarely agrees exactly with the actual Y value, i.e., we expect some prediction error.
Answer: True
45. Trying to predict weekly sales with a standard error of estimate of $1,955, we would conclude that 68 percent of the predictions would not be off by more than $1,955, 95 percent would not be off by more than $3,910, and 99.7 percent would not be off by more than $5,865.
Answer: True
46. The standard error of estimate is used to construct confidence intervals when the sample size is large and the scatter about the regression line is somewhat normally distributed.
Answer: True
47. A confidence interval can be determined for the mean value of Y for a given value of X.
Answer: True
48. A confidence interval can be determined for the mean value of X for a given value of Y.
Answer: False
49. The smaller the samples, the smaller the standard error of estimate.
Answer: False
50. Explained variation equals total variation minus unexplained variation.
Answer: True
51. In regression analysis, there is no difference in the width of a confidence interval and the width of a prediction interval.
Answer: False
52. A confidence interval is narrower than a prediction interval because a confidence interval estimates a mean Y for a given X.
Answer: True
53. The least squares method assumes the relationship between the dependent and independent variables is linear.
Answer: True
54. When analyzing data with regression, a transformation is necessary when the relationship between the dependent and independent variables is linear.
Answer: False
55. When analyzing a curvilinear relationship between dependent and independent variables, a transformation of the data is necessary.
Answer: True
56. A mathematical transformation can be used to change a curvilinear relationship between two variables to a linear relationship.
Answer: True
Multiple Choice
57. What is the chart called when the paired data (the dependent and independent variables) are plotted?
A) Scatter diagram
B) Bar chart
C) Pie chart
D) Histogram
Answer: A
58. What is the variable used to predict the value of another called?
A) Independent
B) Dependent
C) Correlation
D) Determination
Answer: A
59. Which of the following statements regarding the coefficient of correlation is true?
A) It ranges from –1.0 to +1.0 inclusive
B) It measures the strength of the relationship between two variables
C) A value of 0.00 indicates two variables are not related
D) All of the above
Answer: D
60. What does a coefficient of correlation of 0.70 imply?
A) Almost no correlation because 0.70 is close to 1.0
B) 70% of the variation in one variable is explained by the other
C) Coefficient of determination is 0.49
D) Coefficient of nondetermination is 0.30
Answer: C
61. What is the range of values for a coefficient of correlation?
A) 0 to +1.0
B) –3 to +3 inclusive
C) –1.0 to +1.0 inclusive
D) Unlimited range
Answer: C
62. The product-moment correlation coefficient, r, requires that variables are measured with:
A) An interval scale
B) A ratio scale
C) An ordinal scale
D) A nominal scale
E) Either A or B.
Answer: E
63. If the correlation between two variables is close to one, the association is
A) strong.
B) moderate.
C) weak.
D) none.
Answer: A
64. If the correlation coefficient between two variables equals zero, what can be said of the variables X and Y?
A) Not related
B) Dependent on each other
C) Highly related
D) All of the above are correct
Answer: A
65. What can we conclude if the coefficient of determination is 0.94?
A) Strength of relationship is 0.94
B) Direction of relationship is positive
C) 94% of total variation of one variable is explained by variation in the other variable.
D) All of the above are correct
Answer: C
66. If r = –1.00, what inferences can be made?
A) The dependent variable can be perfectly predicted by the independent variable
B) All of the variation in the dependent variable can be accounted for by the independent variable
C) High values of one variable are associated with low values of the other variable
D) Coefficient of determination is 100%.
E) All of the above are correct
Answer: E
67. If r = 0.65, what does the coefficient of determination equal?
A) 0.194
B) 0.423
C) 0.577
D) 0.806
Answer: B
68. What does the coefficient of determination equal if r = 0.89?
A) 0.94
B) 0.89
C) 0.79
D) 0.06
Answer: C
69. Which value of r indicates a stronger correlation than 0.40?
A) –0.30
B) –0.50
C) +0.38
D) 0
Answer: B
70. What is the range of values for the coefficient of determination?
A) –1 to +1 inclusive
B) –100% to +100% inclusive
C) –100% to 0% inclusive
D) 0% to 100% inclusive
Answer: D
71. If the decision in the hypothesis test of the population correlation coefficient is to reject the null hypothesis, what can we conclude about the correlation in the population?
A) It is zero
B) It could be zero
C) It is not zero
D) It equals the computed sample correlation
Answer: C
72. A hypothesis test is conducted at the .05 level of significance to test whether or not the population correlation is zero. If the sample consists of 25 observations and the correlation coefficient is 0.60, then what is the computed value of the test statistic?
A) 1.96
B) 2.07
C) 2.94
D) 3.60
Answer: D
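The keyed answer to question 72 follows from the t statistic for a correlation coefficient, t = r·sqrt(n − 2) / sqrt(1 − r²), with n − 2 degrees of freedom. A minimal sketch, with a function name chosen here purely for illustration:

```python
import math


def t_statistic_for_r(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Question 72: r = 0.60 from a sample of n = 25 observations.
t = t_statistic_for_r(0.60, 25)
print(f"t = {t:.2f}")  # prints "t = 3.60"
```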
73. In the regression equation, what does the letter "a" represent?
A) Y intercept
B) Slope of the line
C) Any value of the independent variable that is selected
D) None of the above
Answer: A
74. In the regression equation, what does the letter "b" represent?
A) Y intercept
B) Slope of the line
C) Any value of the independent variable that is selected
D) Value of Y when X=0
Answer: B
75. Suppose the least squares regression equation is Ŷ = 1202 + 1,133X. When X = 3, what does Ŷ equal?
A) 5,734
B) 8,000
C) 4,601
D) 4,050
Answer: C
76. What is the general form of the regression equation?
A) Ŷ = ab
B) Ŷ = a + bX
C) Ŷ = a – bX
D) Ŷ = abX
Answer: B
77. What is the measure that indicates how precise a prediction of Y is based on X or, conversely, how inaccurate the prediction might be?
A) Regression equation
B) Slope of the line
C) Standard error of estimate
D) Least squares principle
Answer: C
78. Which of the following are true assumptions underlying linear regression: 1) for each value of X, there is a group of Y values which is normally distributed; 2) the means of these normal distributions of Y values all lie on the straight line of regression; and/or 3) the standard deviations of these normal distributions are equal?
A) Only (1) and (2)
B) Only (1) and (3)
C) Only (2) and (3)
D) All of them
Answer: D
79. Based on the regression equation, we can
A) predict the value of the dependent variable given a value of the independent variable.
B) predict the value of the independent variable given a value of the dependent variable.
C) measure the association between two variables.
D) all of the above.
Answer: A
80. Which of the following is true about the standard error of estimate?
A) It is a measure of the accuracy of the prediction
B) It is based on squared vertical deviations between Y and Ŷ
C) It cannot be negative
D) All of the above
Answer: D
81. If all the plots on a scatter diagram lie on a straight line, what is the standard error of estimate?
A) –1
B) +1
C) 0
D) Infinity
Answer: C
82. In the least squares equation Ŷ = 10 + 20X, the value of 20 indicates
A) the Y intercept.
B) for each unit increase in X, Y increases by 20.
C) for each unit increase in Y, X increases by 20.
D) none of the above.
Answer: B
83. In the equation Ŷ = a + bX, what is Ŷ?
A) Slope of the line
B) Y intercept
C) Predicted value of Y, given a specific X value
D) Value of Y when X=0
Answer: C
84. Assume the least squares equation is Ŷ = 10 + 20X. What does the value of 10 in the equation indicate?
A) Y intercept
B) For each unit increase in Y, X increases by 10
C) For each unit increase in X, Y increases by 10
D) None of the above
Answer: A
85. What is the variable used to predict another variable called?
A) Independent variable
B) Dependent variable
C) Important variable
D) Causal variable
Answer: A
86. In regression, the difference between the confidence interval and prediction interval formulas is
A) the prediction interval is the square root of the confidence interval.
B) the addition of "1" to the quantity under the radical sign.
C) the prediction interval uses and the confidence interval uses r.
D) no difference.
Answer: B
87. Which of the following is NOT a difference between a confidence interval and a prediction interval?
A) Addition of "1" under the radical for the prediction interval.
B) Confidence interval uses the standard error of estimate and the prediction interval does not.
C) Prediction interval refers to a specific case.
D) Confidence interval is narrower than the prediction interval.
Answer: B
88. When comparing the 95% confidence and prediction intervals for a given regression analysis,
A) the confidence interval is wider than a prediction interval
B) the confidence interval is narrower than a prediction interval
C) there is no difference between confidence and prediction intervals
D) None of the above
Answer: B
89. In regression analysis, a transformation is used when
A) the confidence interval is wider than a prediction interval
B) two variables are not independent
C) the relationship between dependent and independent variables is not linear
D) the correlation is near zero
Answer: C
90. The covariance is
A) the same sign as the correlation
B) always greater than zero
C) used to detect curvilinear relationships
D) computed by multiplying two variances.
Answer: A
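Question 90 turns on the identity r = covariance / (s_x · s_y): the standard deviations are never negative, so the covariance and the correlation must share a sign. A small sketch with made-up data (the lists `xs`, `ys_up`, and `ys_down` are illustrative, not from the chapter):

```python
import math


def covariance_and_r(xs, ys):
    """Return (sample covariance, correlation coefficient) for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return cov, cov / (s_x * s_y)


xs = [1, 2, 3, 4, 5]               # made-up illustrative data
ys_up = [2, 4, 5, 4, 6]            # tends to rise with x
ys_down = [-y for y in ys_up]      # same data mirrored, tends to fall with x

cov_up, r_up = covariance_and_r(xs, ys_up)
cov_down, r_down = covariance_and_r(xs, ys_down)
print(f"rising:  covariance = {cov_up:+.2f}, r = {r_up:+.3f}")
print(f"falling: covariance = {cov_down:+.2f}, r = {r_down:+.3f}")
```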
Fill-in-the-Blank
91. In plotting paired data in a scatter diagram, on which axis is the dependent variable scaled? ___________________
Answer: Y or vertical axis
92. If we are studying the relationship between high school performance and college performance, and want to predict college performance, what kind of variable is high school performance? ________________
Answer: independent variable
93. How do we designate the sample coefficient of correlation? _____
Answer: r
94. How do we designate the population coefficient of correlation? _____
Answer: ρ (rho)
95. If there is absolutely no relationship between two variables, what will Pearson's r equal? _____
Answer: zero (0)
96. If the coefficient of correlation is 0.80, what is the coefficient of determination? ______
Answer: 0.64
97. If the coefficient of determination is 0.81, what is the coefficient of correlation? ______
Answer: 0.9 or -0.9
98. If the coefficient of correlation is –0.81, what is the coefficient of determination? _______
Answer: 0.6561
99. How is the coefficient of determination related to the correlation coefficient? ________
Answer: coefficient of determination = r²
100. What is the range of values that the coefficient of determination can assume? _____ and ______
Answer: 0%, 100%
101. What is the range of values that the covariance can assume?
Answer: any value between –∞ and +∞
102. A financial advisor is interested in predicting bond yield based on bond term, i.e., one year, two years, etc. What is the dependent variable? ___________________
Answer: bond yield
103. Suppose a sample of 15 homes recently sold in your area is obtained. The correlation between the area of the home, in square feet, and the selling price is 0.40. We want to test the null hypothesis that the correlation in the population is less than or equal to zero versus the alternate that it is greater than zero. The rejection region will fall in the ________ tail of a t distribution.
Answer: upper
104. If the value of r is –0.96, what does this indicate about the dependent variable as the independent variable increases? ___________
Answer: decreases
105. Perfect correlation means that the scatter diagram will appear as a _______________________.
Answer: straight line
106. What is the proportion of explained variation called? ________________
Answer: coefficient of determination
107. What is the value of the correlation coefficient if there is perfect correlation? ___________
Answer: ±1.00
108. If the correlation between sales and advertising is +0.6, what percent of the variation in sales can be attributed to advertising? ______
Answer: 36%
109. What is a chart designed to portray the relationship between a dependent variable and an independent variable called? _______________________
Answer: scatter diagram
110. What is the correlation coefficient developed by Karl Pearson formally known as? ____________________________________
Answer: product moment correlation
111. What type of correlation designates an inverse relationship between two variables? ___________________
Answer: negative
112. What is the technique used to predict or estimate the value of the dependent variable Y based on a selected value of an independent variable X called? __________________________
Answer: regression analysis
113. What is the equation used to estimate Y based on X? _____________
Answer: regression equation
114. What principle minimizes the sum of the squares of the vertical deviations about the line? _______________
Answer: least squares principle
115. The standard error of the estimate measures the scatter or dispersion of the observed values around a ____________________
Answer: regression line
116. The technique that minimizes the sum of the squared vertical deviations about the _______________________ is called "least squares."
Answer: regression line
117. What do the coefficient of correlation and the slope of the regression line always have in common? __________________
Answer: their sign
118. If the dependent variable is in dollars, the standard error is in what units? ______________
Answer: dollars ($)
119. What is the best model to describe a linear relationship between two variables? ________________________
Answer: regression equation
120. What is another name for the regression or estimating equation? ______________________________
Answer: predicting equation
121. What principle is applied when calculating the regression coefficients? _________________
Answer: principle of least squares
122. What is the general form of the regression equation? _____________
Answer: Ŷ = a + bX
123. What is a measure of the scatter of observed values around the regression line called? _______________________________
Answer: standard error of estimate
124. An assumption of linear regression states that for each value of X, there is a group of Y values that are statistically __________________ and normally distributed about the regression line.
Answer: independent
125. Approximately what percent of the values lie within two standard errors of the regression line? ________
Answer: 95%
126. What is the direction of a regression line if its slope equals zero, indicating a lack of a relationship? ____________________
Answer: horizontal
127. What variation does the coefficient of determination measure? _______
Answer: explained variation relative to total variation
128. How does the prediction interval for an individual value of Y compare to the confidence interval for the mean value of Y? _______________
Answer: wider or larger
129. The regression coefficient, a, is the point where the regression line __________ the Y-axis.
Answer: intersects
130. The regression coefficient, a, is the point where the regression line __________ the Y-axis.
Answer: intersects
131. What does the covariance measure? _______
Answer: the degree of association between two variables
132. In a regression analysis, every value of y is converted to z with the following formula: z = log(y). What is the conversion called? _______
Answer: a transformation
Multiple Choice
Use the following to answer questions 133-138:
Given the following five points: (–2,0), (–1,0), (0,1), (1,1), and (2,3).
133. What is the slope of the line?
A) 0.0
B) 0.5
C) 0.6
D) 0.7
Answer: D
134. What is the Y intercept?
A) 0.0
B) 0.7
C) 1.0
D) 1.5
Answer: C
135. What is the standard error of the estimate?
A) 0
B) 0.135
C) 0.367
D) 0.606
Answer: D
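The keyed answers to questions 133 through 135 can be reproduced from the five points given above. The sketch below applies the standard least squares formulas and divides the sum of squared residuals by n − 2 for the standard error of estimate; the function name `least_squares` is an illustrative choice.

```python
import math


def least_squares(xs, ys):
    """Return (intercept a, slope b, standard error of estimate s_yx)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx                      # slope
    a = y_bar - b * x_bar              # Y intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(sse / (n - 2))    # standard error of estimate
    return a, b, s_yx


# The five points from the problem statement.
xs = [-2, -1, 0, 1, 2]
ys = [0, 0, 1, 1, 3]
a, b, s_yx = least_squares(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}, s_yx = {s_yx:.3f}")
# prints "a = 1.000, b = 0.700, s_yx = 0.606"
```

With n = 5 there are n − 2 = 3 degrees of freedom, which is the row of the t table that questions 136 and 137 rely on.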
136. What is the critical value necessary to determine a confidence interval for a 95% level of confidence?
A) 2.132
B) 2.353
C) 2.776
D) 3.182
Answer: D
137. What is the critical value necessary to determine a confidence interval for a 90% level of confidence?
A) 1.533
B) 1.638
C) 2.132
D) 2.353
Answer: D
138. If the regression equation is Ŷ = 2 – 0.4X, what is the value of Ŷ when X = –3?
A) 0.8
B) 3.2
C) –10.0
D) 14.0
Answer: B
Fill-in-the-Blank
Use the following to answer questions 139-145:
A company wants to study the relationship between an employee's length of employment and their number of workdays absent. The company collected the following information on a random sample of seven employees.
139. What is the independent variable (X)? _________________
Answer: number of years employed
140. What is the dependent variable (Y)? _____________________
Answer: number of workdays absent
141. What is the slope of the linear equation? ________________
Answer: -0.6852
142. What is the Y intercept of the linear equation? ______________
Answer: 7.7407
143. What is the least squares equation for the data? ___________________
Answer: Ŷ = 7.7407 – 0.6852X
144. What is the meaning of a negative slope? _________________________
Answer: as the length of employment increases the number of work days absent decreases
145. What is the standard error of estimate? _____________
Answer: 1.31
Use the following to answer questions 146-150:
The relationship between interest rates as a percent (X) and housing starts (Y) is given by the linear equation Ŷ = 4094 – 269X.
146. What will be the number of housing starts if the interest rate is 8.25%? ___________
Answer: 1875 (1874.75)
147. What will be the number of housing starts if the interest rate rose to 16%? __________
Answer: zero (0) since you can't have negative housing starts
148. At what interest rate will there be no permits for housing starts? _________
Answer: 15.22
149. What happens to housing starts as interest rates fall? ________
Answer: housing starts rise
150. For what interest rate will the maximum number of housing starts be achieved? _______
Answer: 0
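Questions 146 through 148 are direct evaluations of the housing starts equation. The sketch below floors the prediction at zero, following the reasoning in the answer to question 147; both function names are hypothetical.

```python
def housing_starts(rate_percent):
    """Predicted housing starts from 4094 - 269X, never below zero."""
    return max(0.0, 4094 - 269 * rate_percent)


def zero_starts_rate():
    """Interest rate (percent) at which the fitted line reaches zero starts."""
    return 4094 / 269


print(housing_starts(8.25))           # question 146
print(housing_starts(16))             # question 147
print(round(zero_starts_rate(), 2))   # question 148
```

The line slopes downward, so within the range where it makes sense (rates of zero or more) the largest prediction occurs at a rate of zero, as in question 150.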
Multiple Choice
Use the following to answer questions 151-160:
A sales manager for an advertising agency believes there is a relationship between the number of contacts and the amount of the sales. To verify this belief, the following data was collected:
151. What is the dependent variable?
A) Salesperson
B) Number of contacts
C) Amount of sales
D) All the above
Answer: C
152. What is the independent variable?
A) Salesperson
B) Number of contacts
C) Amount of sales
D) All the above
Answer: B
153. What is the Y-intercept of the linear equation?
A) –12.201
B) 2.1946
C) –2.1946
D) 12.201
Answer: A
154. What is the slope of the linear equation?
A) –12.201
B) 12.201
C) 2.1946
D) –2.1946
Answer: C
155. What is the value of the standard error of estimate?
A) 9.3104
B) 8.778
C) 8.328
D) 86.68
Answer: A
156. What is the value of the coefficient of correlation?
A) 0.6317
B) 0.9754
C) 0.9513
D) 9.3104
Answer: B
157. What is the value of the coefficient of determination?
A) 9.3104
B) 0.9754
C) 0.6319
D) 0.9513
Answer: D
158. The 95% confidence interval for 30 calls is
A) 55.8, 51.5
B) 51.4, 55.9
C) 46.7, 60.6
D) 31.1, 76.2
Answer: C
159. The 95% prediction interval for a particular person making 30 calls is
A) 55.8, 51.5
B) 51.4, 55.9
C) 46.7, 60.6
D) 31.1, 76.2
Answer: D
160. What is the regression equation?
A) Ŷ = 2.1946 – 12.201X
B) Ŷ = –12.201 + 2.1946X
C) Ŷ = 12.201 + 2.1946X
D) Ŷ = 2.1946 + 12.201X
Answer: B
Use the following to answer questions 161-167:
161. What is the standard error of the estimate?
A) 136.8552
B) 12323.56
C) 11.6985
D) Cannot be computed
Answer: C
162. What is the coefficient of determination?
A) 91.8%
B) 8.2%
C) 90.0%
D) Cannot be computed
Answer: A
163. What is the correlation coefficient?
A) 0.81
B) 0.958
C) –0.84
D) 0.006
Answer: B
164. The regression equation is:
A) Ŷ = 2.179463 – 12.8094X
B) Ŷ = –12.80894 + 2.179463X
C) 12.8094 X = 2.179463
D) None of the above
Answer: B
165. The regression analysis can be summarized as follows:
A) No significant relationship between the variables
B) A significant negative relationship exists between the variables
C) A significant positive relationship exists between the variables
D) For every unit increase in X, Y decreases by 12.8094
Answer: C
166. If testing the hypothesis H0: ρ = 0, the computed t-statistic is:
A) 9.45
B) 8.84
C) 8.18
D) Cannot be computed
Answer: A
167. Estimate the value of Ŷ when X = 4.
A) 10.45
B) 3.73
C) 8.20
D) Cannot be computed
Answer: C
Use the following to answer questions 168-169:
A regression analysis yields the following information:
Ŷ = 2.24 + 1.49X
s_y·x = 1.66; ΣX = 32; Σ(X – X̄)² = 31.6; n = 10; ΣX² = 134
168. Compute the 95% confidence interval when X = 4.
A) 0.0, 4.05
B) 4.15, 12.25
C) 2.67, 5.33
D) 6.85, 9.55
Answer: D
169. Compute the 95% prediction interval when X = 4.
A) 0.0, 4.05
B) 2.67, 5.33
C) 4.14, 12.26
D) 6.85, 9.55
Answer: C
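Questions 168 and 169 differ only by the "1" added under the radical for the prediction interval. The sketch below works both from the summary figures given above; the critical value t = 2.306 (two-tailed 95%, 8 degrees of freedom) is read from a standard t table and is an assumption not stated in the problem, and the function name is hypothetical.

```python
import math


def regression_intervals(x0, a, b, s_yx, n, sum_x, sum_x_sq, t_crit):
    """Return (confidence interval, prediction interval) for Y at X = x0."""
    y_hat = a + b * x0
    x_bar = sum_x / n
    ss_x = sum_x_sq - sum_x ** 2 / n          # sum of squared deviations of X
    core = 1 / n + (x0 - x_bar) ** 2 / ss_x
    ci_half = t_crit * s_yx * math.sqrt(core)        # mean of Y at x0
    pi_half = t_crit * s_yx * math.sqrt(1 + core)    # one individual Y at x0
    return ((y_hat - ci_half, y_hat + ci_half),
            (y_hat - pi_half, y_hat + pi_half))


T_CRIT = 2.306  # assumed: t table value, 95% two-tailed, df = n - 2 = 8

ci, pi = regression_intervals(4, a=2.24, b=1.49, s_yx=1.66, n=10,
                              sum_x=32, sum_x_sq=134, t_crit=T_CRIT)
print(f"95% confidence interval: {ci[0]:.2f} to {ci[1]:.2f}")
print(f"95% prediction interval: {pi[0]:.2f} to {pi[1]:.2f}")
```

Under these assumptions the results land within a few hundredths of the keyed choices; the small gap is presumably rounding in the original key.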
Essay
170. What is the purpose of measuring correlation?
Answer: To provide a measure of the linear association between two variables.
171. Select a value for the correlation coefficient and provide a complete interpretation of the correlation coefficient using your selected value.
Answer: Answers will vary depending on the value. A correlation of 0.9 (any number between –1 and +1) indicates a relatively strong (or weak) positive (or negative) correlation. In this case, as one variable increases, the other variable also increases.
172. What are the input and the output of linear regression?
Answer: As an input, linear regression requires data for two interval or ratio variables. One is defined as the dependent variable and the other is defined as the independent variable. The output of linear regression is a least squares regression equation of the form Ŷ = a + bX that can be used to predict the value of the dependent variable for a selected value of the independent variable.
173. What does the coefficient of determination measure?
Answer: For a regression analysis, the coefficient of determination measures the explained variation in the dependent variable as a percent of the total variation.
174. What is the purpose of transformations in regression analysis?
Answer: One assumption of linear regression analysis is that the relationship between the dependent and independent variables is linear. Sometimes this is not true. The purpose of transformations in regression analysis is to transform the variables so they are linearly related.
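The effect of a transformation can be seen on a small made-up example. In the sketch below the Y values grow exponentially with X (the data and the growth rate 0.5 are invented for illustration), so the raw relationship is curvilinear; taking the logarithm of Y, as in question 132, makes it exactly linear.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5, 6]
ys = [math.exp(0.5 * x) for x in xs]    # made-up curvilinear data
zs = [math.log(y) for y in ys]          # transformation z = log(y)

r_raw = pearson_r(xs, ys)
r_log = pearson_r(xs, zs)
print(f"r before transformation: {r_raw:.3f}")
print(f"r after transformation:  {r_log:.3f}")
```

The linear correlation is higher after the transformation, so a straight-line regression of z on X describes these data better than a straight-line regression of Y on X.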