Regression Analysis: Simplify Complex Data Relationships (Coursera): Weekly challenge 2 answers

Test your knowledge: Foundations of linear regression

1. Fill in the blank: The best fit line is the line that fits the data best by minimizing some _____.

Answers

residual values

predicted values
loss function

regression function

2. What is the sum of the squared differences between each observed value and the associated predicted value?

Answers

Residual least squares
Sum of squared residuals

Sum of squared predicted values
Ordinary least squares
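
As a rough illustration of the quantity described in question 2, the sketch below computes a sum of squared residuals in plain Python. The function name `sum_of_squared_residuals` and the sample numbers are made up for this example and do not come from the course material.

```python
def sum_of_squared_residuals(observed, predicted):
    """Sum of squared differences between observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    # A residual is (observed - predicted); square each one and add them up.
    return sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
observed = [3.0, 5.0, 7.0]
predicted = [2.5, 5.0, 8.0]
ssr = sum_of_squared_residuals(observed, predicted)
print(ssr)
```

The residuals here are 0.5, 0.0 and -1.0, so the squares are 0.25, 0.0 and 1.0.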

3. What tool would be most effective for calculating the ordinary least squares?

Answers

Python

Google Sheets
SQL
Microsoft Excel

Test your knowledge: Assumptions and construction in Python

4. How does a data professional determine if a linearity assumption is met?

Answers

They confirm whether data on the X-Y coordinate falls along a downward curved line.
They confirm whether data on the X-Y coordinate resembles a random cloud.
They confirm whether data on the X-Y coordinate falls along an upward curved line.

They confirm whether data on the X-Y coordinate falls along a straight line.

5. Which of the following statements accurately describes the normality assumption?

Answers

The normality assumption can only be confirmed before a model is built.
The normality assumption can only be confirmed after a model is built.

The normality assumption can only be confirmed while a model is being built.
The normality assumption can be confirmed anytime during model building.

6. A data professional is using a scatterplot to plot residuals and predicted values from a regression model to check for homoscedasticity. What does this scenario represent?

Answers

Cone

Straight line
Random cloud
Curved line

7. What type of visualization uses a series of scatterplots that show the relationships between pairs of variables?

Answers

Residual matrix
Linear matrix
Scatterplot matrix

Scatterplot residuals

Test your knowledge: Evaluate a linear regression model

8. What is the area surrounding a regression line, which describes the uncertainty around the predicted outcome at every value of X?

Answers

Confidence interval
Confidence band

R squared
Ordinary least squares

9. Fill in the blank: R squared measures the _____ in the dependent variable, Y. This is explained by the independent variable, X.

Answers

proportion of variation

coefficient of variation
proportion of accuracy
coefficient of accuracy
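
One common way to compute R squared is 1 minus the ratio of the sum of squared residuals to the total sum of squares. The sketch below assumes that definition; the function name `r_squared` and the data are hypothetical and are not taken from the course.

```python
from statistics import mean


def r_squared(observed, predicted):
    """R squared as 1 - (sum of squared residuals / total sum of squares)."""
    y_bar = mean(observed)
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_total = sum((y - y_bar) ** 2 for y in observed)
    if ss_total == 0:
        raise ValueError("observed values have no variation")
    return 1 - ss_residual / ss_total


# Hypothetical observed values and model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
r2 = r_squared(observed, predicted)
print(r2)
```

For a least-squares fit with an intercept, this value can be read as the share of the variation in Y that the model accounts for.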

10. Which linear regression evaluation metric is sensitive to large errors?

Answers

Adjusted R squared
Mean squared error (MSE)
Mean absolute error (MAE)

The coefficient of determination
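
To see why squaring makes a metric sensitive to large errors, the sketch below compares MSE and MAE on two hypothetical sets of predictions with the same total absolute error. The helper names and the numbers are illustrative assumptions only.

```python
def mean_squared_error(observed, predicted):
    """Average of the squared residuals."""
    return sum((y - p) ** 2 for y, p in zip(observed, predicted)) / len(observed)


def mean_absolute_error(observed, predicted):
    """Average of the absolute residuals."""
    return sum(abs(y - p) for y, p in zip(observed, predicted)) / len(observed)


observed = [10.0, 10.0, 10.0, 10.0]
# Four small misses of 1 unit each.
small_errors = [9.0, 11.0, 9.0, 11.0]
# Three perfect predictions and one miss of 4 units.
one_large_error = [10.0, 10.0, 10.0, 14.0]

mae_small = mean_absolute_error(observed, small_errors)
mae_large = mean_absolute_error(observed, one_large_error)
mse_small = mean_squared_error(observed, small_errors)
mse_large = mean_squared_error(observed, one_large_error)
print(mae_small, mae_large, mse_small, mse_large)
```

Both prediction sets have the same MAE, but the single large miss makes the MSE four times larger, because each error is squared before averaging.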

Test your knowledge: Interpret linear regression results

11. Which of the following are best practices when communicating linear regression results? Select all that apply.

Answers

Provide measures of uncertainty around estimated results.
Always extrapolate to a larger or different group any data insights that apply only to a specific, smaller population.

Make the findings quickly understood without technical terms.
Use data visualizations to present the results.

12. Which of the following statements accurately describe coefficients and p-values for regression model interpretation? Select all that apply.

Answers

P-values determine how changes in the independent variables are associated with changes in the dependent variable.

Coefficients demonstrate whether P-values are statistically significant.
P-values demonstrate whether coefficients are statistically significant.
Coefficients determine how changes in the independent variables are associated with changes in the dependent variable.

Weekly challenge 2

13. Fill in the blank: _____ is the difference between observed values and the predicted values of a regression line.

Answers

Error
Intercept
Coefficient

Residual

14. In linear regression, what mathematical technique is used to calculate beta zero hat and beta one hat?

Answers

Ordinary least squares
Mean squared error

Coefficient of determination
Coefficient R squared
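
For a single independent variable, the estimates of beta zero hat and beta one hat that minimize the sum of squared residuals have a closed form. The sketch below uses that textbook formula in plain Python; the function name `fit_simple_ols` and the data are hypothetical, and in practice a library such as statsmodels or scikit-learn would normally be used instead.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: co-variation of x and y divided by the variation of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x values must not all be identical")
    slope = s_xy / s_xx
    # Intercept: forces the line through the point (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data that lies exactly on the line y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
beta_0_hat, beta_1_hat = fit_simple_ols(x, y)
print(beta_0_hat, beta_1_hat)
```

Because the sample points fall exactly on a line, the fitted intercept and slope recover that line and every residual is zero.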

15. A data professional testing for linear regression assumptions notices that their visualization of the data appears like a random cloud. Which model assumption does this invalidate?

Answers

Homoscedasticity

Independent observation
Normality
Linearity

16. Fill in the blank: A scatterplot _____ is a series of scatterplots that show the relationships between pairs of variables.

Answers

progression
succession
array

matrix

17. A data professional checking model assumptions notices the dependent variables appear in a cone-shaped pattern when plotting the residuals against the dependent variable. Which model assumption does this invalidate?

Answers

Independent observation
Normality

Linearity
Homoscedasticity

18. Fill in the blank: A confidence band is the area surrounding a line that describes the _____ around the predicted outcome at every value of X.

Answers

certainty

accuracy
inaccuracy
uncertainty

19. What is another term for R squared?

Answers

Error of residuals
Coefficient of residuals
Residuals of determination

Coefficient of determination

20. Which of the following statements accurately describe running a randomized, controlled experiment? Select all that apply.

Answers

To be successful, data professionals must control for every factor in the experiment.
It is a study design that randomly assigns participants into groups.

It cannot have a control group.
It is typically used when arguing for causation between variables.

21. A data professional determines the best fit line by calculating the difference between observed values and the predicted value of a regression line. What is this calculation?

Answers

Notion

Coefficient
Residual
Parameter

22. A data professional minimizes the sum of squared residuals to estimate parameters in a linear regression model. What method are they using?

Answers

Ordinary least squares
Mean absolute error
Residual coefficients

R squared

23. Fill in the blank: A scatterplot matrix is a series of scatterplots that show the _____ between pairs of variables.

Answers

discrepancies
distances

variability
relationships

24. Fill in the blank: A _____ is the area surrounding a line that describes the uncertainty around the predicted outcome at every value of X.

Answers

confidence band

interval band
interval slope
confidence slope

25. What measures the proportion of variation in the dependent variable Y explained by the independent variable X?

Answers

Mean squared error (MSE)
Mean absolute error (MAE)
P-value

R squared

26. Which of the following statements accurately describe a randomized, controlled experiment? Select all that apply.

Answers

The differences between the control and treatment groups must be observable and measurable.
It is a study design that randomly assigns participants into an experimental group or a control group.

As the study is conducted, the only expected similarity between the control and experimental groups is the outcome variable being studied.
To be successful, data professionals must control for every factor in the experiment.

27. What is the difference between observed or actual values and the predicted values of a regression line?

Answers

Slope

Parameter
Residual
Beta

28. Fill in the blank: A scatterplot matrix is a series of scatterplots that show the relationships between pairs of _____.

Answers

coordinates
variables
models

lines

29. Fill in the blank: A confidence band is the area surrounding a line that describes the uncertainty around the predicted outcome at every value of _____.

Answers

X
intercept

slope
Y

30. Which of the following statements accurately describe running a randomized, controlled experiment? Select all that apply.

Answers

To be successful, data professionals must control for every factor in the experiment.

The differences between the control and treatment groups must be observable and measurable.
It is typically used when arguing for causation between variables.
It is a study design that systematically and methodically assigns participants into groups.
