Regression Analysis: Simplify Complex Data Relationships (Coursera) weekly challenge 2 answers
Test your knowledge: Foundations of linear regression
1. Fill in the blank: The best fit line is the line that fits the data best by minimizing some _____.
- residual values
- predicted values
- loss function
- regression function
2. What is the sum of the squared differences between each observed value and the associated predicted value?
- Residual least squares
- Sum of squared residuals
- Sum of squared predicted values
- Ordinary least squares
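As a quick illustration of question 2, the sum of squared residuals can be sketched in a few lines of Python. The data values and the function name sum_squared_residuals are made up for this example and do not come from the course.

```python
# Hypothetical observed values and model predictions (illustrative only).
observed = [3.0, 5.0, 7.5, 9.0]
predicted = [2.5, 5.5, 7.0, 9.5]


def sum_squared_residuals(y, y_hat):
    """Sum of squared residuals, where each residual = observed - predicted."""
    return sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))


ssr = sum_squared_residuals(observed, predicted)
print(ssr)  # each residual is +/-0.5, so the sum is 4 * 0.25 = 1.0
```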
3. What tool would be most effective for calculating the ordinary least squares?
- Python
- Google Sheets
- SQL
- Microsoft Excel
Test your knowledge: Assumptions and construction in Python
4. How does a data professional determine if a linearity assumption is met?
- They confirm whether data on the X-Y coordinate falls along a downward curved line.
- They confirm whether data on the X-Y coordinate resembles a random cloud.
- They confirm whether data on the X-Y coordinate falls along an upward curved line.
- They confirm whether data on the X-Y coordinate falls along a straight line.
5. Which of the following statements accurately describes the normality assumption?
- The normality assumption can only be confirmed before a model is built.
- The normality assumption can only be confirmed after a model is built.
- The normality assumption can only be confirmed while a model is being built.
- The normality assumption can be confirmed anytime during model building.
6. A data professional is using a scatterplot to plot residuals and predicted values from a regression model to check for homoscedasticity. What does this scenario represent?
- Cone
- Straight line
- Random cloud
- Curved line
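The homoscedasticity check in question 6 is normally done by eye on a plot of residuals against predicted values. The sketch below is only a rough numeric stand-in for that visual check, not a formal statistical test: it compares the residual variance for the larger fitted values with the variance for the smaller ones. The helper name spread_ratio and all data values are assumptions made for illustration.

```python
from statistics import pvariance


def spread_ratio(fitted, residuals):
    """Residual variance in the upper half of fitted values divided by
    the variance in the lower half. A ratio near 1 is consistent with a
    random cloud; a ratio far from 1 suggests a cone shape."""
    pairs = sorted(zip(fitted, residuals))  # order points by fitted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]
    return pvariance(high) / pvariance(low)


# Hypothetical fitted values and two made-up residual patterns.
fitted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
cloud = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # constant spread
cone = [0.5, -0.5, 1.0, -1.0, 2.0, -2.0, 4.0, -4.0]   # spread grows

cloud_ratio = spread_ratio(fitted, cloud)
cone_ratio = spread_ratio(fitted, cone)
print(cloud_ratio, cone_ratio)
```

With these made-up numbers the constant-spread residuals give a ratio of 1, while the widening residuals give a much larger ratio, which is the numeric counterpart of seeing a cone instead of a random cloud.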
7. What type of visualization uses a series of scatterplots that show the relationships between pairs of variables?
- Residual matrix
- Linear matrix
- Scatterplot matrix
- Scatterplot residuals
Test your knowledge: Evaluate a linear regression model
8. What is the area surrounding a regression line, which describes the uncertainty around the predicted outcome at every value of X?
- Confidence interval
- Confidence band
- R squared
- Ordinary least squares
9. Fill in the blank: R squared measures the _____ in the dependent variable, Y, that is explained by the independent variable, X.
- proportion of variation
- coefficient of variation
- proportion of accuracy
- coefficient of accuracy
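To make question 9 concrete, R squared can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. This is a minimal sketch with hypothetical numbers; the "proportion of variation explained" reading assumes the predictions come from a least-squares fit that includes an intercept, which the made-up predictions below merely stand in for.

```python
from statistics import mean


def r_squared(y, y_hat):
    """R squared = 1 - SS_res / SS_tot."""
    y_bar = mean(y)
    ss_res = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))  # unexplained
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    return 1 - ss_res / ss_tot


# Hypothetical observed values and stand-in predictions.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

r2 = r_squared(observed, predicted)
print(round(r2, 2))  # SS_res = 1, SS_tot = 20, so about 0.95
```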
10. Which linear regression evaluation metric is sensitive to large errors?
- Adjusted R squared
- Mean squared error (MSE)
- Mean absolute error (MAE)
- The coefficient of determination
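The sensitivity asked about in question 10 comes from squaring: one large error dominates MSE far more than it dominates MAE. The sketch below compares the two metrics on hypothetical data; the function names and values are illustrative assumptions, not course material.

```python
def mean_squared_error(y, y_hat):
    """Average of squared errors."""
    return sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat)) / len(y)


def mean_absolute_error(y, y_hat):
    """Average of absolute errors."""
    return sum(abs(yi - yhi) for yi, yhi in zip(y, y_hat)) / len(y)


observed = [10.0, 12.0, 14.0, 16.0]
small_errors = [11.0, 11.0, 15.0, 15.0]     # every prediction off by 1
one_large_error = [10.0, 12.0, 14.0, 26.0]  # one prediction off by 10

mse_small = mean_squared_error(observed, small_errors)
mae_small = mean_absolute_error(observed, small_errors)
mse_large = mean_squared_error(observed, one_large_error)
mae_large = mean_absolute_error(observed, one_large_error)

print(mse_small, mae_small)  # 1.0 1.0
print(mse_large, mae_large)  # 25.0 2.5
```

In this example the single error of 10 raises MSE by a factor of 25 but MAE by only a factor of 2.5.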
Test your knowledge: Interpret linear regression results
11. Which of the following are best practices when communicating linear regression results? Select all that apply.
- Provide measures of uncertainty around estimated results.
- Always extrapolate to a larger or different group any data insights that apply only to a specific, smaller population.
- Make the findings quickly understood without technical terms.
- Use data visualizations to present the results.
12. Which of the following statements accurately describe coefficients and p-values for regression model interpretation? Select all that apply.
- P-values determine how changes in the independent variables are associated with changes in the dependent variable.
- Coefficients demonstrate whether P-values are statistically significant.
- P-values demonstrate whether coefficients are statistically significant.
- Coefficients determine how changes in the independent variables are associated with changes in the dependent variable.
Weekly challenge 2
13. Fill in the blank: _____ is the difference between observed values and the predicted values of a regression line.
- Error
- Intercept
- Coefficient
- Residual
14. In linear regression, what mathematical technique is used to calculate beta zero hat and beta one hat?
- Ordinary least squares
- Mean squared error
- Coefficient of determination
- Coefficient R squared
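For simple linear regression with one independent variable, the ordinary least squares estimates in question 14 have a closed form: beta one hat is the sum of cross-deviations of X and Y divided by the sum of squared deviations of X, and beta zero hat is the mean of Y minus beta one hat times the mean of X. The sketch below applies those formulas; the function name fit_simple_ols and the data are hypothetical.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (beta0_hat, beta1_hat) for the line y = beta0 + beta1 * x
    that minimizes the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = sxy / sxx                  # slope
    beta0_hat = y_bar - beta1_hat * x_bar  # intercept
    return beta0_hat, beta1_hat


# Hypothetical data lying exactly on y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

beta0_hat, beta1_hat = fit_simple_ols(x, y)
print(beta0_hat, beta1_hat)  # 1.0 2.0
```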
15. A data professional testing for linear regression assumptions notices that their visualization of the data appears like a random cloud. Which model assumption does this invalidate?
- Homoscedasticity
- Independent observation
- Normality
- Linearity
16. Fill in the blank: A scatterplot _____ is a series of scatterplots that show the relationships between pairs of variables.
- progression
- succession
- array
- matrix
17. A data professional checking model assumptions notices a cone-shaped pattern when plotting the residuals against the dependent variable. Which model assumption does this invalidate?
- Independent observation
- Normality
- Linearity
- Homoscedasticity
18. Fill in the blank: A confidence band is the area surrounding a line that describes the _____ around the predicted outcome at every value of X.
- certainty
- accuracy
- inaccuracy
- uncertainty
19. What is another term for R squared?
- Error of residuals
- Coefficient of residuals
- Residuals of determination
- Coefficient of determination
20. Which of the following statements accurately describe running a randomized, controlled experiment? Select all that apply.
- To be successful, data professionals must control for every factor in the experiment.
- It is a study design that randomly assigns participants into groups.
- It cannot have a control group.
- It is typically used when arguing for causation between variables.
21. A data professional determines the best fit line by calculating the difference between observed values and the predicted value of a regression line. What is this calculation?
- Notion
- Coefficient
- Residual
- Parameter
22. A data professional minimizes the sum of squared residuals to estimate parameters in a linear regression model. What method are they using?
- Ordinary least squares
- Mean absolute error
- Residual coefficients
- R squared
23. Fill in the blank: A scatterplot matrix is a series of scatterplots that show the _____ between pairs of variables.
- discrepancies
- distances
- variability
- relationships
24. Fill in the blank: A _____ is the area surrounding a line that describes the uncertainty around the predicted outcome at every value of X.
- confidence band
- interval band
- interval slope
- confidence slope
25. What measures the proportion of variation in the dependent variable Y explained by the independent variable X?
- Mean squared error (MSE)
- Mean absolute error (MAE)
- P-value
- R squared
26. Which of the following statements accurately describe a randomized, controlled experiment? Select all that apply.
- The differences between the control and treatment groups must be observable and measurable.
- It is a study design that randomly assigns participants into an experimental group or a control group.
- As the study is conducted, the only expected similarity between the control and experimental groups is the outcome variable being studied.
- To be successful, data professionals must control for every factor in the experiment.
27. What is the difference between observed or actual values and the predicted values of a regression line?
- Slope
- Parameter
- Residual
- Beta
28. Fill in the blank: A scatterplot matrix is a series of scatterplots that show the relationships between pairs of _____.
- coordinates
- variables
- models
- lines
29. Fill in the blank: A confidence band is the area surrounding a line that describes the uncertainty around the predicted outcome at every value of _____.
- X
- intercept
- slope
- Y
30. Which of the following statements accurately describe running a randomized, controlled experiment? Select all that apply.
- To be successful, data professionals must control for every factor in the experiment.
- The differences between the control and treatment groups must be observable and measurable.
- It is typically used when arguing for causation between variables.
- It is a study design that systematically and methodically assigns participants into groups.