Regression Analysis: Simplify Complex Data Relationships (Coursera), weekly challenge 2 answers

Test your knowledge: Foundations of linear regression

1. Fill in the blank: The best fit line is the line that fits the data best by minimizing some _____.

  • residual values
  • predicted values
  • loss function
  • regression function

2. What is the sum of the squared differences between each observed value and the associated predicted value?

  • Residual least squares
  • Sum of squared residuals
  • Sum of squared predicted values
  • Ordinary least squares
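
As a rough illustration of the quantity described in question 2, the sketch below computes residuals and their squared sum in plain Python. The observed and predicted numbers are made-up values chosen only for the example; they do not come from the course.

```python
# Hypothetical observed values and model predictions (illustrative data only).
observed = [3.0, 5.0, 7.0, 10.0]
predicted = [2.5, 5.5, 7.0, 9.0]

# A residual is an observed value minus its associated predicted value.
residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]

# Square each residual, then add the squares together.
sum_of_squares = sum(r ** 2 for r in residuals)

print(residuals)       # [0.5, -0.5, 0.0, 1.0]
print(sum_of_squares)  # 1.5
```

Squaring keeps positive and negative residuals from cancelling each other out, which is why the squared sum is used rather than a plain sum.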

3. What tool would be most effective for calculating the ordinary least squares?

  • Python
  • Google Sheets
  • SQL
  • Microsoft Excel

Test your knowledge: Assumptions and construction in Python

4. How does a data professional determine if a linearity assumption is met?

  • They confirm whether data on the X-Y coordinate falls along a downward curved line.
  • They confirm whether data on the X-Y coordinate resembles a random cloud.
  • They confirm whether data on the X-Y coordinate falls along an upward curved line.
  • They confirm whether data on the X-Y coordinate falls along a straight line.

5. Which of the following statements accurately describes the normality assumption?

  • The normality assumption can only be confirmed before a model is built.
  • The normality assumption can only be confirmed after a model is built.
  • The normality assumption can only be confirmed while a model is being built.
  • The normality assumption can be confirmed anytime during model building.

6. A data professional is using a scatterplot to plot residuals and predicted values from a regression model to check for homoscedasticity. What does this scenario represent?

  • Cone
  • Straight line
  • Random cloud
  • Curved line

7. What type of visualization uses a series of scatterplots that show the relationships between pairs of variables?

  • Residual matrix
  • Linear matrix
  • Scatterplot matrix
  • Scatterplot residuals

Test your knowledge: Evaluate a linear regression model

8. What is the area surrounding a regression line, which describes the uncertainty around the predicted outcome at every value of X?

  • Confidence interval
  • Confidence band
  • R squared
  • Ordinary least squares

9. Fill in the blank: R squared measures the _____ in the dependent variable, Y. This is explained by the independent variable, X.

  • proportion of variation
  • coefficient of variation
  • proportion of accuracy
  • coefficient of accuracy
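
One common way to compute R squared is 1 minus the ratio of the sum of squared residuals to the total sum of squares. The sketch below assumes that definition; the function name `r_squared` and the data are hypothetical and used only for illustration.

```python
def r_squared(observed, predicted):
    """Return 1 - SSR/SST for paired observed and predicted values."""
    mean_y = sum(observed) / len(observed)
    # Variation left unexplained by the model.
    ssr = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    # Total variation of Y around its mean.
    sst = sum((y - mean_y) ** 2 for y in observed)
    return 1 - ssr / sst

# Hypothetical data: predictions that track the observations closely.
observed = [2.0, 3.0, 5.0, 6.0]
predicted = [1.9, 3.3, 4.7, 6.1]

print(round(r_squared(observed, predicted), 4))  # about 0.98
```

A value near 1 means the model accounts for most of the variation in Y; a value near 0 means it accounts for very little.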

10. Which linear regression evaluation metric is sensitive to large errors?

  • Adjusted R squared
  • Mean squared error (MSE)
  • Mean absolute error (MAE)
  • The coefficient of determination
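
To see how the two error metrics in question 10 react to a single large miss, the following sketch compares them on made-up predictions. The helper functions are written by hand for the example (they are not a library API), and the data are hypothetical.

```python
def mean_absolute_error(observed, predicted):
    """Average of the absolute differences."""
    return sum(abs(y - p) for y, p in zip(observed, predicted)) / len(observed)

def mean_squared_error(observed, predicted):
    """Average of the squared differences."""
    return sum((y - p) ** 2 for y, p in zip(observed, predicted)) / len(observed)

observed = [10.0, 10.0, 10.0, 10.0]
small_errors = [9.0, 11.0, 9.0, 11.0]       # every prediction is off by 1
one_large_error = [10.0, 10.0, 10.0, 2.0]   # a single prediction is off by 8

print(mean_absolute_error(observed, small_errors))     # 1.0
print(mean_squared_error(observed, small_errors))      # 1.0
print(mean_absolute_error(observed, one_large_error))  # 2.0
print(mean_squared_error(observed, one_large_error))   # 16.0
```

In this example the one large miss doubles the absolute-error average but multiplies the squared-error average by sixteen, because squaring amplifies large differences.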

Test your knowledge: Interpret linear regression results

11. Which of the following are best practices when communicating linear regression results? Select all that apply.

  • Provide measures of uncertainty around estimated results.
  • Always extrapolate to a larger or different group any data insights that apply only to a specific, smaller population.
  • Make the findings quickly understood without technical terms.
  • Use data visualizations to present the results.

12. Which of the following statements accurately describe coefficients and p-values for regression model interpretation? Select all that apply.

  • P-values determine how changes in the independent variables are associated with changes in the dependent variable.
  • Coefficients demonstrate whether P-values are statistically significant.
  • P-values demonstrate whether coefficients are statistically significant.
  • Coefficients determine how changes in the independent variables are associated with changes in the dependent variable.

Weekly challenge 2

13. Fill in the blank: _____ is the difference between observed values and the predicted values of a regression line.

  • Error
  • Intercept
  • Coefficient
  • Residual

14. In linear regression, what mathematical technique is used to calculate beta zero hat and beta one hat?

  • Ordinary least squares
  • Mean squared error
  • Coefficient of determination
  • Coefficient R squared
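
For a single independent variable, the intercept and slope estimates that minimize the sum of squared residuals have a well-known closed form. The sketch below applies that formula in plain Python; the function name `fit_simple_ols` and the data are hypothetical, and in practice a library such as statsmodels or scikit-learn would usually do this step.

```python
def fit_simple_ols(x, y):
    """Return (beta_0_hat, beta_1_hat) for a one-variable linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    beta_1_hat = s_xy / s_xx
    # Intercept: forces the fitted line through the point (mean_x, mean_y).
    beta_0_hat = mean_y - beta_1_hat * mean_x
    return beta_0_hat, beta_1_hat

# Hypothetical data for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 5.0, 6.0]

beta_0_hat, beta_1_hat = fit_simple_ols(x, y)
print(beta_0_hat, beta_1_hat)  # roughly 0.5 and 1.4
```

The fitted line for this data is approximately Y = 0.5 + 1.4X; any other intercept and slope would give a larger sum of squared residuals on the same points.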

15. A data professional testing for linear regression assumptions notices that their visualization of the data appears like a random cloud. Which model assumption does this invalidate?

  • Homoscedasticity
  • Independent observation
  • Normality
  • Linearity

16. Fill in the blank: A scatterplot _____ is a series of scatterplots that show the relationships between pairs of variables.

  • progression
  • succession
  • array
  • matrix

17. A data professional checking model assumptions notices the dependent variables appear in a cone-shaped pattern when plotting the residuals against the dependent variable. Which model assumption does this invalidate?

  • Independent observation
  • Normality
  • Linearity
  • Homoscedasticity

18. Fill in the blank: A confidence band is the area surrounding a line that describes the _____ around the predicted outcome at every value of X.

  • certainty
  • accuracy
  • inaccuracy
  • uncertainty

19. What is another term for R squared?

  • Error of residuals
  • Coefficient of residuals
  • Residuals of determination
  • Coefficient of determination

20. Which of the following statements accurately describe running a randomized, controlled experiment? Select all that apply.

  • To be successful, data professionals must control for every factor in the experiment.
  • It is a study design that randomly assigns participants into groups.
  • It cannot have a control group.
  • It is typically used when arguing for causation between variables.

21. A data professional determines the best fit line by calculating the difference between observed values and the predicted value of a regression line. What is this calculation?

  • Notion
  • Coefficient
  • Residual
  • Parameter

22. A data professional minimizes the sum of squared residuals to estimate parameters in a linear regression model. What method are they using?

  • Ordinary least squares
  • Mean absolute error
  • Residual coefficients
  • R squared

23. Fill in the blank: A scatterplot matrix is a series of scatterplots that show the _____ between pairs of variables.

  • discrepancies
  • distances
  • variability
  • relationships

24. Fill in the blank: A _____ is the area surrounding a line that describes the uncertainty around the predicted outcome at every value of X.

  • confidence band
  • interval band
  • interval slope
  • confidence slope

25. What measures the proportion of variation in the dependent variable Y explained by the independent variable X?

  • Mean squared error (MSE)
  • Mean absolute error (MAE)
  • P-value
  • R squared

26. Which of the following statements accurately describe a randomized, controlled experiment? Select all that apply.

  • The differences between the control and treatment groups must be observable and measurable.
  • It is a study design that randomly assigns participants into an experimental group or a control group.
  • As the study is conducted, the only expected similarity between the control and experimental groups is the outcome variable being studied.
  • To be successful, data professionals must control for every factor in the experiment.

27. What is the difference between observed or actual values and the predicted values of a regression line?

  • Slope
  • Parameter
  • Residual
  • Beta

28. Fill in the blank: A scatterplot matrix is a series of scatterplots that show the relationships between pairs of _____.

  • coordinates
  • variables
  • models
  • lines

29. Fill in the blank: A confidence band is the area surrounding a line that describes the uncertainty around the predicted outcome at every value of _____.

  • X
  • intercept
  • slope
  • Y

30. Which of the following statements accurately describe running a randomized, controlled experiment? Select all that apply.

  • To be successful, data professionals must control for every factor in the experiment.
  • The differences between the control and treatment groups must be observable and measurable.
  • It is typically used when arguing for causation between variables.
  • It is a study design that systematically and methodically assigns participants into groups.
