Data Analysis with R Coursera Week 2 Quiz Answers

Practice Quiz

1. The process of converting or mapping data from the initial raw form to another format to prepare it for further analysis goes by several names. What is this process commonly called? Select three answers.

  • Data pre-processing
  • Data wrangling
  • Data formatting
  • Data cleaning

2. What is the result of the following statement?

sub_airline %>% map(~sum(is.na(.)))

  • Counts the missing values in all columns in the dataset.
  • Counts all instances of zero in all columns in the dataset.
  • Counts all instances of NA in all columns in the dataset.
  • Counts the missing values and returns the result only for columns in the dataset that have missing values.
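
To see what the statement in question 2 returns, here is a minimal sketch. The small data frame standing in for sub_airline and its column names are invented for illustration; only the piped expression comes from the question.

```r
library(dplyr)  # loaded for the %>% pipe
library(purrr)  # provides map()

# Hypothetical stand-in for sub_airline; the columns are invented.
sub_airline <- data.frame(
  ArrDelay = c(5, NA, 12, NA),
  DepDelay = c(0, 3, NA, 7),
  Carrier  = c("AA", "DL", "AA", "UA")
)

# map() visits every column; is.na() flags the NA entries and sum() counts them.
na_counts <- sub_airline %>% map(~ sum(is.na(.)))

# The result is a list with one count per column, including columns with no NAs.
print(na_counts)
```

Note that the list has an entry for every column, so a column without missing values shows up with a count of 0 rather than being left out.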

3. Which functions do you use together to correct data types in all columns of your dataset? Select two answers.

  • mutate_if()
  • sapply()
  • mutate()
  • mutate_all()
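
As an illustration of column-wise type correction, the sketch below chains two dplyr helpers. The data frame and its columns are hypothetical, and this is one possible way to combine the functions, not necessarily the exact pipeline used in the course.

```r
library(dplyr)

# Hypothetical raw data where every column was read in as text.
raw <- data.frame(
  id      = c("1", "2", "3"),
  delay   = c("4.5", "3.0", "5.2"),
  carrier = c("AA", "DL", "AA"),
  stringsAsFactors = FALSE
)

typed <- raw %>%
  # Apply type.convert() to every column; as.is = TRUE keeps text as character.
  mutate_all(type.convert, as.is = TRUE) %>%
  # Then convert only the columns that are still character into factors.
  mutate_if(is.character, as.factor)

str(typed)
```

mutate_all() touches every column, while mutate_if() only touches the columns for which the predicate (here is.character) returns TRUE.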

4. Which data normalization technique divides each value by the maximum value for that variable, resulting in new values that range between 0 and 1?

  • Z-score
  • Min-max 
  • Simple feature scaling
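
The three techniques listed in question 4 can be compared side by side in base R. The vector x is made-up sample data; the formulas are the standard definitions.

```r
# Hypothetical sample values (all non-negative).
x <- c(10, 20, 30, 40, 50)

# Simple feature scaling: divide each value by the maximum.
simple_scaled <- x / max(x)

# Min-max: subtract the minimum, then divide by the range.
min_max <- (x - min(x)) / (max(x) - min(x))

# Z-score: subtract the mean, then divide by the standard deviation.
z_score <- (x - mean(x)) / sd(x)

print(simple_scaled)
print(min_max)
print(z_score)
```

Simple feature scaling stays between 0 and 1 only when the values are non-negative, which is assumed here. Z-scores are centered on 0 and are not confined to a fixed range.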

5. With data binning, observations are often organized into defined intervals called quartiles. Which quartile is the median of the dataset?

  • 4th quartile
  • 3rd quartile
  • 1st quartile
  • 2nd quartile
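
A short base R sketch of quartile-based binning, using invented values. It shows that the cut point at probability 0.5 coincides with the median, and how the quartile boundaries can serve as bin edges.

```r
# Hypothetical sample values.
x <- c(2, 4, 6, 8, 10, 12, 14, 16, 18)

# Cut points for the 1st, 2nd and 3rd quartiles.
q <- quantile(x, probs = c(0.25, 0.5, 0.75))

# The 2nd quartile (50%) is the same value as the median.
second_quartile <- unname(q["50%"])
print(second_quartile == median(x))

# Use the quartile boundaries as bin edges; the labels are arbitrary names.
edges <- quantile(x, probs = seq(0, 1, by = 0.25))
bins  <- cut(x, breaks = edges, include.lowest = TRUE,
             labels = c("Q1", "Q2", "Q3", "Q4"))
print(table(bins))
```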

Graded Quiz

6. You want to access the “Date” column of a data frame called sales_data so you can perform an operation on it. What is the correct way to refer to this column?

  • sales_data.Date
  • sales_data#Date
  • sales_data$Date
  • sales_data%Date
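
For reference, a minimal sketch of column access in base R. The contents of sales_data are invented; only the data frame name and the Date column come from the question.

```r
# Hypothetical contents for sales_data.
sales_data <- data.frame(
  Date   = c("2021-01-05", "2021-02-10"),
  Amount = c(100, 250),
  stringsAsFactors = FALSE
)

# The $ operator extracts a single column by name as a vector.
date_column <- sales_data$Date

# Operate on the column and assign the result back through the same accessor.
sales_data$Date <- as.Date(sales_data$Date)

print(class(sales_data$Date))
```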

7. Which function replaces missing values in a dataset?

  • drop_columns()
  • is.na()
  • drop_na()
  • replace_na()
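
The difference between replacing and dropping missing values can be sketched with tidyr. The data frame and the fill values are assumptions chosen for illustration.

```r
library(dplyr)  # loaded for the %>% pipe
library(tidyr)  # provides replace_na() and drop_na()

# Hypothetical data with one incomplete row.
flights <- data.frame(
  carrier = c("AA", NA, "DL"),
  delay   = c(12, NA, 30),
  stringsAsFactors = FALSE
)

# replace_na() fills each NA with the value given for that column.
filled <- flights %>% replace_na(list(carrier = "unknown", delay = 0))

# drop_na() removes rows that contain any NA instead of filling them.
dropped <- flights %>% drop_na()

print(filled)
print(dropped)
```

is.na() by itself only reports where the missing values are; it does not change the data.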

8. You have a variable called “Status” that contains a status code in the format “error_type-severity_level”, for example “10-07”, and you want to reformat the column so that the “error_type” and “severity_level” are in different columns. What is the correct function to do this?

  • dataframe %>% mutate_if(Status, sep = "-",
                              into = c("error_type", "severity_level"))

  • dataframe %>% mutate_all(Status, sep = "-",
                               into = c("error_type", "severity_level"))

  • dataframe %>% separate(Status, sep = "-",
                             into = c("error_type", "severity_level"))

  • dataframe %>% sapply(Status, sep = "-",
                           into = c("error_type", "severity_level"))
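
As a runnable illustration of splitting one column into two, the sketch below applies tidyr's separate() to invented status codes. The data frame contents are hypothetical; the column name and separator follow the question.

```r
library(dplyr)  # loaded for the %>% pipe
library(tidyr)  # provides separate()

# Hypothetical status codes in the "error_type-severity_level" format.
dataframe <- data.frame(
  Status = c("10-07", "03-12"),
  stringsAsFactors = FALSE
)

# Split Status at the hyphen into two new columns.
# By default the original Status column is removed.
split_status <- dataframe %>%
  separate(Status, sep = "-", into = c("error_type", "severity_level"))

print(split_status)
```

The new columns are character vectors, so a leading zero such as "07" is preserved unless the columns are converted to numbers afterwards.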

9. What are two benefits of data normalization?

  • Minimizes the effects of outliers, which can influence the result more.
  • Helps you better understand data distribution.
  • Brings data into a common standard of expression that allows you to make meaningful comparisons.
  • Enables a fair comparison between the different features, making sure they have the same impact.

10. To visualize its distribution, binned data is often plotted in which of the following types of chart?

  • Histogram
  • Scatter plot
  • Bar chart
  • Line chart
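
To connect binning with plotting, here is a small base R sketch. The values and the bin edges are invented. hist() is called with plot = FALSE so that the per-bin counts can be inspected directly.

```r
# Hypothetical sample values.
x <- c(3, 7, 12, 15, 18, 25)

# Compute the bins without drawing; the break points are chosen arbitrarily.
h <- hist(x, breaks = c(0, 10, 20, 30), plot = FALSE)

# Number of observations that fall into each bin.
print(h$counts)

# Calling plot(h), or hist() without plot = FALSE, draws the chart itself.
```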

11. Which of the following can you accomplish using the spread() function? Select two answers.

  • Reformat a categorical variable so that its contents are spread across two or more columns.
  • Convert categorical variables to dummy variables and assign the value of another variable to each category.
  • Reduce three variables to one.
  • Convert categorical variables to dummy variables.
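
A minimal sketch of what spread() does to a categorical column, using an invented data frame. The helper column named dummy is an assumption introduced only for this example.

```r
library(dplyr)  # provides mutate() and the %>% pipe
library(tidyr)  # provides spread()

# Hypothetical data: one categorical column and one value column.
flights <- data.frame(
  id      = 1:3,
  carrier = c("AA", "DL", "AA"),
  delay   = c(5, 12, 7),
  stringsAsFactors = FALSE
)

# Each carrier becomes its own column, filled with that row's delay value.
wide <- flights %>% spread(key = carrier, value = delay)

# Indicator (dummy) columns: add a constant 1, spread it, fill gaps with 0.
dummies <- flights %>%
  mutate(dummy = 1) %>%
  spread(key = carrier, value = dummy, fill = 0)

print(wide)
print(dummies)
```

In the first result, cells with no matching value are NA. In the second, fill = 0 turns them into 0 so each carrier column acts as a 0/1 indicator.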
