Go Beyond the Numbers: Translate Data into Insights Coursera Weekly Challenge 3 Answers

Test your knowledge: The challenge of missing or duplicate data

1. Fill in the blank: Missing data has a value that is not stored for a _____ in a dataset.

  • column
  • row
  • visualization
  • variable

2. A data professional requests additional information from a dataset’s original owner. Unfortunately, they are not able to provide the information. Therefore, the data professional creates a NaN category in the dataset. What concept does this scenario describe?

  • Ensuring two datasets are compatible
  • Managing big data
  • Mapping variables in a dataset
  • Solving the problem of missing data

3. When merging data, a data professional uses the following code:

df_joined = df.merge(df_zip, how='left',
on=['date','center_point_geom'])

What is the function of the parameters how and on in this code?

  • To tell Python how to find missing values in the rows and columns
  • To tell Python which way to join the data and which column to join from
  • To tell Python how to place the appropriate values on the top row of the dataset
  • To tell Python which datasets should be merged
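
As a rough illustration of the merge call in question 3, the sketch below runs it on two small frames. Only the column names `date` and `center_point_geom` come from the quiz code; the `strikes` and `zip_code` columns and all values are made up for the example.

```python
import pandas as pd

# Hypothetical toy data; only the key column names come from the quiz code.
df = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-02", "2018-01-03"],
    "center_point_geom": ["A", "A", "B"],
    "strikes": [5, 2, 7],
})
df_zip = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-03"],
    "center_point_geom": ["A", "B"],
    "zip_code": ["10001", "60601"],
})

# how='left' keeps every row of df (the left frame);
# on=[...] names the key columns used to match rows between the two frames.
df_joined = df.merge(df_zip, how="left", on=["date", "center_point_geom"])
print(df_joined)
```

The row of `df` with no match in `df_zip` is kept, and its `zip_code` is left as a missing value.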

4. Non-null count is the total number of blank data entries within a data column.

  • True
  • False
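
To see what a non-null count measures, the following minimal sketch compares it with the number of missing entries in a single column. The `temperature` column and its values are invented for the example. In pandas, `count()` returns the number of non-null entries, and `df.info()` lists the same figure under its "Non-Null Count" heading.

```python
import numpy as np
import pandas as pd

# Hypothetical column with two blank (NaN) entries.
df = pd.DataFrame({"temperature": [21.5, np.nan, 19.0, np.nan, 22.1]})

non_null = df["temperature"].count()        # entries that hold a value
missing = df["temperature"].isnull().sum()  # entries that are blank
print(non_null, missing)
```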

Test your knowledge: The ins and outs of data outliers

5. What type of outlier is a normal data point under certain conditions, but becomes an anomaly under most other conditions?

  • Collective outlier
  • Contextual outlier
  • Global outlier
  • Constant outlier

6. What is the term for a line of text that follows a method or function, which is used to explain the purpose of that method or function to others using the same code?

  • Factor
  • Annotation
  • Argument
  • Docstring
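
For reference, here is a small sketch of what such explanatory text looks like in Python, where it is written as a string literal on the first line of the function body. The function `count_missing` is a hypothetical helper made up for this example.

```python
def count_missing(values):
    """Return how many entries in `values` are None."""
    return sum(1 for v in values if v is None)

# The explanatory text is stored on the function and shown by help().
print(count_missing.__doc__)
```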

7. A data professional is using a box plot to identify suspected high outliers in a dataset, according to the interquartile rule. To do that, they search for data points greater than the third quartile plus what multiple of the interquartile range?

  • 1.5 times
  • 3 times
  • 10 times
  • 0.5 times
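
The interquartile rule from question 7 can be sketched in pandas as follows. The data values are invented, and the conventional 1.5 multiplier is used for both limits.

```python
import pandas as pd

# Hypothetical sample with one suspiciously large value.
values = pd.Series([10, 12, 12, 13, 14, 15, 16, 18, 95])

q1 = values.quantile(0.25)   # first quartile
q3 = values.quantile(0.75)   # third quartile
iqr = q3 - q1                # interquartile range

# Interquartile rule: flag points beyond 1.5 * IQR from the quartiles.
upper_limit = q3 + 1.5 * iqr
lower_limit = q1 - 1.5 * iqr

high_outliers = values[values > upper_limit]
print(upper_limit, high_outliers.tolist())
```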

Test your knowledge: Changing categorical data to numerical data

8. Fill in the blank: Label encoding assigns each category a unique _____ instead of a qualitative value.

  • character
  • qualifier
  • string
  • number
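
One way to sketch label encoding in pandas is to convert a column to the `category` dtype and read its integer codes. This is only one possible approach, and the `size` column and its values are made up for the example.

```python
import pandas as pd

# Hypothetical categorical column.
df = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

# Each distinct category gets one integer code.
# pandas orders string categories alphabetically: large, medium, small.
df["size_code"] = df["size"].astype("category").cat.codes
print(df)
```

Note that the codes follow alphabetical order here, not the natural small-to-large order, so an explicit mapping may be preferable when the order matters.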

9. When working with dummy variables, data professionals may assign the variables an infinite number of values.

  • True
  • False

10. Which pandas function does a data professional use to convert categorical variables into dummy variables?

  • get_dummies()
  • convert_categories()
  • convert_dummies()
  • get_categories()
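
As a brief illustration of dummy variables, the sketch below calls `pd.get_dummies()` on a made-up `color` column. The `dtype=int` argument is an optional choice made here so the output shows 0 and 1 instead of Boolean values.

```python
import pandas as pd

# Hypothetical categorical column.
df = pd.DataFrame({"color": ["red", "blue", "red"]})

# One new column per category; each cell is 1 if the row has that category, else 0.
dummies = pd.get_dummies(df["color"], dtype=int)
print(dummies)
```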

Test your knowledge: Input validation

11. Data professionals use input validation to ensure data is complete, error-free, and of high quality.

  • True
  • False
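
A few basic input-validation checks might look like the sketch below. The `order_id` and `amount` columns, the sample values, and the choice of checks are all assumptions made for illustration, not a prescribed validation procedure.

```python
import numpy as np
import pandas as pd

# Hypothetical order data with one problem of each kind.
df = pd.DataFrame({"order_id": [1, 2, 2], "amount": [19.99, np.nan, -5.0]})

report = {
    "missing_amounts": int(df["amount"].isnull().sum()),       # completeness
    "duplicate_ids": int(df["order_id"].duplicated().sum()),   # uniqueness
    "negative_amounts": int((df["amount"] < 0).sum()),         # plausible range
}
print(report)
```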

12. Fill in the blank: If a dataset lacks sufficient information to answer a business question, the process of _____ makes it possible to augment that data by adding values from other datasets.

  • summing
  • sampling
  • joining
  • blending

13. In which phase of the PACE workflow would a data professional perform the majority of the data-validation process?

  • Execute
  • Analyze
  • Plan
  • Construct

Weekly challenge 3

14. Which of the following terms are used to describe missing data? Select all that apply.

  • Blank
  • NaN
  • N/A
  • Zero

15. Which of the following strategies might a data professional consider when handling missing data? Select all that apply.

  • Use their best judgment to add in values themselves.
  • Change the missing values to zeros.
  • Create a NaN category.
  • Delete the missing values.
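
Two of the strategies listed in question 15 can be sketched in pandas as shown below. The frame is invented for the example, and representing the NaN category as the literal string "NaN" is an assumption about how such a category could be stored.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose second row is missing both values.
df = pd.DataFrame({
    "city": ["Lima", None, "Oslo"],
    "sales": [100.0, np.nan, 80.0],
})

# Delete rows that contain missing values.
dropped = df.dropna()

# Keep the row, but mark the missing city with an explicit "NaN" category.
labeled = df.fillna({"city": "NaN"})

print(dropped)
print(labeled)
```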

16. A data professional writes the following code:

df_joined = df.merge(df_zip, how='left',
on=['date','center_point_geom'])

df_joined.head()

Which section of the code indicates the data frame to be merged with the dataset df?

  • center_point_geom
  • df_joined.head()
  • how='left'
  • df_zip

17. What tasks could the pandas function pd.isnull() be used for? Select all that apply.

  • To delete all of the values from a data frame
  • To identify when a value is missing from a data frame
  • To pull all of the missing values from a data frame
  • To change all values to nulls in a data frame
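
The behavior of `pd.isnull()` can be sketched on a small made-up frame. It returns a Boolean mask of the same shape, which can then be used to count or select the rows that have missing entries.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each column.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

mask = pd.isnull(df)                      # True wherever a value is missing
rows_with_missing = df[mask.any(axis=1)]  # rows that contain a missing value

print(mask)
print(rows_with_missing)
```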

18. What type of outliers are values that are completely different from the overall data group and have no association with any other outliers?

  • Collective outliers
  • Global outliers
  • Dissimilar outliers
  • Contextual outliers

19. Fill in the blank: A data professional may work with categorical data by using _____, which is a data-transformation technique where each category is assigned a unique number instead of a qualitative value.

  • data blending
  • partitioning
  • label encoding
  • aliasing

20. What type of data visualization shows the concentration of values between two data points by illustrating their magnitude with two colors?

  • Density map
  • Heat map
  • Scatter plot
  • Treemap

21. What does the pandas function pd.duplicated() return to indicate that a data value is a duplicate of another value within the same dataset?

  • False
  • Duplicate
  • True
  • Unique
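
The quiz writes the function as `pd.duplicated()`; in pandas it is called as a method on a DataFrame or Series, for example `df.duplicated()`. A minimal sketch on invented data:

```python
import pandas as pd

# Hypothetical frame in which the third row repeats the second.
df = pd.DataFrame({"id": [1, 2, 2, 3], "score": [90, 85, 85, 70]})

# One Boolean per row; by default the first occurrence is not flagged.
flags = df.duplicated()
print(flags.tolist())
```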

22. Fill in the blank: A data professional should _____ a duplicate when its value is clearly a mistake or will misrepresent the remaining unique values within the dataset.

  • eliminate
  • filter
  • replicate
  • keep
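
When duplicates are judged to be mistakes, one way to remove them in pandas is `drop_duplicates()`. The sketch below uses invented data and assumes that a row counts as a duplicate only when every column matches.

```python
import pandas as pd

# Hypothetical frame in which the third row repeats the second.
df = pd.DataFrame({"id": [1, 2, 2, 3], "score": [90, 85, 85, 70]})

# Keep the first copy of each fully identical row and renumber the index.
deduped = df.drop_duplicates().reset_index(drop=True)
print(deduped)
```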

23. Fill in the blank: N/A and NaN are terms used to describe _____ data.

  • qualitative
  • string
  • nominal
  • missing

24. Which of the following strategies might a data professional consider when handling missing data? Select all that apply.

  • Add in the values by taking the average values from the existing data.
  • Derive new representative values based on the available data.
  • Change the missing values to zeros.
  • Ask the owner of the data to fill in the missing values.

25. A data professional writes the following code:

df_joined = df.merge(df_zip, how='left',
on=['date','center_point_geom'])

df_joined.head()

Which function indicates that the first data frame should be merged with another data frame?

  • df_joined.head()
  • df.merge()
  • how=
  • on=

26. What pandas function is used to identify when a value is missing from a data frame?

  • null.pd()
  • pd.isnull()
  • null().pd
  • pd.null()

27. What type of outliers are a group of abnormal points that follow similar patterns and are isolated from the rest of a population?

  • Contextual outliers
  • Global outliers
  • Atypical outliers
  • Collective outliers

28. What pandas function enables a data professional to determine if duplicate values are present in a dataset?

  • pd.dupe()
  • pd.duplicated()
  • pd.deduplication()
  • pd.deduplicates()

29. Fill in the blank: A data professional should _____ a duplicate when its values are clearly not mistakes and should be taken into account when representing the dataset as a whole.

  • keep
  • eliminate
  • emphasize
  • filter

30. What problem does a data professional address by taking the average of available values from a dataset and using them to derive new representative values?

  • Missing data
  • Outlier values
  • Unclean data
  • Incorrect values
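
Taking the average of the available values and using it as a representative value, as described in question 30, can be sketched in pandas as follows. The `ages` series is invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical series with one missing entry.
ages = pd.Series([30.0, np.nan, 40.0, 50.0])

# mean() skips NaN by default, so the average uses only the available values.
average = ages.mean()
filled = ages.fillna(average)
print(filled.tolist())
```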

31. Fill in the blank: A data professional may work with categorical data by using a _____, which has a value of 0 or 1 and indicates the presence or absence of something.

  • dummy variable
  • floating point
  • variable character
  • data operator
