Test your knowledge: Input validation
11. Data professionals use input validation to ensure data is complete, error-free, and of high quality.
- True
- False
12. Fill in the blank: If a dataset lacks sufficient information to answer a business question, the process of _____ makes it possible to augment that data by adding values from other datasets.
- summing
- sampling
- joining
- blending
13. In which phase of the PACE workflow would a data professional perform the majority of the data-validation process?
- Execute
- Analyze
- Plan
- Construct
Weekly challenge 3
14. Which of the following terms are used to describe missing data? Select all that apply.
- Blank
- NaN
- N/A
- Zero
15. Which of the following strategies might a data professional consider when handling missing data? Select all that apply.
- Use their best judgment to add in values themselves.
- Change the missing values to zeros.
- Create a NaN category.
- Delete the missing values.
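The pandas calls behind some of these strategies can be sketched as follows. This is a minimal illustration, not a recommendation; the data frame, its column names (`segment`, `sales`), and the "Missing" label are assumptions made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column
df = pd.DataFrame({
    "segment": ["retail", None, "online"],
    "sales": [100.0, np.nan, 250.0],
})

# Delete rows that contain any missing value
dropped = df.dropna()

# Replace missing numeric values with zero
zero_filled = df.fillna({"sales": 0})

# Put missing categorical values into their own category
labeled = df.fillna({"segment": "Missing"})

print(dropped)
print(zero_filled)
print(labeled)
```

Each call returns a new data frame and leaves `df` unchanged, so the results can be compared side by side before committing to one approach.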
16. A data professional writes the following code:
df_joined = df.merge(df_zip,
                     how='left',
                     on=['date', 'center_point_geom'])
df_joined.head()
Which section of the code indicates the data frame to be merged with the dataset df?
- center_point_geom
- df_joined.head()
- how='left'
- df_zip
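To see how a left merge like the one in question 16 behaves, here is a small runnable sketch. The row values and the `strikes` and `zip_code` columns are invented for illustration; only the `date` and `center_point_geom` key names come from the question.

```python
import pandas as pd

# Hypothetical left-hand data frame
df = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-02", "2018-01-03"],
    "center_point_geom": ["POINT(1 1)", "POINT(2 2)", "POINT(3 3)"],
    "strikes": [10, 20, 30],
})

# Hypothetical right-hand data frame; it has no row for 2018-01-03
df_zip = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-02"],
    "center_point_geom": ["POINT(1 1)", "POINT(2 2)"],
    "zip_code": ["11111", "22222"],
})

# A left merge keeps every row of df and adds matching columns from df_zip
df_joined = df.merge(df_zip, how="left", on=["date", "center_point_geom"])
print(df_joined.head())
```

Rows of `df` with no match in `df_zip` are kept, and their `zip_code` is filled with a missing value.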
17. What tasks could the pandas function pd.isnull() be used for? Select all that apply.
- To delete all of the values from a data frame
- To identify when a value is missing from a data frame
- To pull all of the missing values from a data frame
- To change all values to nulls in a data frame
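As a quick illustration of what `pd.isnull()` returns, the sketch below runs it on a small data frame. The `city` and `temp` columns and their values are assumptions made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column
df = pd.DataFrame({
    "city": ["A", "B", None],
    "temp": [21.5, np.nan, 19.0],
})

# Boolean data frame: True wherever a value is missing
mask = pd.isnull(df)

# Count of missing values per column
missing_per_column = mask.sum()

# Rows that contain at least one missing value
rows_with_missing = df[mask.any(axis=1)]

print(mask)
print(missing_per_column)
print(rows_with_missing)
```

The function only reports where values are missing; it does not modify `df`.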
18. What type of outliers are values that are completely different from the overall data group and have no association with any other outliers?
- Collective outliers
- Global outliers
- Dissimilar outliers
- Contextual outliers
Shuffle Q/A 2
19. Fill in the blank: A data professional may work with categorical data by using _____, which is a data-transformation technique where each category is assigned a unique number instead of a qualitative value.
- data blending
- partitioning
- label encoding
- aliasing
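One common way to assign each category a unique number in pandas is through the categorical dtype. The sketch below is one possible approach under that assumption; the `month` column and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical categorical column
df = pd.DataFrame({"month": ["Jan", "Feb", "Mar", "Jan"]})

# Convert to a categorical dtype, then take the integer code of each category.
# Codes follow the sorted order of the category labels, not calendar order.
df["month_code"] = df["month"].astype("category").cat.codes

print(df)
```

Because the codes follow alphabetical order here, an explicit category order would be needed if the numbers are meant to reflect a natural sequence such as months of the year.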
20. What type of data visualization shows the concentration of values between two data points by illustrating their magnitude with two colors?
- Density map
- Heat map
- Scatter plot
- Treemap