Go Beyond the Numbers: Translate Data into Insights (Coursera), weekly challenge 3 answers
Test your knowledge: The challenge of missing or duplicate data
1. Fill in the blank: Missing data has a value that is not stored for a _____ in a dataset.
- column
- row
- visualization
- variable
2. A data professional requests additional information from a dataset’s original owner. Unfortunately, the owner is not able to provide the information. Therefore, the data professional creates a NaN category in the dataset. What concept does this scenario describe?
- Ensuring two datasets are compatible
- Managing big data
- Mapping variables in a dataset
- Solving the problem of missing data
3. When merging data, a data professional uses the following code:
df_joined = df.merge(df_zip, how='left',
on=['date','center_point_geom'])
What is the function of the parameters how and on in this code?
- To tell Python how to find missing values in the rows and columns
- To tell Python which way to join the data and which column to join from
- To tell Python how to place the appropriate values on the top row of the dataset
- To tell Python which datasets should be merged
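As a minimal sketch of how the how and on parameters behave in the merge from question 3: the two small data frames below are made up for illustration; only the names df, df_zip, df_joined, date, and center_point_geom come from the question, while the strikes and zip_code columns and all values are assumptions.

```python
import pandas as pd

# Hypothetical sample data; column values are invented for illustration.
df = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-02", "2018-01-03"],
    "center_point_geom": ["POINT(1 1)", "POINT(2 2)", "POINT(3 3)"],
    "strikes": [5, 7, 2],
})
df_zip = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-02"],
    "center_point_geom": ["POINT(1 1)", "POINT(2 2)"],
    "zip_code": ["10001", "10002"],
})

# how='left' keeps every row of df (the left frame);
# on=[...] names the columns whose values must match in both frames.
df_joined = df.merge(df_zip, how="left", on=["date", "center_point_geom"])
print(df_joined)
```

With a left join, the third row of df has no match in df_zip, so it stays in the result and its zip_code is filled with NaN.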
4. Non-null count is the total number of blank data entries within a data column.
- True
- False
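To see what a non-null count measures, the following sketch compares it with the count of missing entries on a made-up column. The name df_counts and its values are assumptions for illustration.

```python
import pandas as pd
import numpy as np

# Hypothetical column with two stored values and two missing ones.
df_counts = pd.DataFrame({"value": [1.0, np.nan, 3.0, np.nan]})

non_null = df_counts["value"].count()         # entries that hold a value
missing = df_counts["value"].isnull().sum()   # entries that are blank (NaN)

print(non_null, missing)
```

Calling df_counts.info() reports the same non-null figure per column.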
Test your knowledge: The ins and outs of data outliers
5. What type of outlier is a normal data point under certain conditions, but becomes an anomaly under most other conditions?
- Collective outlier
- Contextual outlier
- Global outlier
- Constant outlier
6. What is the term for a line of text that follows a method or function, which is used to explain the purpose of that method or function to others using the same code?
- Factor
- Annotation
- Argument
- Docstring
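For reference, a docstring in Python might look like the sketch below. The function count_missing is hypothetical and exists only to carry the example.

```python
def count_missing(values):
    """Return how many entries in values are None."""
    # The string literal directly under the def line is the docstring;
    # Python stores it on the function as count_missing.__doc__.
    return sum(1 for v in values if v is None)


print(count_missing.__doc__)
```

Tools such as help(count_missing) display this text to anyone else using the code.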
7. A data professional is using a box plot to identify suspected high outliers in a dataset, according to the interquartile rule. To do that, they search for data points greater than the third quartile plus what multiple of the interquartile range?
- 1.5 times
- 3 times
- 10 times
- 0.5 times
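The interquartile rule can be sketched in pandas as follows. The sample values are invented, and the sketch uses the conventional multiplier of 1.5 for the upper limit; quartiles are computed with pandas' default linear interpolation.

```python
import pandas as pd

# Hypothetical measurements with one suspiciously large value.
values = pd.Series([10, 12, 13, 14, 15, 16, 18, 95])

q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1                      # interquartile range

upper_limit = q3 + 1.5 * iqr       # points above this are suspected high outliers
high_outliers = values[values > upper_limit]

print(upper_limit, high_outliers.tolist())
```

The same idea applies to low outliers, using the first quartile minus the same multiple of the interquartile range.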
Test your knowledge: Changing categorical data to numerical data
8. Fill in the blank: Label encoding assigns each category a unique _____ instead of a qualitative value.
- character
- qualifier
- string
- number
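One possible way to label encode a column in pandas is through the category dtype, sketched below. The data frame df_labels and its size column are assumptions; other tools (for example scikit-learn's LabelEncoder) serve the same purpose.

```python
import pandas as pd

# Hypothetical categorical column.
df_labels = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

# Converting to the category dtype orders the categories alphabetically
# (large, medium, small) and .cat.codes maps each one to an integer.
df_labels["size_code"] = df_labels["size"].astype("category").cat.codes

print(df_labels)
```

Note that the codes follow alphabetical order here, not the natural small/medium/large order, so they should not be read as a ranking.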
9. When working with dummy variables, data professionals may assign the variables an infinite number of values.
- True
- False
10. Which pandas function does a data professional use to convert categorical variables into dummy variables?
- get_dummies()
- convert_categories()
- convert_dummies()
- get_categories()
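A short sketch of dummy variables in pandas, using a made-up color column. The names df_colors and dummies are assumptions; dtype=int is passed so the output shows 0 and 1 rather than booleans.

```python
import pandas as pd

# Hypothetical categorical column.
df_colors = pd.DataFrame({"color": ["red", "blue", "red"]})

# One new column per category; each cell is 1 if the row has that
# category and 0 otherwise.
dummies = pd.get_dummies(df_colors["color"], dtype=int)

print(dummies)
```

Each dummy column takes only the two values 0 and 1, marking the absence or presence of its category.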
Test your knowledge: Input validation
11. Data professionals use input validation to ensure data is complete, error-free, and of high-quality.
- True
- False
12. Fill in the blank: If a dataset lacks sufficient information to answer a business question, the process of _____ makes it possible to augment that data by adding values from other datasets.
- summing
- sampling
- joining
- blending
13. In which phase of the PACE workflow would a data professional perform the majority of the data-validation process?
- Execute
- Analyze
- Plan
- Construct
Weekly challenge 3
14. Which of the following terms are used to describe missing data? Select all that apply.
- Blank
- NaN
- N/A
- Zero
15. Which of the following strategies might a data professional consider when handling missing data? Select all that apply.
- Use their best judgment to add in values themselves.
- Change the missing values to zeros.
- Create a NaN category.
- Delete the missing values.
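Two of the strategies listed above, deleting missing values and creating a NaN category, can be sketched like this. The data frame df_orders, its columns, and the label "NaN" used for the new category are all assumptions for illustration.

```python
import pandas as pd
import numpy as np

# Hypothetical data with one missing region and one missing sales figure.
df_orders = pd.DataFrame({
    "region": ["east", None, "west"],
    "sales": [100.0, 250.0, np.nan],
})

# Strategy: delete rows that contain any missing value.
dropped = df_orders.dropna()

# Strategy: keep the rows and put missing labels into their own category.
df_orders["region_filled"] = df_orders["region"].fillna("NaN")

print(dropped)
print(df_orders)
```

Which strategy fits depends on how much data is missing and whether the missing rows matter to the business question.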
16. A data professional writes the following code:
df_joined = df.merge(df_zip,
how='left',
on=['date','center_point_geom'])
df_joined.head()
Which section of the code indicates the data frame to be merged with the dataset df?
- center_point_geom
- df_joined.head()
- how='left'
- df_zip
17. What tasks could the pandas function pd.isnull() be used for? Select all that apply.
- To delete all of the values from a data frame
- To identify when a value is missing from a data frame
- To pull all of the missing values from a data frame
- To change all values to nulls in a data frame
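A minimal sketch of pd.isnull() on a made-up data frame: it returns a True/False mask, which can then be used to pull out the rows that contain missing values. The names df_missing, mask, and rows_with_missing are assumptions.

```python
import pandas as pd
import numpy as np

# Hypothetical data with one missing city and one missing temperature.
df_missing = pd.DataFrame({
    "city": ["Austin", None, "Boise"],
    "temp": [30.5, 28.0, np.nan],
})

# True wherever a value is missing, False elsewhere.
mask = pd.isnull(df_missing)

# Use the mask to select rows with at least one missing value.
rows_with_missing = df_missing[mask.any(axis=1)]

print(mask)
print(rows_with_missing)
```

The function only reports on missing values; it does not delete or change anything in the data frame.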
18. What type of outliers are values that are completely different from the overall data group and have no association with any other outliers?
- Collective outliers
- Global outliers
- Dissimilar outliers
- Contextual outliers
19. Fill in the blank: A data professional may work with categorical data by using _____, which is a data-transformation technique where each category is assigned a unique number instead of a qualitative value.
- data blending
- partitioning
- label encoding
- aliasing
20. What type of data visualization shows the concentration of values between two data points by illustrating their magnitude with two colors?
- Density map
- Heat map
- Scatter plot
- Treemap
21. What does the pandas function pd.duplicated() return to indicate that a data value is a duplicate of another value within the same dataset?
- False
- Duplicate
- True
- Unique
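The duplicate check can be sketched as below. In pandas, duplicated() is exposed as a method on a DataFrame or Series, so the sketch calls it as df_dupes.duplicated(); the data frame df_dupes and its values are assumptions.

```python
import pandas as pd

# Hypothetical data where the third row repeats the second.
df_dupes = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "score": [90, 85, 85, 70],
})

# One Boolean per row, marking rows that repeat an earlier row.
flags = df_dupes.duplicated()

# drop_duplicates() removes the flagged rows when they are mistakes.
deduped = df_dupes.drop_duplicates()

print(flags.tolist())
```

By default the first occurrence is treated as the original, and only later repeats are flagged.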
22. Fill in the blank: A data professional should _____ a duplicate when its value is clearly a mistake or will misrepresent the remaining unique values within the dataset.
- eliminate
- filter
- replicate
- keep
23. Fill in the blank: N/A and NaN are terms used to describe _____ data.
- qualitative
- string
- nominal
- missing
24. Which of the following strategies might a data professional consider when handling missing data? Select all that apply.
- Add in the values by taking the average values from the existing data.
- Derive new representative values based on the available data.
- Change the missing values to zeros.
- Ask the owner of the data to fill in the missing values.
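Filling gaps with the average of the existing data could look like the sketch below. The data frame df_fill and its temp column are assumptions, and the mean is only one possible choice of representative value.

```python
import pandas as pd
import numpy as np

# Hypothetical column with one missing reading.
df_fill = pd.DataFrame({"temp": [20.0, np.nan, 24.0]})

# mean() skips NaN by default, so the average is taken over 20.0 and 24.0.
average = df_fill["temp"].mean()

# Replace the missing reading with that average.
df_fill["temp_filled"] = df_fill["temp"].fillna(average)

print(df_fill)
```

Keeping the original column alongside the filled one makes it easy to see which values were derived rather than observed.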
25. A data professional writes the following code:
df_joined = df.merge(df_zip,
how='left',
on=['date','center_point_geom'])
df_joined.head()
Which function indicates that the first data frame should be merged with another data frame?
- df_joined.head()
- df.merge()
- how=
- on=
26. What pandas function is used to identify when a value is missing from a data frame?
- null.pd()
- pd.isnull()
- null().pd
- pd.null()
27. What type of outliers are a group of abnormal points that follow similar patterns and are isolated from the rest of a population?
- Contextual outliers
- Global outliers
- Atypical outliers
- Collective outliers
28. What pandas function enables a data professional to determine if duplicate values are present in a dataset?
- pd.dupe()
- pd.duplicated()
- pd.deduplication()
- pd.deduplicates()
29. Fill in the blank: A data professional should _____ a duplicate when its values are clearly not mistakes and should be taken into account when representing the dataset as a whole.
- keep
- eliminate
- emphasize
- filter
30. What problem does a data professional address by taking the average of available values from a dataset and using them to derive new representative values?
- Missing data
- Outlier values
- Unclean data
- Incorrect values
31. Fill in the blank: A data professional may work with categorical data by using a _____, which has a value of 0 or 1 and indicates the presence or absence of something.
- dummy variable
- floating point
- variable character
- data operator