Week 3: Cleaning data with SQL
1. Fill in the blank: Data analysts usually use _____ to deal with very large datasets.
- web browsers
- spreadsheets
- SQL
- word processors
2. What are some of the benefits of using SQL for analysis? Select all that apply.
- SQL interacts with database programs.
- SQL tracks changes across a team.
- SQL has built-in functionalities.
- SQL can pull information from different database sources.
3. A data analyst creates many new tables in their company’s database. When the project is complete, the analyst wants to remove the tables so they don’t clutter the database. Which SQL command can they use to delete the tables?
- CREATE TABLE IF NOT EXISTS
- DROP TABLE IF EXISTS
- UPDATE
- INSERT INTO
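As a minimal sketch of how table cleanup can look in practice, the snippet below runs the commands from the options above against an in-memory SQLite database using Python's standard sqlite3 module. The table name temp_project_data and the helper table_names are hypothetical and exist only for illustration.

```python
import sqlite3

# In-memory SQLite database; "temp_project_data" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS temp_project_data (id INTEGER, note TEXT)")


def table_names(connection):
    """Return the names of all user tables in the database."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return [name for (name,) in rows]


tables_before = table_names(conn)

# DROP TABLE IF EXISTS removes the table; it does not raise an error
# when the table is already gone, so the second call is a harmless no-op.
conn.execute("DROP TABLE IF EXISTS temp_project_data")
conn.execute("DROP TABLE IF EXISTS temp_project_data")

tables_after = table_names(conn)
print(tables_before, tables_after)
```

The IF EXISTS guard is what makes the cleanup safe to rerun.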
4. You are working with a database table that contains invoice data. The table includes columns for invoice_id and customer_id. You want to remove duplicate entries for customer ID and sort the results by invoice ID.
You write the SQL query below. Add a DISTINCT clause that will remove duplicate entries from the customer_id column.
NOTE: The three dots (...) indicate where to add the clause.
What customer ID number appears in row 12 of your query result?
- 23
- 42
- 16
- 8
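The effect of DISTINCT can be sketched on a tiny, made-up invoice table (the rows below are not the course dataset and do not answer the question). For simplicity, this sketch sorts the unique values by customer_id rather than by invoice_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (invoice_id INTEGER, customer_id INTEGER)")
# Made-up rows: customers 23 and 42 each appear twice.
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?)",
    [(1, 23), (2, 42), (3, 23), (4, 8), (5, 42)],
)

# Without DISTINCT, every row is returned, duplicates included.
all_ids = [cid for (cid,) in conn.execute(
    "SELECT customer_id FROM invoice ORDER BY invoice_id"
)]

# With DISTINCT, each customer_id appears once.
unique_ids = [cid for (cid,) in conn.execute(
    "SELECT DISTINCT customer_id FROM invoice ORDER BY customer_id"
)]

print(all_ids)
print(unique_ids)
```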
5. You are working with a database table that contains customer data. The table includes columns about customer location such as city, state, country, and postal_code. You want to check for postal codes that are greater than 7 characters long.
You write the SQL query below. Add a LENGTH function that will return any postal codes that are greater than 7 characters long.
What is the last name of the customer that appears in row 10 of your query result?
- Rocha
- Brooks
- Hughes
- Ramos
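A LENGTH filter of the kind described above might look like the following sketch. The customer names and postal codes are invented for illustration and are unrelated to the course dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (last_name TEXT, postal_code TEXT)")
# Hypothetical rows with postal codes of 10, 5, 7, and 9 characters.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [
        ("Alvarez", "12345-6789"),
        ("Baker", "90210"),
        ("Chen", "K1A 0B1"),
        ("Dubois", "01310-200"),
    ],
)

# LENGTH counts characters; "> 7" excludes the 7-character code "K1A 0B1".
long_codes = conn.execute(
    "SELECT last_name, postal_code FROM customer "
    "WHERE LENGTH(postal_code) > 7 ORDER BY last_name"
).fetchall()

print(long_codes)
```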
6. A data analyst is cleaning transportation data for a ride-share company. The analyst converts the data on ride duration from text strings to floats. What does this scenario describe?
- Visualizing
- Processing
- Calculating
- Typecasting
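The text-to-float conversion in this scenario can be sketched with CAST. The rides table and its values are assumptions made for illustration; SQLite names its floating-point type REAL, and other databases may use FLOAT or FLOAT64 instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The duration column is declared as TEXT, so values are stored as strings.
conn.execute("CREATE TABLE rides (ride_id INTEGER, duration TEXT)")
conn.executemany(
    "INSERT INTO rides VALUES (?, ?)",
    [(1, "12.5"), (2, "7"), (3, "30.25")],
)

# typeof() confirms the stored type before conversion.
stored_type = conn.execute(
    "SELECT typeof(duration) FROM rides WHERE ride_id = 1"
).fetchone()[0]

# CAST converts each text string to a floating-point number.
durations = [d for (d,) in conn.execute(
    "SELECT CAST(duration AS REAL) FROM rides ORDER BY ride_id"
)]

# Once cast, the values can be used in arithmetic such as SUM.
total = conn.execute("SELECT SUM(CAST(duration AS REAL)) FROM rides").fetchone()[0]

print(stored_type, durations, total)
```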
7. The CAST function can be used to convert the DATE datatype to the DATETIME datatype.
- True
- False
8. Fill in the blank: The _____ function can be used to return non-null values in a list.
- TRIM
- COALESCE
- CAST
- CONCAT
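As a rough illustration of null handling, the sketch below uses COALESCE, which returns the first non-null value among its arguments. The survey table and the fallback text 'no response' are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (respondent_id INTEGER, answer TEXT)")
# Python None is stored as SQL NULL.
conn.executemany(
    "INSERT INTO survey VALUES (?, ?)",
    [(1, "yes"), (2, None), (3, "no")],
)

# COALESCE walks its arguments left to right and returns the first
# one that is not NULL, so NULL answers fall back to 'no response'.
answers = [a for (a,) in conn.execute(
    "SELECT COALESCE(answer, 'no response') FROM survey ORDER BY respondent_id"
)]

print(answers)
```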
9. You are working with a database table that contains employee data. The table includes columns about employee location such as city, state, country, and postal_code. You want to retrieve the first 3 characters of each postal code. You decide to use the SUBSTR function to retrieve the first 3 characters of each postal code, and use the AS command to store the result in a new column called new_postal_code.
You write the SQL query below. Add a statement to your SQL query that will retrieve the first 3 characters of each postal code and store the result in a new column as new_postal_code.
NOTE: The three dots (...) indicate where to add the statement.
What employee ID number appears in row 5 of your query result?
NOTE: The query index starts at 1 not 0.
- 3
- 1
- 8
- 7
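A SUBSTR call combined with AS could be sketched as follows, using invented employee rows rather than the course dataset. In SQLite, SUBSTR takes a column, a 1-based starting position, and a length; the last query in the sketch makes the meaning of the third argument visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_id INTEGER, postal_code TEXT)")
# Hypothetical employees and postal codes.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(1, "T5K 2N1"), (2, "T2P 5M5")],
)

# SUBSTR(column, 1, 3) keeps the first 3 characters; AS names the new column.
cursor = conn.execute(
    "SELECT employee_id, SUBSTR(postal_code, 1, 3) AS new_postal_code "
    "FROM employee ORDER BY employee_id"
)
column_names = [col[0] for col in cursor.description]
prefixes = cursor.fetchall()

# Starting at position 5 with length 3 returns the last 3 characters here,
# which shows that the third argument is a length, not an end position.
tail = conn.execute("SELECT SUBSTR('T5K 2N1', 5, 3)").fetchone()[0]

print(column_names, prefixes, tail)
```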
10. Why do data analysts choose to work with SQL? Select all that apply.
- SQL can handle huge amounts of data.
- SQL is a powerful software program.
- SQL is a well-known standard in the professional community.
- SQL is a programming language that can also create web apps.
11. A team of data analysts is working on a large project that will take months to complete and contains a huge amount of data. They need to document their process and communicate with multiple databases. The team decides to use a SQL server as the main analysis tool for this project and SQL for the queries. What makes this the most efficient tool? Select all that apply.
- SQL efficiently handles large amounts of data.
- SQL records queries and changes throughout a project.
- SQL contains commands that build visualizations.
- SQL allows you to connect to multiple databases.
12. Fill in the blank: _____ refers to the process of converting data from one type to another.
- Formatting
- Cleaning
- Typecasting
- Querying
13. A data analyst is working with product sales data. They import new data into a database. The database recognizes the data for product price as text strings. What SQL function can the analyst use to convert text strings to floats?
- LENGTH
- TRIM
- SUBSTR
- CAST
14. Fill in the blank: The _____ function can be used to join strings to create a new column.
- CAST
- COALESCE
- TRIM
- CONCAT
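String joining can be sketched as below. This example uses SQLite's || operator, because a CONCAT function may not be available in older SQLite builds; in databases that provide CONCAT, CONCAT(first_name, '_', last_name) would play the same role. The table, the rows, and the customer_key column name are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (first_name TEXT, last_name TEXT)")
# Hypothetical customers.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Ana", "Lima"), ("Bo", "Reed")],
)

# The || operator joins strings; here it builds a new key column
# from the first and last name, separated by an underscore.
keys = [k for (k,) in conn.execute(
    "SELECT first_name || '_' || last_name AS customer_key "
    "FROM customer ORDER BY last_name"
)]

print(keys)
```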
15. As a data analyst, you are working on a quick project containing a small amount of data. As the data was emailed to you, there is no need to query the data. What tool should you use to perform your analysis?
- Spreadsheet
- SQL
- Word processor
- CSV
16. A data analyst has added a massive table to their database by accident and needs to remove the table. What command can the analyst use to correct their mistake?
- DROP TABLE IF NOT EXISTS
- INSERT INTO
- REMOVE TABLE IF EXISTS
- DROP TABLE IF EXISTS
17. You are working with a database table that contains invoice data. The table includes a column for customer_id. You want to remove duplicate entries for customer_id and get a count of total customers in the database.
You write the SQL query below. Add a DISTINCT clause that will remove duplicate entries from the customer_id column.
NOTE: The three dots (...) indicate where to add the clause.
What is the total number of customers in the database?
- 84
- 105
- 43
- 59
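Counting unique customers can be sketched by placing DISTINCT inside COUNT. The invoice rows below are made up and are not the course dataset, so the count shown here is not the answer to the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (invoice_id INTEGER, customer_id INTEGER)")
# Made-up rows: 5 invoices belonging to 3 different customers.
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?)",
    [(1, 23), (2, 42), (3, 23), (4, 8), (5, 42)],
)

# COUNT(customer_id) counts every row with a customer_id.
row_count = conn.execute("SELECT COUNT(customer_id) FROM invoice").fetchone()[0]

# COUNT(DISTINCT customer_id) counts each customer only once.
customer_count = conn.execute(
    "SELECT COUNT(DISTINCT customer_id) FROM invoice"
).fetchone()[0]

print(row_count, customer_count)
```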
18. In SQL databases, what data type refers to a number that does not contain a decimal?
- String
- Integer
- Boolean
- Float
19. After joining multiple tables, you find your data contains a significant number of null values. What function can you use to return only the non-null values in a list?
- CAST
- COALESCE
- TRIM
- CONCAT
20. You are working with a database table that contains customer data. The table includes columns about customer location such as city, state, and country. The state names are abbreviated. You want to retrieve the first 2 letters of each state name. You decide to use the SUBSTR function to retrieve the first 2 letters of each state name, and use the AS command to store the result in a new column called new_state.
You write the SQL query below. Add a statement to your SQL query that will retrieve the first 2 letters of each state name and store the result in a new column as new_state.
NOTE: The three dots (...) indicate where to add the statement.
NOTE: SUBSTR takes three arguments: column, starting_index, and length.
What customer ID number is in row 9 of your query result?
NOTE: The query index starts at 1 not 0.
- 3
- 55
- 47
- 10
21. A junior data analyst joins a new company. The analyst learns that SQL is heavily utilized within the organization. Why would the organization choose to invest in SQL? Select all that apply.
- SQL is a programming language that can also create web apps.
- SQL can handle huge amounts of data.
- SQL is a powerful software program.
- SQL is a well-known standard in the professional community.
22. You are working with a database table that contains invoice data. The table includes columns for invoice_id and billing_state. You want to remove duplicate entries for billing state and sort the results by invoice ID.
You write the SQL query below. Add a DISTINCT clause that will remove duplicate entries from the billing_state column.
NOTE: The three dots (...) indicate where to add the clause.
What billing state appears in row 17 of your query result?
NOTE: The query index starts at 1 not 0.
- AZ
- NV
- CA
- WI
23. You are working with a database table that contains customer data. The table includes columns about customer location such as city, state, country, and postal_code. You want to check for city names that are greater than 9 characters long.
You write the SQL query below. Add a LENGTH function that will return any city names that are greater than 9 characters long.
What is the first name of the customer that is in row 7 of your query result?
NOTE: The query index starts at 1 not 0.
- Diego
- Kara
- Julia
- Roberto
24. In SQL databases, what data type refers to a number that contains a decimal?
- Boolean
- Float
- Integer
- String
25. You’re working with a dataset that contains a float column with a large number of decimal places. This level of granularity is not needed for your current analysis. What function can you use to convert the data in the float column to integer data?
- CAST
- COALESCE
- TRIM
- CONCAT
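A float-to-integer conversion can be sketched with CAST, here in SQLite with invented values. One detail worth knowing is an SQLite behavior rather than something stated in the course: casting a REAL to INTEGER truncates toward zero instead of rounding, and other databases may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurement (reading REAL)")
# Hypothetical float values with more precision than needed.
conn.executemany(
    "INSERT INTO measurement VALUES (?)",
    [(3.99,), (10.01,), (-2.5,)],
)

# In SQLite, CAST(... AS INTEGER) drops the fractional part
# (truncation toward zero), so 3.99 becomes 3 and -2.5 becomes -2.
whole_numbers = [n for (n,) in conn.execute(
    "SELECT CAST(reading AS INTEGER) FROM measurement ORDER BY rowid"
)]

print(whole_numbers)
```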
26. What SQL function lets you add strings together to create new text strings that can be used as unique keys?
- CAST
- COALESCE
- TRIM
- CONCAT
27. What are some of the benefits of using SQL for analysis? Select all that apply.
- SQL interacts with database programs.
- SQL has better user management than spreadsheets.
- SQL can pull information from different database sources.
- SQL tracks changes across a team.
28. A data analyst is managing a database of customer information for a retail store. What SQL command can the analyst use to add a new customer to the database?
- UPDATE
- CREATE TABLE IF NOT EXISTS
- DROP TABLE IF EXISTS
- INSERT INTO
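Adding a row with INSERT INTO might look like the sketch below. The customer table, its columns, and the sample customer are hypothetical; the ? placeholders are sqlite3's way of passing values safely from Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER, name TEXT)")

count_before = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

# INSERT INTO adds a new row; listing the columns makes the intent explicit.
conn.execute(
    "INSERT INTO customer (customer_id, name) VALUES (?, ?)",
    (1, "Example Customer"),
)

count_after = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
new_row = conn.execute("SELECT customer_id, name FROM customer").fetchone()

print(count_before, count_after, new_row)
```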
29. In SQL databases, what data type do True/False values refer to?
- String
- Float
- Integer
- Boolean
30. A data analyst is tasked with identifying what orders are still in transit. The current list of orders contains trillions of rows. What is the best tool for the analyst to use?
- Spreadsheets
- CSV
- SQL
- Word processor
31. Your manager tasks you with analyzing a dataset and visually inspecting the data. Upon initial inspection you realize that this is a small dataset. What tool should you use to analyze the data?
- CSV
- Spreadsheet
- SQL
- Word processor
32. A data analyst creates a database to store the company's customer data. When completing the initial import, the analyst notices that they forgot to add a few customers to the table. What command can the analyst use to add these missed customers?
- ADD
- APPEND
- INSERT INTO
- DROP
33. You are working with a database table that contains customer data. The table includes columns about customer location such as city, state, country, and postal_code. You want to find what state names are greater than 3 characters.
You write the SQL query below. Add a LENGTH function that will return any state names that are greater than 3 characters long.
What state is in row 1 of your query result?
NOTE: The query index starts at 1 not 0.
- India
- Chile
- Dublin
- Ireland
34. In SQL databases, what function can be used to convert data from one datatype to another?
- CAST
- LENGTH
- TRIM
- SUBSTR
35. After a company merger, a data analyst receives a dataset with billions of rows of data. They need to leverage this data to identify insights for upper management. What tool would be most efficient for the analyst to use?
- Spreadsheet
- Word processor
- SQL
- CSV
36. You are working with a database table that contains customer data. The table includes columns about customer location such as city, state, country, and postal_code. The state names are abbreviated. You want to check for state names that are greater than 2 characters long.
You write the SQL query below. Add a LENGTH function that will return any state names that are greater than 2 characters long.
What country is in row 1 of your query result?
NOTE: The query index starts at 1 not 0.
- Ireland
- India
- France
- Chile
37. You are working with a database table that contains employee data. The table includes columns about employee location such as city, state, country, and postal_code. You use the SUBSTR function to retrieve the first 3 characters of each last_name, and use the AS command to store the result in a new column called new_last_name.
You write the SQL query below. Add a statement to your SQL query that will retrieve the first 3 characters of each last_name and store the result in a new column as new_last_name.
NOTE: The three dots (...) indicate where to add the statement.
NOTE: SUBSTR takes three arguments: column, starting_index, and length.
What employee ID number is in row 8 of your query result?
NOTE: The query index starts at 1 not 0.
- 7
- 3
- 1
- 8
38. A data analyst runs a SQL query to extract some data from a database for further analysis. How can the analyst save the data? Select all that apply.
- Download the data as a spreadsheet.
- Use the UPDATE query to save the data.
- Create a new table for the data.
- Run a SQL query to automatically save the data.
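One way of keeping query results inside the database is to write them to a new table. The sketch below assumes SQLite's CREATE TABLE ... AS SELECT form; the table names, the rows, and the total > 5 filter are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (invoice_id INTEGER, total REAL)")
# Hypothetical invoices.
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?)",
    [(1, 3.5), (2, 8.0), (3, 12.25)],
)

# CREATE TABLE ... AS SELECT stores the query result as a new table,
# so the extracted rows remain available for later analysis.
conn.execute(
    "CREATE TABLE high_value_invoice AS "
    "SELECT invoice_id, total FROM invoice WHERE total > 5"
)

saved_rows = conn.execute(
    "SELECT invoice_id, total FROM high_value_invoice ORDER BY invoice_id"
).fetchall()

print(saved_rows)
```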
39. A data analyst notices their Boolean column is incorrectly storing True/False values as strings. What function can the analyst use to convert the data type from a string to Boolean?
- CAST
- TRIM
- LENGTH
- SUBSTR
40. A data analyst is cleaning survey data. The results for an optional question contain many nulls. What function can the analyst use to eliminate the null values from the results?
- COALESCE
- CONCAT
- CAST
- LENGTH
41. A data analyst is analyzing medical data for a health insurance company. The dataset contains billions of rows of data. Which of the following tools will handle the data most efficiently?
- A word processor
- CSV
- A spreadsheet
- SQL
42. You are working with a database table that contains invoice data. The table includes columns for invoice_id and billing_city. You want to remove duplicate entries for billing_city and sort the results by invoice_id.
You write the SQL query below. Add a DISTINCT clause that will remove duplicate entries from the billing_city column.
NOTE: The three dots (...) indicate where to add the clause.
What billing city appears in row 15 of your query result?
NOTE: The query index starts at 1 not 0.
- London
- Reno
- Santiago
- Oslo
43. You are working with a database table that contains invoice data. The table includes columns about billing location such as billing_city, billing_state, and billing_postal_code. You use the SUBSTR function to retrieve the first 4 numbers of each billing_postal_code, and use the AS command to store the result in a new column called new_postal_code.
You write the SQL query below. Add a statement to your SQL query that will retrieve the first 4 numbers of each billing postal code and store the result in a new column as new_postal_code.
NOTE: The three dots (...) indicate where to add the statement.
NOTE: SUBSTR takes three arguments: column, starting_index, and length.
What invoice id is located in row 4?
NOTE: The query index starts at 1 not 0.
- 32
- 104
- 52
- 206
44. In which of the following situations would a data analyst use SQL instead of a spreadsheet? Select all that apply.
- When quickly pulling information from many different sources in a database
- When recording queries and changes throughout a project
- When working with a huge amount of data
- When using the COUNTIF function to find a specific piece of information
45. You are working with a database table named invoice that contains invoice data. The table includes columns for customer_id and total. You want to remove duplicate customers and identify which unique customers have a total greater than 5.
You write the SQL query below. Add a DISTINCT clause that will remove duplicate entries from the customer_id column.
NOTE: The three dots (...) indicate where to add the clause.
What customer_id number is located in row 5?
NOTE: The query index starts at 1 not 0.
- 57
- 14
- 52
- 4
46. You are working with a database table named customer that contains customer data. The table includes columns about customer location such as city, state, country, and postal_code. You want to check what country names are greater than 7 characters.
You write the SQL query below. Add a LENGTH function that will return any country names that are greater than 7 characters long.
NOTE: The three dots (...) indicate where to add the clause.
What is the value of postal_code in row 1 of your query result? (Hint: you will have to scroll to the right with your mouse or track pad to locate the indicated column.)
NOTE: The query index starts at 1 not 0.
- 14700
- None
- 2010
- N1 5LH