Extract, Transform and Load Data in Power BI: Coursera Week 4 Answers
Self-review: Transforming multiple data sources
In the exercise Transforming multiple data sources, you imported Excel files and transformed these multiple data sources in Power Query.
To do so, you had to complete the following tasks:
Clean and transform multiple data sources.
Join and merge them.
Examine the valid, error, empty, min, max, unique and distinct values in the rows, allowing you to identify the anomalies in the data.
Remove the data sources with anomalies.
Your final worksheet should look like this:
Now you can use the following questions to make sure that you understood and executed the tasks correctly. Don’t forget that you can revisit the previous learning items to recap the process steps.
1. What is the number of columns remaining after you remove the unnecessary ones from the OrderDetails query?
- 3
- 4
- 5
2. True or False: In the OrderDetails query, after you remove the anomaly values, there are 27 distinct and 6 unique values in the UnitPrice column.
- True
- False
3. In the OrderDetails query, how many rows are there after you remove anomalies?
- 999 rows
- 997 rows
- 996 rows
4. What is the reason for using an Inner Join for the merge operation of OrderDetails and Order?
- To keep all the rows from the left table.
- To keep only matching rows between the two tables.
- To keep all the rows from the right table.
5. What impact does removing data anomalies from the sales data have on the business? Select all that apply.
- It improves decision making.
- It improves integrity of the data source.
- It improves data accuracy.
Course quiz: Extract, transform and load Data in Power BI
6. Which of the following statements about data sources in Power BI is true?
- Power BI supports both cloud-based and on-premises data sources.
- Power BI can only connect to data sources that have a specific file format.
- Power BI only supports data sources that are stored in Excel spreadsheets.
7. True or False: Any data source marked as Beta or Preview has limited support and functionality, so you should not use it in production environments.
- True
- False
8. True or False: Local datasets in Power BI allow you to collaborate and create reports based on the same set of data. You can access and use datasets created by others within your organization, without having to create your own datasets from scratch. This ensures consistency in data analysis and reporting and saves time and effort for everyone involved.
- True
- False
9. In ________, data is stored in memory but can also be retrieved from the original data source. This is useful when you are working with dimension tables, which can be queried with fact tables from the same source.
- Import mode
- DirectQuery mode
- Dual mode
10. Which of the following data types uses a data serialization language like XML, JSON, or YAML?
- Structured data
- Unstructured data
- Semi-structured data
11. Power BI uses scheduled ________ to automate tasks at specified time intervals.
- Actions
- Triggers
- Tasks
12. True or False: Data types are defined at the row level, are determined by the values of each specific row, and may differ between rows.
- True
- False
13. Which of the following purposes are relevant to the Applied Steps section in the Power Query Editor? Select all that apply:
- To preview the data after the applied transformations
- To show the sequence of transformations applied to the selected query
- To display a list of all the queries in your Power BI project
- To undo a step when you make a mistake or change your mind.
14. ________ rows are instances in a dataset where two or more rows have identical values across all columns.
- Empty
- Missing
- Duplicate
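As a rough illustration of what counts as a duplicate row, the sketch below keeps only the first occurrence of each fully identical row. It is plain Python rather than Power Query, and the sample rows and the `remove_duplicates` helper are hypothetical.

```python
def remove_duplicates(rows):
    """Keep the first occurrence of each row; rows are compared across all columns."""
    seen = set()
    result = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


# Hypothetical sample data: (ProductID, Name, UnitPrice)
products = [
    ("A1", "Helmet", 25.0),
    ("A2", "Gloves", 12.0),
    ("A1", "Helmet", 25.0),  # identical to the first row in every column
    ("A1", "Helmet", 27.0),  # differs in one column, so it is not a duplicate
]

deduped = remove_duplicates(products)
print(len(products), len(deduped))  # prints "4 3"
```

Only the third row is dropped, because a row is a duplicate only when every column matches.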
15. Which of the following operations converts data organized in a wide format, with a separate column for each piece of information (region, country, etc.), into a long format in which that data is stacked vertically in a single column?
- Pivot
- Unpivot
- Transform
16. Which of the following menu items lets you add rows from one or more tables to another query or table?
- Append Queries
- Combine Files
- Merge Queries
17. Which of the following menu items lets you combine columns from one or more tables to another query or table?
- Merge Queries
- Append Queries
- Combine Files
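The difference between the two menu items can be pictured with a small plain-Python sketch: appending grows a table vertically (more rows), while merging grows it horizontally (more columns). The table names and values below are hypothetical, and this is only an analogy for what Power Query does, not its implementation.

```python
# Hypothetical monthly sales tables with the same columns.
january = [{"ProductID": 1, "Qty": 5}, {"ProductID": 2, "Qty": 3}]
february = [{"ProductID": 1, "Qty": 4}]

# Hypothetical lookup table keyed on ProductID.
product_names = {1: "Helmet", 2: "Gloves"}

# Append: stack the rows of both tables; the column set stays the same.
appended = january + february

# Merge: bring in a column from another table by matching on a key.
merged = [{**row, "Name": product_names[row["ProductID"]]} for row in appended]

print(len(appended), sorted(appended[0]))  # prints "3 ['ProductID', 'Qty']"
print(len(merged), sorted(merged[0]))      # prints "3 ['Name', 'ProductID', 'Qty']"
```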
18. In ________, all the records from the left table are included in the result set, along with the matching records from the right table.
- Left Outer Join
- Full Outer Join
- Inner Join
19. In which join type are only the matching rows from both tables listed?
- Inner Join
- Full Outer Join
- Left Outer Join
20. If you want to retrieve a result set that includes all records from both tables, even if some rows have no match in the other table, you can use a ________ between the tables.
- Left Outer Join
- Full Outer Join
- Inner Join
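The three join kinds in the questions above differ only in which unmatched rows they keep. The following is a minimal plain-Python sketch under assumed sample data; the `join` helper and the `orders` and `details` tables are hypothetical and are not how Power Query performs merges internally.

```python
# Hypothetical sample tables: each row is a dict, joined on "OrderID".
orders = [
    {"OrderID": 1, "Customer": "Ana"},
    {"OrderID": 2, "Customer": "Ben"},
    {"OrderID": 3, "Customer": "Cy"},
]
details = [
    {"OrderID": 2, "UnitPrice": 10.0},
    {"OrderID": 3, "UnitPrice": 7.5},
    {"OrderID": 4, "UnitPrice": 3.0},
]


def join(left, right, key, kind):
    """Join two lists of dicts on `key`; kind is 'inner', 'left', or 'full'."""
    rows = []
    matched_right = set()
    for left_row in left:
        matches = [(i, r) for i, r in enumerate(right) if r[key] == left_row[key]]
        for i, right_row in matches:
            matched_right.add(i)
            rows.append({**left_row, **right_row})
        if not matches and kind in ("left", "full"):
            rows.append(dict(left_row))  # keep the unmatched left row
    if kind == "full":
        # Also keep right rows that never matched any left row.
        rows.extend(dict(r) for i, r in enumerate(right) if i not in matched_right)
    return rows


inner_ids = [row["OrderID"] for row in join(orders, details, "OrderID", "inner")]
left_ids = [row["OrderID"] for row in join(orders, details, "OrderID", "left")]
full_ids = [row["OrderID"] for row in join(orders, details, "OrderID", "full")]

print(inner_ids)  # prints "[2, 3]"
print(left_ids)   # prints "[1, 2, 3]"
print(full_ids)   # prints "[1, 2, 3, 4]"
```

Order 1 has no detail row and detail 4 has no order, so the inner join drops both, the left outer join keeps order 1, and the full outer join keeps both.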
21. True or False: The common field that is shared by the tables to be merged, and used as the key field in the reference table, must be unique.
- True
- False
22. Using join keys prevents difficulties that may arise from typing detailed information such as category, city, or gender incorrectly, or from using a different value that conveys the same meaning. In this way, join keys provide a crucial solution for ________ and ________.
- performance, scalability
- efficiency, scalability
- classification, categorization
23. As the data ________ process may involve large volumes of data, you must carefully monitor its progress to ensure successful completion.
- Extraction
- Transformation
- Loading
24. In Adventure Works, you receive data from various channels, and it cannot be used in its raw form because the channels use different formats. You must transform the data and then consolidate it in a unified list. You will only use this data in the ETL process and not show it directly. What should you do?
- Use a reference query.
- Use a query parameter.
- Use a staging area.
25. Which of the following counts gives you the total number of different values for a column in a dataset?
- Unique
- Count
- Distinct
26. ________ provides column statistics such as Minimum, Maximum, Average (Mean), Frequently Occurring Values (Mode), and Standard Deviation, as well as the value distribution of the selected column.
- Column distribution
- Column quality
- Column profile
27. Which of the following gives you the total number of values in a column?
- Count
- Average
- Distinct
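The Count, Distinct, and Unique figures are easy to confuse, so here is a small sketch in plain Python over a hypothetical UnitPrice column. Count is the total number of values, Distinct is the number of different values, and Unique is the number of values that appear exactly once.

```python
from collections import Counter

# Hypothetical UnitPrice column.
unit_prices = [10.0, 10.0, 7.5, 3.0, 3.0, 3.0, 12.0]

frequencies = Counter(unit_prices)

count = len(unit_prices)                                 # every value in the column
distinct = len(frequencies)                              # different values: 10.0, 7.5, 3.0, 12.0
unique = sum(1 for n in frequencies.values() if n == 1)  # values seen once: 7.5, 12.0

print(count, distinct, unique)  # prints "7 4 2"
```

Unique can never exceed Distinct, and Distinct can never exceed Count.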
28. By establishing a ________, you can establish a connection between an existing query and a new query. Any modifications made to the original query will automatically propagate to the other queries, ensuring consistency and up-to-date information.
- duplicate
- query dataflow
- query reference
29. True or False: Reference queries can contribute to slow data refreshes due to their nature of referencing. When a reference query is refreshed, it needs to ensure that all the referenced queries are also refreshed to maintain data consistency.
- True
- False
30. Which of the following techniques allows you to connect to data sources, perform data transformations, and also lets you publish to the Power BI Service?
- Dataflows
- Duplicate Queries
- Reference Queries
31. Which of the following techniques can be used when you want your report users to concentrate on a particular product category?
- Filters
- Data Transformation
- Dynamic Data Retrieval
32. What is the primary benefit of dynamic data retrieval in Power BI?
- To enable real-time or near real-time data analysis by fetching the latest data from the source.
- To provide historical data snapshots for reporting purposes.
- To store and import large volumes of static data for long-term analysis.
33. ________ in Power BI determine the level of data isolation between different data sources and establish secure boundaries for data interaction within your Power BI environment.
- Global options for files
- Privacy levels
- Data load options
34. Which of the following best practices can increase performance in Power BI?
- Prioritize expensive operations early in the transformation process.
- Maximize the amount of data that needs to be loaded and processed.
- Filter and reduce data early in the transformation process.
35. Which of the connection types will you use when trying to connect to an on-premises SQL Server?
- SQL Server database
- SQL Server Analysis Services database
- Azure SQL database
36. True or False: The total number of columns that can be used in all the tables within a dataset is restricted to 16,000 columns. This limitation applies to the Power BI service and the datasets used in Power BI Desktop.
- True
- False
37. Which of the following connectors are supported in Power BI? Select all that apply:
- Salesforce
- SQL Server
- Google Analytics
- Excel Online
38. Which of the following items can be considered an advantage of a local dataset?
- Scalability
- Promotion
- Governance
- Data Control
39. Because many features in Import mode are not supported in DirectQuery mode, it’s not possible to switch from Import mode to DirectQuery mode.
- True
- False
40. As a Data Analyst at Adventure Works, you have been assigned to design an inventory operations system. The data within this system needs to be quantitative, easily searchable, sortable, and suitable for analysis. Which of the following data structures would be the most suitable for fulfilling these requirements?
- Structured data
- Semi-structured data
- Unstructured data
41. Which of the following is required to initiate a workflow and prompt it to run?
- Task
- Trigger
- Action
42. Which of the following options best represents the following data?
01st January 2001
- Date/Time/Timezone
- Date
- Date/Time
43. Which of the following menu items can be used to delete a specific step in Power Query transformations?
- Edit Settings
- Delete Until End
- Delete
44. As a Data Analyst at Adventure Works, you have taken over a database from one of your suppliers to integrate it into the Adventure Works relational database. However, you have noticed that the data in the source is not well organized, and some columns contain different types of data within a single column. What are the potential issues that may arise in this situation?
- Duplicate values
- Inconsistent data types
- Missing values
45. You can select ________ to create a new query or table from the appended output.
- Append Queries as New
- Append Queries
- Merge Queries
46. You can select ________ to create a new query or table from the merged tables to grow the output horizontally.
- Merge Queries as New
- Append Queries
- Merge Queries
47. ________ excludes non-matching records from the result set.
- Left Outer Join
- Full Outer Join
- Inner Join
48. Which of the following statements about “using short letter codes like state codes for join keys” can be considered as true? Select all that apply:
- Easy to remember and enter
- Costs more storage space when compared to whole numbers.
- Almost the same performance when compared to whole number joins.
49. Which one is the final stage that brings all the data into the reporting interface, allowing you to filter and visualize the data based on specific criteria?
- Load
- Extract
- Transform
50. Which of the following can be considered an advantage of using staging in Power BI?
- Managing data effectively
- Filtering data
- Using advanced transformations
51. As a Data Analyst at Adventure Works, you have been assigned to integrate an external resource into your existing relational database. Before the operation, you need to assess the valid, error, and empty rows in each column, which allows you to validate your row values for all tables. Which of the following options do you need to use?
- Column quality
- Column distribution
- Column profile
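To make the valid, error, and empty categories concrete, the sketch below classifies the cells of a hypothetical raw column that is expected to hold numbers. It is an analogy in plain Python, assuming that an "error" is a cell that cannot be converted to the expected type; the `classify` helper is hypothetical.

```python
from collections import Counter

# Hypothetical raw UnitPrice column as it might arrive from an Excel import.
raw_unit_prices = ["10.0", "7.5", "", "abc", None, "3.0"]


def classify(cell):
    """Label a cell as 'empty', 'error' (not convertible to a number), or 'valid'."""
    if cell is None or cell == "":
        return "empty"
    try:
        float(cell)
    except ValueError:
        return "error"
    return "valid"


quality = Counter(classify(cell) for cell in raw_unit_prices)
print(quality["valid"], quality["error"], quality["empty"])  # prints "3 1 2"
```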
52. Which of the following counts gives you the total number of values that only appear once?
- Distinct
- Count
- Unique
53. True or False: A reference query creates a new query that is a copy of the original query and contains all the applied steps of that query.
- True
- False
54. ________ is designed specifically for data integration and transformation tasks, providing a self-service environment for business users to create and manage ETL (Extract, Transform, Load) processes.
- Query reference
- Dataflows
- Query duplication
55. True or False: You can use dataflows in Microsoft Power BI Desktop and Microsoft Power BI Service.
- True
- False
56. ________ enable(s) efficient data retrieval and transformation by allowing for dynamic changes, helping you cater to evolving business needs without having to rewrite entire queries.
- Parameters
- Dynamic data retrieval
- Data transformation
57. You are working in Adventure Works as a data analyst and creating a Power BI project to visualize data. You need to set privacy, data load, and file storage options for all files, and these options may change during your design and development. What should you do?
- You can set global options for the files when you first configure the environment, and you cannot change them afterward.
- You can set global options for the files when you first configure the environment and you can update the settings later if you need to change anything.
- You can set the options for the current file and repeat it for the other files respectively.
58. As a Data Analyst at Adventure Works, you are given 3 Excel files to import. After importing the files, you need to set the data types correctly in Power Query. What should you do?
- Set each data type manually in Power Query.
- Use the automatically set data types for each column.
- Check the automatically set data types for each column and adjust them manually if you notice an inconsistent data type.
59. You must consider the volume and complexity of your data. Some ________ may perform better with large datasets or have optimizations for specific scenarios.
- Authentication mechanisms
- Connectors
- Data sources
60. In which of the following mode(s) does Power BI send a request to the data source and get the result back? Select all that apply:
- DirectQuery
- Dual
- Import
61. In Power Query, removing a step in the applied steps list may also remove all subsequent steps in the list, as they are dependent on the previous transformations.
- True
- False
62. You have an Excel file that consists of 3 columns: Month, 2022, and 2023. What do you need to do to turn the 2022 and 2023 columns into row values, creating an extra row for each month-year combination and its value?
- Pivot Columns
- Group By
- Unpivot Columns
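The wide-to-long reshaping described in this question can be sketched in plain Python as follows. The sample figures and the `unpivot` helper are hypothetical; the `Attribute` and `Value` output column names mirror the defaults that Power Query's Unpivot Columns produces.

```python
# Hypothetical wide table: one column per year.
wide = [
    {"Month": "Jan", "2022": 100, "2023": 120},
    {"Month": "Feb", "2022": 90, "2023": 110},
]


def unpivot(rows, id_column, attribute="Attribute", value="Value"):
    """Turn every column except `id_column` into (attribute, value) rows."""
    long_rows = []
    for row in rows:
        for column, cell in row.items():
            if column == id_column:
                continue
            long_rows.append({id_column: row[id_column], attribute: column, value: cell})
    return long_rows


long_table = unpivot(wide, "Month")
for row in long_table:
    print(row)
# prints:
# {'Month': 'Jan', 'Attribute': '2022', 'Value': 100}
# {'Month': 'Jan', 'Attribute': '2023', 'Value': 120}
# {'Month': 'Feb', 'Attribute': '2022', 'Value': 90}
# {'Month': 'Feb', 'Attribute': '2023', 'Value': 110}
```

Two wide rows with two year columns become four long rows, one per month-year combination.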
63. You import two Microsoft Excel tables named Product and ProductCategory into Power Query. The two tables share a common column named ProductCategoryID, and the ProductCategory table contains a ProductCategoryName column that holds the category names. You want to join the tables, but you notice that some products have NULL values in the ProductCategoryID column, and you want to show those products as well. What should you do in this case?
- Use Left Outer Join in the Join Kind dropdown
- Use Full Outer Join in the Join Kind dropdown
- Use Inner Join in the Join Kind dropdown
64. Full Outer Join operations can potentially return very large result-sets!
- True
- False
65. Which of the following statements can be considered as true?
- Numbers require more storage space than character strings.
- Numbers are more likely to be incorrect entries.
- Numeric joins are more efficient than joins of character strings.
66. ________ refers to an individual data point or a group of data points that deviates significantly from the remaining data set.
- Mode
- Outlier
- Standard deviation
67. Which of the following techniques lets you make updates in the master query, and those changes will be automatically applied to the other queries, instead of modifying transformations individually in each query?
- Dataflows
- Query duplicating
- Query referencing
68. True or False: You can use filters when connecting to a database to retrieve specific information rather than importing the entire dataset. This way, Power BI fetches only the data for the specified period, saving resources and time.
- True
- False
69. Which of the following features allows you to focus on a specific category of data in your dataset?
- Filters
- Data transformation
- Dynamic data retrieval
70. True or False: DirectQuery establishes a live connection to the data source, allowing real-time data analysis, while import options load the data into Power BI for offline analysis.
- True
- False