Module 2: Data ingestion, cataloging, and preparation
Looking for ‘Building Data Lakes on AWS Module 2 Answers’?
In this post, I provide complete, accurate, and detailed explanations for the answers to Module 2: Data ingestion, cataloging, and preparation of Course 3: Building Data Lakes on AWS.
Whether you’re preparing for quizzes or brushing up on your knowledge, these insights will help you master the concepts effectively. Let’s dive into the correct answers and detailed explanations for each question!
Knowledge Check
Graded Assignment
1. Which services can be used for data ingestion into your data lake? (Select TWO.)
- Amazon Kinesis Data Firehose ✅
- Amazon QuickSight
- Amazon Athena
- AWS Storage Gateway ✅
- Amazon Redshift
Explanation:
- Kinesis Data Firehose is a fully managed service that delivers streaming data to destinations such as Amazon S3 and Amazon Redshift.
- AWS Storage Gateway connects on-premises environments to AWS storage, so on-premises data can be ingested into services such as Amazon S3.
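To make the Firehose half of this answer concrete, here is a minimal sketch of sending one event to a delivery stream. It assumes you pass in a client that behaves like `boto3.client("firehose")` and that a delivery stream already exists; the helper names and the stream name `clickstream-to-s3` are hypothetical, chosen only for illustration.

```python
import json


def build_firehose_record(event: dict) -> dict:
    """Serialize one event as newline-delimited JSON for Firehose."""
    # The trailing newline keeps records separable after Firehose
    # concatenates many of them into a single S3 object.
    return {"Data": (json.dumps(event) + "\n").encode("utf-8")}


def send_event(firehose_client, stream_name: str, event: dict) -> dict:
    """Send one event through a Firehose-style client.

    firehose_client is assumed to expose put_record the way
    boto3.client("firehose") does.
    """
    return firehose_client.put_record(
        DeliveryStreamName=stream_name,
        Record=build_firehose_record(event),
    )
```

With a real client, the call would look like `send_event(boto3.client("firehose"), "clickstream-to-s3", {"user": "a"})`. Taking the client as a parameter keeps the sketch easy to try out with a stub before pointing it at an AWS account.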
2. Which service uses continuous data replication with high availability to consolidate databases into a petabyte-scale data warehouse by streaming data to Amazon Redshift and Amazon S3?
- AWS Storage Gateway
- AWS Schema Conversion Tool (AWS SCT)
- AWS Database Migration Service (AWS DMS) ✅
- Amazon Kinesis Data Firehose
Explanation:
AWS DMS continuously replicates changes from source databases to targets like Amazon Redshift and Amazon S3 with high availability.
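As a rough illustration of what "continuously replicates" means in practice, the sketch below creates a DMS task whose migration type is `full-load-and-cdc`, which performs an initial load and then keeps applying ongoing changes. It assumes a client that behaves like `boto3.client("dms")` and that the endpoints and replication instance already exist; the helper names, the rule name, and every ARN you would pass in are placeholders, not values from the course.

```python
import json


def selection_rules(schema: str, table: str = "%") -> str:
    """Build a DMS table-mapping document that includes matching tables."""
    rules = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-tables",  # hypothetical rule name
                "object-locator": {"schema-name": schema, "table-name": table},
                "rule-action": "include",
            }
        ]
    }
    # DMS expects the table mappings as a JSON string, not a dict.
    return json.dumps(rules)


def create_cdc_task(dms_client, task_id, source_arn, target_arn, instance_arn, schema):
    """Create a task that runs a full load, then replicates ongoing changes."""
    return dms_client.create_replication_task(
        ReplicationTaskIdentifier=task_id,
        SourceEndpointArn=source_arn,
        TargetEndpointArn=target_arn,
        ReplicationInstanceArn=instance_arn,
        MigrationType="full-load-and-cdc",
        TableMappings=selection_rules(schema),
    )
```

The target endpoint decides where the changes land, so the same task shape applies whether the target is Amazon Redshift or Amazon S3.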
3. Which AWS Glue feature catalogs your data?
- AWS Glue Crawler ✅
- AWS Glue DataBrew
- AWS Glue Studio
- AWS Glue Streaming extract, transform, and load (ETL)
Explanation:
AWS Glue Crawlers automatically scan data and populate the Data Catalog with table definitions.
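For a feel of how a crawler is set up, here is a small sketch that defines a crawler over an S3 path and starts it. It assumes a client that behaves like `boto3.client("glue")`, an existing IAM role that Glue can assume, and a target Data Catalog database; the function names and any values you pass in are hypothetical.

```python
def crawler_definition(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    """Describe a crawler that scans one S3 path into a catalog database."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(glue_client, definition: dict) -> str:
    """Register the crawler, start a run, and return the crawler name."""
    glue_client.create_crawler(**definition)
    glue_client.start_crawler(Name=definition["Name"])
    return definition["Name"]
```

Once a run finishes, the tables the crawler inferred appear in the named database of the Data Catalog, where services such as Athena can query them.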
4. Your data resides in multiple data stores, including Amazon S3, Amazon RDS, and Amazon DynamoDB. You need to efficiently query the combined datasets.
Which tool can achieve this by using a single query without moving data?
- Amazon Athena Federated Query ✅
- Amazon Redshift Query Editor
- Structured Query Language (SQL) Workbench
- AWS Glue DataBrew
Explanation:
Athena Federated Query allows querying data across multiple data sources (including S3, RDS, and DynamoDB) without the need to move or copy the data.
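The "single query" idea can be sketched as one SQL statement that joins tables from two different catalogs. The example below builds such a statement and submits it through a client that is assumed to behave like `boto3.client("athena")`. It also assumes a data source connector for the non-S3 store has already been registered in Athena as its own catalog; every catalog, database, table, and column name you pass in is hypothetical.

```python
def qualified(catalog: str, database: str, table: str) -> str:
    """Return a fully qualified, quoted table reference."""
    return f'"{catalog}"."{database}"."{table}"'


def federated_join_sql(select_list: str, left: str, right: str, key: str) -> str:
    """Join two tables, possibly from different catalogs, on a shared key.

    The left table is aliased as l and the right table as r, so the
    select list should use those aliases.
    """
    return (
        f"SELECT {select_list} "
        f"FROM {left} AS l "
        f"JOIN {right} AS r ON l.{key} = r.{key}"
    )


def run_query(athena_client, sql: str, output_location: str) -> str:
    """Submit the query and return its execution ID."""
    response = athena_client.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

A call might combine `qualified("AwsDataCatalog", "sales", "orders")` for the S3-backed table with a second reference that points at the connector's catalog. The data stays in its source stores; only the query results are written to the S3 output location.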