Module 2: Data ingestion, cataloging, and preparation

Looking for ‘Building Data Lakes on AWS Module 2 Answers’?

In this post, I provide complete, accurate, and detailed explanations for the answers to Module 2: Data ingestion, cataloging, and preparation of Course 3: Building Data Lakes on AWS.

Whether you’re preparing for quizzes or brushing up on your knowledge, these insights will help you master the concepts effectively. Let’s dive into the correct answers and detailed explanations for each question!

Knowledge Check

Graded Assignment

1. Which services can be used for data ingestion into your data lake? (Select TWO.)

Amazon Kinesis Data Firehose ✅
Amazon QuickSight
Amazon Athena
AWS Storage Gateway ✅
Amazon Redshift

Explanation:

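Amazon Kinesis Data Firehose is a fully managed service that delivers streaming data to destinations such as Amazon S3 and Amazon Redshift.
AWS Storage Gateway connects on-premises applications to AWS cloud storage such as Amazon S3, so on-premises data can be ingested into the data lake.
The other options work with data after it has landed: Amazon QuickSight visualizes it, Amazon Athena queries it, and Amazon Redshift is a data warehouse.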
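
To make the Firehose ingestion path concrete, here is a minimal Python sketch of sending one event through the PutRecord API. It assumes a boto3-style Firehose client is passed in. The stream name and the event fields in the usage note are hypothetical, and the helper names are mine, not part of the course material.

```python
import json


def build_firehose_record(event: dict) -> dict:
    """Serialize one event as a newline-delimited JSON record."""
    # The trailing newline keeps records separable after Firehose
    # concatenates them into a single object at the destination.
    return {"Data": (json.dumps(event) + "\n").encode("utf-8")}


def send_event(firehose_client, stream_name: str, event: dict) -> dict:
    """Send one event using the client's put_record call."""
    return firehose_client.put_record(
        DeliveryStreamName=stream_name,
        Record=build_firehose_record(event),
    )
```

With boto3 installed and credentials configured, a call could look like `send_event(boto3.client("firehose"), "clickstream-to-s3", {"user": "a", "page": "/home"})`, where `clickstream-to-s3` stands in for a delivery stream you have already created.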

2. Which service uses continuous data replication with high availability to consolidate databases into a petabyte-scale data warehouse by streaming data to Amazon Redshift and Amazon S3?

AWS Storage Gateway
AWS Schema Conversion Tool (AWS SCT)
AWS Database Migration Service (AWS DMS) ✅
Amazon Kinesis Data Firehose

Explanation:
AWS DMS continuously replicates changes from source databases to targets like Amazon Redshift and Amazon S3 with high availability.
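
As a rough illustration of continuous replication, the sketch below builds the arguments for a DMS replication task that performs a full load and then keeps applying changes (change data capture). It assumes the endpoints and replication instance already exist. All ARNs and the task name are placeholders, and the helper function is hypothetical.

```python
import json


def build_cdc_task_params(task_id, source_arn, target_arn, instance_arn):
    """Build keyword arguments for the DMS CreateReplicationTask API."""
    # Selection rule that includes every table in every schema.
    table_mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-all-tables",
                "object-locator": {"schema-name": "%", "table-name": "%"},
                "rule-action": "include",
            }
        ]
    }
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        # Full load first, then ongoing replication of changes.
        "MigrationType": "full-load-and-cdc",
        "TableMappings": json.dumps(table_mappings),
    }
```

With boto3, the result could be passed as `boto3.client("dms").create_replication_task(**params)`.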

3. Which AWS Glue feature catalogs your data?

AWS Glue Crawler ✅
AWS Glue DataBrew
AWS Glue Studio
AWS Glue Streaming extract, transform, and load (ETL)

Explanation:
AWS Glue Crawlers automatically scan data and populate the Data Catalog with table definitions.
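
For a sense of how a crawler is set up in practice, here is a small Python sketch that creates a crawler over one S3 prefix and starts it. It assumes a boto3-style Glue client is passed in. The crawler name, role ARN, database, and S3 path are placeholders, and the helper function is hypothetical.

```python
def crawl_s3_prefix(glue_client, crawler_name, role_arn, database, s3_path):
    """Create a crawler for one S3 prefix, then start it."""
    glue_client.create_crawler(
        Name=crawler_name,
        Role=role_arn,
        # Tables the crawler discovers are written to this catalog database.
        DatabaseName=database,
        Targets={"S3Targets": [{"Path": s3_path}]},
    )
    glue_client.start_crawler(Name=crawler_name)
```

With boto3, the first argument would be `boto3.client("glue")`. Once the crawler finishes, the inferred table definitions appear in the Data Catalog database named in `database`.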

4. Your data resides in multiple data stores, including Amazon S3, Amazon RDS, and Amazon DynamoDB. You need to efficiently query the combined datasets.

Which tool can achieve this by using a single query without moving data?

Amazon Athena Federated Query ✅
Amazon Redshift Query Editor
Structured Query Language (SQL) Workbench
AWS Glue DataBrew

Explanation:
Athena Federated Query allows querying data across multiple data sources (including S3, RDS, and DynamoDB) without the need to move or copy the data.
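
The "single query" idea can be sketched as one SQL statement that joins a table in the Glue Data Catalog (backed by S3) with a table exposed through a DynamoDB connector. This is only an illustration: the catalog `ddb_catalog`, the database `lake_db`, and the table and column names are all hypothetical, and it assumes the connector has already been registered as an Athena data source.

```python
# One statement spanning two catalogs; every name here is a placeholder.
FEDERATED_SQL = """
SELECT o.order_id, o.total, c.customer_name
FROM "awsdatacatalog"."lake_db"."orders" AS o
JOIN "ddb_catalog"."default"."customers" AS c
  ON o.customer_id = c.customer_id
""".strip()


def start_federated_query(athena_client, sql, output_location):
    """Submit the query and return its execution ID."""
    response = athena_client.start_query_execution(
        QueryString=sql,
        # Athena writes query results to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

With boto3, this could be called as `start_federated_query(boto3.client("athena"), FEDERATED_SQL, "s3://my-results-bucket/athena/")`, where the bucket is a placeholder.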

