Module 6: Modern data architecture on AWS

Looking for ‘Building Data Lakes on AWS Module 6 Answers’?

In this post, I provide complete, accurate, and detailed explanations for the answers to Module 6: Modern Data Architecture on AWS from Course 3: Building Data Lakes on AWS.

Whether you’re preparing for quizzes or brushing up on your knowledge, these insights will help you master the concepts effectively. Let’s dive into the correct answers and detailed explanations for each question!

Post-Assessment

Graded Assignment

1. If you are querying data with Amazon Athena, you can use AWS Lake Formation to simplify how you secure and connect to your data from Amazon QuickSight. After Lake Formation is configured, you can use Amazon QuickSight to access databases and tables.

Which method can you use to access databases or tables through SQL queries? (Select TWO.)

  • Use the AWS Glue service used for building the data lake.
  • Use the Amazon Athena Console, the AWS CLI, or your favorite query editor. ✅
  • Use the full-featured editor provided by Amazon QuickSight where you can write SQL queries. ✅
  • Use the reports generated by Amazon Athena and parse them in a relational DB for running SQL queries.
  • Use the AWS Lambda function to execute SQL queries on the data in the Lake Formation data lake.

Explanation:
Amazon Athena lets you query data in Amazon S3 using standard SQL. You can run queries from the Athena console, the AWS CLI, or your preferred query editor. Amazon QuickSight also provides a full-featured editor where you can write SQL queries directly.
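As a quick illustration (not part of the course material), a query like the ones above can also be started programmatically with boto3's Athena client. The database and results-bucket names below are made up for the sketch:

```python
def build_athena_query_request(sql, database, output_s3):
    """Build the parameter dict for Athena's StartQueryExecution API call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3},
    }

def start_query(sql, database, output_s3):
    # Requires AWS credentials; not executed in this sketch.
    import boto3
    athena = boto3.client("athena")
    return athena.start_query_execution(
        **build_athena_query_request(sql, database, output_s3)
    )

request = build_athena_query_request(
    "SELECT product, SUM(amount) AS total FROM purchases GROUP BY product",
    database="sales_db",                      # hypothetical Glue database
    output_s3="s3://my-athena-results/out/",  # hypothetical results bucket
)
```

The same SQL runs unchanged in the Athena console or in QuickSight's SQL editor, which is the point of the question: several front ends, one query engine.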

2. Which AWS Glue feature can determine the schema of your data?

  • Glue Job
  • Classifier ✅
  • Development Endpoint
  • Crawler

Explanation:
A classifier reads the data in a data store and, if it recognizes the format, generates a schema. Crawlers invoke classifiers (built-in or custom) to determine the schema of your data.
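For a format the built-in classifiers don't recognize, you can register a custom classifier through Glue's CreateClassifier API. This sketch builds the request for a Grok classifier; the name, classification, and pattern are hypothetical examples:

```python
def build_grok_classifier(name, classification, grok_pattern):
    """Parameter dict for Glue's CreateClassifier API (Grok variant)."""
    return {
        "GrokClassifier": {
            "Name": name,
            "Classification": classification,
            "GrokPattern": grok_pattern,
        }
    }

# boto3.client("glue").create_classifier(**params) would register it
# (requires AWS credentials); a crawler can then reference it by name.
params = build_grok_classifier(
    name="apache-log-classifier",     # hypothetical classifier name
    classification="apache_logs",     # label written into the Data Catalog
    grok_pattern="%{COMMONAPACHELOG}",
)
```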

3. You need to author an ETL job in AWS Glue. Your datasets are in Amazon S3. Your goal is to filter, join, and aggregate two different datasets. Which tools can you use to create and run AWS Glue job scripts to achieve this goal? (Select TWO.)

  • AWS Glue Script Editor ✅
  • AWS Glue Crawler
  • AWS Glue Triggers
  • AWS Glue Studio ✅
  • AWS Glue Data Catalog

Explanation:
The AWS Glue Script Editor and AWS Glue Studio are both tools for authoring, editing, and running ETL job scripts, so either can be used to filter, join, and aggregate the two datasets. Crawlers, triggers, and the Data Catalog support jobs but are not authoring tools.
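As a rough local stand-in (plain Python rather than Glue's PySpark DynamicFrame API, with made-up records), the filter, join, and aggregate steps such a job script performs look like this:

```python
# Hypothetical input datasets, standing in for two tables crawled from S3.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 30.0, "status": "shipped"},
    {"order_id": 2, "customer_id": "c2", "amount": 15.0, "status": "cancelled"},
    {"order_id": 3, "customer_id": "c1", "amount": 55.0, "status": "shipped"},
]
customers = [
    {"customer_id": "c1", "region": "us-east-1"},
    {"customer_id": "c2", "region": "eu-west-1"},
]

# Filter: keep shipped orders (Filter.apply on a DynamicFrame in Glue).
shipped = [o for o in orders if o["status"] == "shipped"]

# Join: attach each customer's region (Join.apply in Glue).
regions = {c["customer_id"]: c["region"] for c in customers}
joined = [{**o, "region": regions[o["customer_id"]]} for o in shipped]

# Aggregate: total amount per region (a Spark groupBy/sum after toDF()).
totals = {}
for row in joined:
    totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]

print(totals)  # {'us-east-1': 85.0}
```

In a real Glue job you would express the same three steps against DynamicFrames in the Script Editor, or wire them up visually in AWS Glue Studio.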

4. Which combination of AWS services form the storage layer for a data lake in AWS?

  • Amazon S3, AWS Glue, Amazon DynamoDB
  • Amazon S3, Amazon RDS, Amazon Redshift ✅
  • Amazon RDS, Amazon DynamoDB, Amazon EMR
  • Amazon EC2, Amazon RDS, Amazon DynamoDB

Explanation:
A data lake's storage layer is built on Amazon S3 for raw data storage, with Amazon RDS (for relational data) and Amazon Redshift (for analytics) serving structured data alongside it.

5. Shirley is a data administrator who needs to set up a data lake for their company, AnyCompany. Currently, all data is stored in Amazon S3. John is a marketing manager and needs write access to customer purchasing information. Shirley registers the Amazon S3 path containing customer purchasing information with AWS Lake Formation and grants John access to the Amazon S3 path. John tries to create a database and gets an error. What could be the possible issue here?

  • A bucket policy is preventing John from creating a database.
  • John is missing data lake administrator permissions.
  • Shirley also needs to grant John permission to create databases. ✅
  • Shirley doesn’t have sufficient permission for accessing Amazon S3.

Explanation:
While John has access to the S3 path, Shirley needs to explicitly grant John the permission to create databases in AWS Lake Formation.
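To sketch what Shirley's fix looks like programmatically, the CREATE_DATABASE permission is granted at the catalog level through Lake Formation's GrantPermissions API. The user ARN below is hypothetical:

```python
def build_create_database_grant(principal_arn):
    """Parameter dict for Lake Formation's GrantPermissions API,
    granting database-creation rights on the data catalog."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Catalog": {}},  # CREATE_DATABASE applies to the catalog itself
        "Permissions": ["CREATE_DATABASE"],
    }

# boto3.client("lakeformation").grant_permissions(**grant) would apply it
# (requires AWS credentials and data lake administrator rights).
grant = build_create_database_grant("arn:aws:iam::111122223333:user/John")  # hypothetical ARN
```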

6. AWS Lake Formation provides machine learning capabilities to create custom transforms to cleanse your data. There is currently one available transform named FindMatches. You can create these transforms when you create a job. After the transform, where is the data stored?

  • A user defined data source
  • User selected location in S3
  • Amazon SageMaker
  • AWS Glue ✅

Explanation:
FindMatches is an AWS Glue ML transform: it is created and run through AWS Glue, and the transform and its data are stored in AWS Glue.
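For illustration, a FindMatches transform is created through Glue's CreateMLTransform API. This sketch assembles a hypothetical request; the role ARN, database, table, and key-column names are all assumptions:

```python
def build_find_matches_transform(name, role_arn, database, table, primary_key):
    """Parameter dict for Glue's CreateMLTransform API (FindMatches type)."""
    return {
        "Name": name,
        "Role": role_arn,
        "InputRecordTables": [{"DatabaseName": database, "TableName": table}],
        "Parameters": {
            "TransformType": "FIND_MATCHES",
            "FindMatchesParameters": {"PrimaryKeyColumnName": primary_key},
        },
    }

# boto3.client("glue").create_ml_transform(**params) would create it
# (requires AWS credentials).
params = build_find_matches_transform(
    name="dedupe-customers",                                    # hypothetical
    role_arn="arn:aws:iam::111122223333:role/GlueServiceRole",  # hypothetical
    database="crm_db",
    table="customers",
    primary_key="customer_id",
)
```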

7. While creating a blueprint in AWS Lake Formation, John received the following error: "User: is not authorized to perform: iam:PassRole on resource:"

How can this be fixed? (Select TWO.)

  • John’s account is disabled. Enable the user account.
  • Grant John Read Only permission on the S3 data source
  • Ask John to choose a different role that he already has permission to pass ✅
  • Update John’s IAM policy to allow iam:PassRole on the required role ✅
  • Ask John to choose a different AWS Region

Explanation:
John’s IAM policy needs the iam:PassRole permission on the workflow role in order to pass it during blueprint creation. Either he selects a role he is already allowed to pass, or his IAM policy is updated to include iam:PassRole for that role.
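A minimal sketch of the policy statement Shirley could attach to John's IAM user is shown below; the role ARN is a hypothetical placeholder for the Lake Formation workflow role:

```python
import json

# Minimal IAM policy (hypothetical role ARN) allowing the user to pass
# the Lake Formation workflow role to the service.
pass_role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::111122223333:role/LakeFormationWorkflowRole",
        }
    ],
}
print(json.dumps(pass_role_policy, indent=2))
```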

8. A customer is planning to schedule and implement a complex multi-job extract, transform, and load (ETL) activity. They want to track the activity as a single entity. How should the customer proceed?

  • Use an Administrator IAM User and use that account for scheduling the ETL Job
  • Use a database in the data lake’s Data Catalog and store the ETL Job information in it
  • Use a workflow that defines the data source and schedule to import data into your data lake ✅
  • Use manually created AWS Glue crawlers and run them on demand or on a schedule

Explanation:
A workflow allows for scheduling, organizing, and tracking multi-job ETL tasks in AWS Glue as a single entity.
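As a hedged sketch of how such a workflow might be wired up with boto3 (the workflow, trigger, and job names plus the cron expression are all made up):

```python
def build_workflow_request(name):
    """Parameter dict for Glue's CreateWorkflow API."""
    return {"Name": name, "Description": "Multi-job ETL tracked as one entity"}

def build_scheduled_trigger(name, workflow, job, cron):
    """Parameter dict for Glue's CreateTrigger API, attaching a scheduled
    trigger to the workflow so its jobs run and are tracked together."""
    return {
        "Name": name,
        "WorkflowName": workflow,
        "Type": "SCHEDULED",
        "Schedule": cron,
        "Actions": [{"JobName": job}],
    }

# boto3.client("glue").create_workflow(**wf) and create_trigger(**trig)
# would apply these (requires AWS credentials).
wf = build_workflow_request("nightly-sales-etl")  # hypothetical workflow name
trig = build_scheduled_trigger(
    "nightly-start",
    "nightly-sales-etl",
    job="load-raw-sales",          # hypothetical first job in the chain
    cron="cron(0 2 * * ? *)",      # 02:00 UTC daily
)
```

Further triggers of type CONDITIONAL can chain the remaining jobs so the whole run shows up as one workflow entity in the Glue console.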

9. In which scenario would a data scientist use AWS Glue jobs? (Select TWO.)

  • Analyzing data in real time as data comes into the data lake
  • Transforming data in near real time as data comes into the data lake ✅
  • Analyzing data in batches on schedule or on demand
  • Transforming data in batches on schedule or on demand ✅
  • Developing machine learning models

Explanation:
A data scientist would use AWS Glue jobs to transform data, either in near real time as it streams into the data lake or in batches on a schedule or on demand. Analysis is handled by query services such as Amazon Athena, not by Glue jobs.

10. What is the AWS Glue Data Catalog?

  • A fully managed extract, transform, and load (ETL) pipeline service
  • A service to schedule jobs
  • A visual data preparation tool
  • An index to the location, schema, and runtime metrics of your data ✅

Explanation:
The AWS Glue Data Catalog acts as a central repository that indexes metadata, location, and schema for your data.
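To illustrate the "index" idea, a table's schema and location can be read back from the catalog through Glue's GetTable API. The database and table names here are hypothetical:

```python
def build_get_table_request(database, table):
    """Parameter dict for Glue's GetTable API, which returns a table's
    schema, location, and other metadata from the Data Catalog."""
    return {"DatabaseName": database, "Name": table}

# With AWS credentials, the metadata comes back from the catalog, e.g.:
#   resp = boto3.client("glue").get_table(**req)
#   resp["Table"]["StorageDescriptor"]["Columns"]   -> the table's schema
#   resp["Table"]["StorageDescriptor"]["Location"]  -> the underlying S3 path
req = build_get_table_request("sales_db", "purchases")  # hypothetical names
```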
