Module 4: Data processing and analytics

Looking for ‘Building Data Lakes on AWS Module 4 Answers’?

In this post, I provide complete, accurate, and detailed explanations of the answers for Module 4: Data processing and analytics of Course 3: Building Data Lakes on AWS.

Whether you’re preparing for quizzes or brushing up on your knowledge, these insights will help you master the concepts effectively. Let’s dive into the correct answers and detailed explanations for each question!

Knowledge Check

Graded Assignment

1. During the data preparation stage, a data engineer enriches the raw data to support additional insights. They need to improve query performance and reduce costs of the final analytics solution.

Which data formats meet these requirements? (Select TWO.)

  • CSV
  • JSON
  • Apache Parquet ✅
  • Apache ORC ✅
  • XML

Explanation:
Both Apache Parquet and Apache ORC are columnar storage formats that:

  • Enable faster query performance
  • Reduce storage and data-scanning costs

They are highly optimized for analytics workloads compared to row-based formats such as CSV or JSON.
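To see why columnar layouts cut scanning costs, here is a minimal, stdlib-only Python sketch (the dataset and field names are invented for illustration). A row-oriented format like CSV forces a query to read every field of every row, while a column-oriented layout lets the engine read only the one column it needs (column pruning), which is the core idea behind Parquet and ORC:

```python
import csv
import io

# Tiny illustrative dataset (hypothetical orders).
rows = [
    {"order_id": 1, "region": "us-east-1", "amount": 120.0},
    {"order_id": 2, "region": "eu-west-1", "amount": 75.5},
    {"order_id": 3, "region": "us-east-1", "amount": 200.0},
]

# Row-oriented scan (CSV-style): averaging one column still touches
# every field of every row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["order_id", "region", "amount"])
writer.writeheader()
writer.writerows(rows)
buf.seek(0)
fields_read = sum(len(r) for r in csv.DictReader(buf))  # 3 rows x 3 fields = 9

# Column-oriented layout: each column is stored contiguously, so a query
# that needs only "amount" reads just that column (column pruning).
columns = {name: [r[name] for r in rows] for name in rows[0]}
values_read = len(columns["amount"])  # 3 values instead of 9 fields

avg_amount = sum(columns["amount"]) / len(columns["amount"])
print(fields_read, values_read, round(avg_amount, 2))  # 9 3 131.83
```

Real Parquet and ORC readers add further savings on top of this (compression per column, predicate pushdown via column statistics), which is why services like Amazon Athena charge less for queries over these formats.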

2. A small start-up company is developing a data analytics solution. They need to cleanse and normalize large datasets, but do not have developers with the skill set to write custom scripts.

Which tool will help them efficiently design and run the data preparation activities?

  • AWS Glue Data Catalog
  • AWS Glue DataBrew ✅
  • Amazon Athena
  • AWS Glue ETL

Explanation:
AWS Glue DataBrew is a visual data preparation tool that allows users to clean and transform data without writing code, making it a good fit for teams without programming expertise.
