SQL for Data Science with R Coursera Answers Week 5

Practice Quiz

1. You are preparing to analyze some sales data and have a large amount of information in an Excel spreadsheet. You have decided to convert the Excel spreadsheet to a relational database. What is your first step?

  • Create the physical database objects.
  • Create a logical and physical database design.  
  • Get the data into the database.
  • Clean and split the data into load files.

2. What is a referential constraint?

  • This occurs when there is a primary key/foreign key relationship between two tables.  
  • This occurs when there is a many-to-many relationship between tables. 
  • This occurs when one table can’t find a matching entry in another table. 
  • This occurs when there is a one-to-many relationship between tables. 

3. Which RODBC function can you use to create a new table in a database from R?

  • sqlQuery()
  • sqlAddTable()
  • You cannot do this from R. You need to use a database management tool to do this.

4. Why is the SQL LOAD command recommended over the IMPORT command for large amounts of data?

  • The LOAD command is not recommended for large amounts of data. IMPORT is a much better choice.
  • The LOAD command bypasses the database transactional logging mechanism, making it fast and efficient. 
  • The LOAD command performs SQL INSERT statements for a group of rows and commits them periodically, saving transaction process time.  
  • The LOAD command uses transactional logging, making it fast and efficient. 

5. After you query a database, how do you load query results into a dataframe so you can perform data analysis? Select two answers.

  • Use the sqlFetch() function. 

  • Use the sqlCommit() function.
  • Use the sqlQuery() function.

  • Use the sqlLoad() function.


Graded Quiz: Creating Database Objects and Querying Data from R

6. What are two reasons to map an existing data source, like pre-existing database tables, database dump files, or raw data, to a relational database design? Select two answers.

  • Eliminate redundancy in the data.

  • Address issues with data normalization. 

  • There’s no reason to do this. It just adds a lot of extra work with little benefit. 
  • Limit the number of concurrent users to just you so it’s secure.


7. You have two tables in your database design: Customers, which lists all your customers, and Orders, which lists all the sales transactions that your customers have made over the years. The two tables each have a field called Customer_ID. Which of the following correctly describes the relationship between the two tables?

  • Both the Customer_ID field in the Customers table and the Customer_ID field in the Orders table are Foreign Keys.
  • The Customer_ID field in the Customers table is a Primary Key and the Customer_ID field in the Orders table is a Foreign Key.
  • The Customer_ID field in the Customers table is a Foreign Key and the Customer_ID field in the Orders table is a Primary Key.
  • Both the Customer_ID field in the Customers table and the Customer_ID field in the Orders table are Primary Keys.
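
A minimal sketch of the Customers/Orders relationship, for readers who want to see a primary key/foreign key pair enforced. The course itself works in R with RODBC; this uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in, and the column names other than Customer_ID (Name, Order_ID, Amount) and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database; schema and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled

# Parent table: Customer_ID uniquely identifies each customer (primary key).
conn.execute("CREATE TABLE Customers (Customer_ID INTEGER PRIMARY KEY, Name TEXT)")

# Child table: Customer_ID refers back to Customers (foreign key).
conn.execute(
    "CREATE TABLE Orders ("
    " Order_ID INTEGER PRIMARY KEY,"
    " Customer_ID INTEGER NOT NULL REFERENCES Customers(Customer_ID),"
    " Amount REAL)"
)

conn.execute("INSERT INTO Customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO Orders VALUES (100, 1, 25.0)")  # parent row exists

# An order for a customer that does not exist violates the referential constraint.
try:
    conn.execute("INSERT INTO Orders VALUES (101, 99, 10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
```

In this sketch one customer can appear in many orders, but every order must point at an existing customer, which is the one-to-many shape the question describes.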

8. Which SQL DDL command can be used to add primary keys to an existing table in a database?


9. What is the recommended SQL command for loading small to medium amounts of data into a database?

  • LOAD command
  • IMPORT command 
  • LOAD or IMPORT (there is no difference) 

10. What are two ways to limit database movement and increase performance when querying a database?

  • Use stored procedures when possible. 

  • Use SQL functions provided by the database vendor whenever possible. 

  • Use the sqlQuery() or sqlFetch() commands. 
  • Load all the data directly into a dataframe to reduce the number of times you must revisit the database.
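
To illustrate the idea of limiting data movement, here is a small sketch that compares pulling every row to the client with letting the database aggregate first. It is an illustration only: it uses Python's built-in sqlite3 module rather than the R/RODBC stack the course uses, and the Sales table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database with a hypothetical Sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Region TEXT, Amount REAL)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?)",
    [("East", 10.0), ("East", 30.0), ("West", 5.0)],
)

# Approach A: move every row to the client, then aggregate in application code.
rows = conn.execute("SELECT Region, Amount FROM Sales").fetchall()
client_totals = {}
for region, amount in rows:
    client_totals[region] = client_totals.get(region, 0.0) + amount

# Approach B: let the database aggregate with a built-in SQL function,
# so only one row per region crosses the connection.
db_totals = dict(
    conn.execute("SELECT Region, SUM(Amount) FROM Sales GROUP BY Region").fetchall()
)
```

Both approaches produce the same totals here, but the second transfers one row per region instead of one row per sale, a difference that grows with table size.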

