Sequence Models Coursera Week 4 Quiz Answers

Quiz - Transformers

1. A Transformer Network processes sentences from left to right, one word at a time.

  • True
  • False

2. Transformer Network methodology is taken from: (Check all that apply)

  • Convolutional Neural Network style of processing.
  • None of these.
  • Attention mechanism.
  • Convolutional Neural Network style of architecture.

3. What are the key inputs to computing the attention value for each word?

  • The key inputs to computing the attention value for each word are called the query, key, and value.
  • The key inputs to computing the attention value for each word are called the query, knowledge, and vector.
  • The key inputs to computing the attention value for each word are called the quotation, key, and vector.
  • The key inputs to computing the attention value for each word are called the quotation, knowledge, and value.

4. Which of the following correctly represents Attention?
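
The answer choices for this question are formula images. For reference, the scaled dot-product attention defined in the Transformer paper (Vaswani et al., 2017) and used in the course is:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
```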

5. Are the following statements true regarding Query (Q), Key (K) and Value (V)?

Q = interesting questions about the words in a sentence

K = qualities of words given a Q

V = specific representations of words given a Q

  • True
  • False

6. What does i represent in this multi-head attention computation?

  • The computed attention weight matrix associated with the order of the words in a sentence
  • The computed attention weight matrix associated with specific representations of words given a Q
  • The computed attention weight matrix associated with the ith “head” (sequence)
  • The computed attention weight matrix associated with the ith “word” in a sentence.
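
Question 6 is about the per-head attention computation. The following is a minimal NumPy sketch of multi-head attention, showing how each head i applies scaled dot-product attention with its own learned projections before the heads are concatenated; the variable names, shapes, and weight layout are illustrative assumptions, not the course's code:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V  # one attention weight matrix per head

def multi_head_attention(X, head_weights, W_o):
    # head_weights: one (W_q, W_k, W_v) tuple per head; head i computes
    # head_i = Attention(X W_q, X W_k, X W_v) with its own projections.
    heads = [scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
             for Wq, Wk, Wv in head_weights]
    # Concatenate the heads and apply the output projection W_o.
    return np.concatenate(heads, axis=-1) @ W_o

# Toy usage: 5 words, model dimension 16, 4 heads of size 4.
rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 5, 16, 4
d_head = d_model // n_heads
X = rng.normal(size=(seq_len, d_model))
head_weights = [tuple(rng.normal(size=(d_model, d_head)) for _ in range(3))
                for _ in range(n_heads)]
W_o = rng.normal(size=(n_heads * d_head, d_model))
print(multi_head_attention(X, head_weights, W_o).shape)  # (5, 16)
```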

7. The following question refers to the architecture within a Transformer Network (positional encoding and output layer(s) not displayed).

What is generated from the output of the Decoder’s first block of Multi-Head Attention?

  • K
  • Q
  • V

8. The following question refers to the architecture within a Transformer Network (positional encoding and output layer(s) not displayed).

The output of the decoder block contains a softmax layer followed by a linear layer to predict the next word one word at a time.

  • False
  • True

9. Which of the following statements are true about positional encoding? Select all that apply.

  • Positional encoding provides extra information to our model.
  • Positional encoding is used in the transformer network and the attention model.
  • Positional encoding uses a combination of sine and cosine equations.
  • Positional encoding is important because position and word order are essential in sentence construction of any language.
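
The sine and cosine equations mentioned in the options can be written as a short NumPy sketch. This is a minimal illustration assuming an even d_model; the function name and shapes are my own, not from the course:

```python
import numpy as np

def positional_encoding(max_len, d_model):
    # PE(pos, 2i)   = sin(pos / 10000**(2i / d_model))
    # PE(pos, 2i+1) = cos(pos / 10000**(2i / d_model))
    pos = np.arange(max_len)[:, None]      # word positions, shape (max_len, 1)
    i = np.arange(d_model // 2)[None, :]   # dimension pairs, shape (1, d_model/2)
    angles = pos / np.power(10000, 2 * i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions use cosine
    return pe

print(positional_encoding(50, 16).shape)  # (50, 16)
```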

10. Which of these is a good criterion for a good positional encoding algorithm?

  • Distance between any two time-steps should be inconsistent for all sentence lengths.
  • It should output a common encoding for each time-step (word’s position in a sentence).
  • The algorithm should be able to generalize to longer sentences.
  • It must be nondeterministic.
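
To see why determinism and generalization come up here: the sinusoidal encoding sketched under question 9 depends only on a word's position, so encodings computed for a short maximum length agree with the prefix of those computed for a longer one. A quick check, reusing the positional_encoding sketch above:

```python
import numpy as np  # positional_encoding as sketched under question 9

short = positional_encoding(50, 16)
longer = positional_encoding(100, 16)
# Same position, same encoding, regardless of maximum sentence length.
assert np.allclose(short, longer[:50])
```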
