Neural Networks and Deep Learning: Coursera Week 3 Quiz Answers

Quiz - Shallow Neural Networks

1. Which of the following are true? (Check all that apply.)

  • a^[2](12) denotes the activation vector of the 12th layer on the 2nd training example.
  • X is a matrix in which each row is one training example.
  • a^[2](12) denotes the activation vector of the 2nd layer for the 12th training example.
  • A is a matrix in which each column is one training example.
  • a^[2] denotes the activation vector of the 2nd layer.
  • a_4^[2] is the activation output by the 4th neuron of the 2nd layer.
  • a_4^[2] is the activation output of the 2nd layer for the 4th training example.
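
These options all refer to the column-stacking convention used in the course: an activation matrix holds one column per training example. The sketch below is only an illustration of that indexing, with made-up sizes (4 units in layer 2, 12 examples):

#+begin_src python
import numpy as np

# Hypothetical sizes: 4 units in layer 2, m = 12 training examples.
n2, m = 4, 12
A2 = np.random.randn(n2, m)    # A^[2]: each column holds one example's activations

a2_example12 = A2[:, 11]       # a^[2](12): layer-2 activation vector for the 12th example
a4_layer2 = A2[3, :]           # a_4^[2]: output of the 4th neuron of layer 2, across all examples
print(a2_example12.shape, a4_layer2.shape)   # (4,) (12,)
#+end_src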

2. The sigmoid function is only mentioned as an activation function for historical reasons. The tanh is always preferred without exceptions in all the layers of a Neural Network. True/False?

  • True
  • False

3. Which of these is a correct vectorized implementation of forward propagation for layer l, where 1 ≤ l ≤ L?
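
For reference, a minimal sketch of one vectorized forward-propagation step using the course's usual shapes (W^[l] is (n^[l], n^[l-1]), and A^[l-1] stacks examples as columns); the layer sizes below are made up:

#+begin_src python
import numpy as np

def layer_forward(A_prev, W, b, g):
    """One vectorized step: Z^[l] = W^[l] A^[l-1] + b^[l], then A^[l] = g(Z^[l])."""
    Z = W @ A_prev + b          # b broadcasts across the m example columns
    return g(Z)

# Made-up sizes: n^[l-1] = 3 units feeding n^[l] = 4 units, m = 5 examples.
A_prev = np.random.randn(3, 5)
W = np.random.randn(4, 3)
b = np.zeros((4, 1))
A = layer_forward(A_prev, W, b, np.tanh)
print(A.shape)                  # (4, 5)
#+end_src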

4. The use of the ReLU activation function is becoming more rare because the ReLU function has no derivative for x = 0. True/False?

  • True
  • False
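
As a quick reminder of what the statement is about, here is ReLU and the derivative convention commonly used in practice (the single point x = 0 is handled by simply picking 0 or 1); the values are arbitrary:

#+begin_src python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def relu_grad(z):
    # Derivative is 0 for z < 0 and 1 for z > 0; at z == 0 a convention (here 0) is used.
    return (z > 0).astype(float)

z = np.array([-2.0, 0.0, 3.0])
print(relu(z), relu_grad(z))    # [0. 0. 3.] [0. 0. 1.]
#+end_src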

5. Consider the following code:

#+begin_src python
import numpy as np

x = np.random.rand(3, 2)
y = np.sum(x, axis=0, keepdims=True)
#+end_src

What will y.shape be?

  • (3, 1)
  • (1, 2)
  • (3,)
  • (2,)
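
Running the completed snippet shows the effect of axis=0 with keepdims=True: the summed axis is kept with length 1 rather than dropped.

#+begin_src python
import numpy as np

x = np.random.rand(3, 2)
print(np.sum(x, axis=0, keepdims=True).shape)   # (1, 2)
print(np.sum(x, axis=0).shape)                  # (2,) without keepdims
#+end_src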

6. Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements is true?

  • Each neuron in the first hidden layer will perform the same computation in the first iteration. But after one iteration of gradient descent they will learn to compute different things because we have “broken symmetry”.
  • Each neuron in the first hidden layer will perform the same computation. So even after multiple iterations of gradient descent, each neuron in the layer will be computing the same thing as other neurons.
  • Each neuron in the first hidden layer will compute the same thing, but neurons in different layers will compute different things, thus we have accomplished “symmetry breaking” as described in the lecture.
  • The first hidden layer’s neurons will perform different computations from each other even in the first iteration: their parameters will thus keep evolving in their own way.
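
A small illustrative sketch of the symmetry problem (layer sizes and data here are made up): with all-zero initialization, every hidden unit computes the same thing and receives the same gradient, so repeated gradient-descent updates never make them differ.

#+begin_src python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

np.random.seed(0)
X = np.random.randn(3, 5)                 # 3 features, 5 examples (made up)
Y = (np.random.rand(1, 5) > 0.5).astype(float)

# Zero initialization for a 3 -> 4 -> 1 network
W1, b1 = np.zeros((4, 3)), np.zeros((4, 1))
W2, b2 = np.zeros((1, 4)), np.zeros((1, 1))

m = X.shape[1]
for _ in range(100):                      # plain gradient descent
    Z1 = W1 @ X + b1; A1 = np.tanh(Z1)
    Z2 = W2 @ A1 + b2; A2 = sigmoid(Z2)
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m; db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
    dW1 = dZ1 @ X.T / m; db1 = dZ1.sum(axis=1, keepdims=True) / m
    W1 -= 0.1 * dW1; b1 -= 0.1 * db1; W2 -= 0.1 * dW2; b2 -= 0.1 * db2

print(W1)   # every row is still identical (here: all zeros) -- symmetry never breaks
#+end_src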

7. Logistic regression’s weights should be initialized randomly rather than to all zeros, because if you initialize to all zeros, then logistic regression will fail to learn a useful decision boundary because it will fail to “break symmetry”. True/False?

  • True
  • False
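
Unlike the hidden-layer case above, logistic regression has no symmetry to break: even with w initialized to zero, the gradient dw = X(a − y)ᵀ/m depends on the (generally distinct) input features, so the weights move apart after the very first update. A minimal sketch with made-up data:

#+begin_src python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

np.random.seed(1)
X = np.random.randn(3, 5)                     # 3 features, 5 examples (made up)
Y = (np.random.rand(1, 5) > 0.5).astype(float)

w, b = np.zeros((3, 1)), 0.0                  # all-zero initialization
A = sigmoid(w.T @ X + b)
dw = X @ (A - Y).T / X.shape[1]               # gradient of the logistic loss w.r.t. w
print(dw.ravel())                             # generally nonzero and different per feature
#+end_src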

8. You have built a network using the tanh activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(..,..)*1000. What will happen?

  • So long as you initialize the weights randomly, gradient descent is not affected by whether the weights are large or small.
  • This will cause the inputs of the tanh to also be very large, thus causing the gradients to be close to zero. The optimization algorithm will thus become slow.
  • This will cause the inputs of the tanh to also be very large, causing the units to be “highly activated” and thus speed up learning compared to if the weights had to start from small values.
  • This will cause the inputs of the tanh to also be very large, thus causing gradients to also become large. You therefore have to set α (alpha) to a very small value to prevent divergence; this will slow down learning.
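
A quick numerical check of why very large initial weights hurt tanh units: the derivative 1 − tanh(z)² is essentially zero for large |z|, so gradients flowing through saturated units vanish (the input values below are arbitrary):

#+begin_src python
import numpy as np

z_small, z_large = np.array([0.1, -0.5]), np.array([250.0, -800.0])
print(1 - np.tanh(z_small) ** 2)   # well away from zero: gradients still flow
print(1 - np.tanh(z_large) ** 2)   # ~0: saturated units, learning slows to a crawl
#+end_src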

9. Consider the following 1 hidden layer neural network:

Which of the following statements are True? (Check all that apply).

  • b^[1] will have shape (1, 3)
  • W^[1] will have shape (3, 4).
  • W^[1] will have shape (4, 3).
  • b^[1] will have shape (3, 1).
  • b^[2] will have shape (1, 1)
  • b^[2] will have shape (3, 1)
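
The general rule behind these options: W^[l] has shape (n^[l], n^[l-1]) and b^[l] has shape (n^[l], 1). The concrete sizes below (4 input features, 3 hidden units, 1 output) are only an assumption for illustration, since the quiz figure is not reproduced here:

#+begin_src python
import numpy as np

def layer_sizes_to_shapes(sizes):
    """Given [n_x, n^[1], ..., n^[L]], return the expected parameter shapes."""
    shapes = {}
    for l in range(1, len(sizes)):
        shapes[f"W[{l}]"] = (sizes[l], sizes[l - 1])   # (n^[l], n^[l-1])
        shapes[f"b[{l}]"] = (sizes[l], 1)              # (n^[l], 1)
    return shapes

# Assumed sizes for illustration only (not taken from the figure):
print(layer_sizes_to_shapes([4, 3, 1]))
# {'W[1]': (3, 4), 'b[1]': (3, 1), 'W[2]': (1, 3), 'b[2]': (1, 1)}
#+end_src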

10. Consider the following 1 hidden layer neural network:

What are the dimensions of Z^[1] and A^[1]?

  • Z^[1] and A^[1] are (2, 1)
  • Z^[1] and A^[1] are (2, m)
  • Z^[1] and A^[1] are (4, m)
  • Z^[1] and A^[1] are (4, 1)
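
Whatever the hidden-layer size n^[1] in the figure, Z^[1] = W^[1] X + b^[1] and A^[1] = g(Z^[1]) stack one column per training example, so both have shape (n^[1], m). A quick check with assumed sizes (n^[1] = 4, n_x = 2, m = 10, not taken from the figure):

#+begin_src python
import numpy as np

n_x, n_1, m = 2, 4, 10          # assumed sizes, for illustration only
X = np.random.randn(n_x, m)
W1, b1 = np.random.randn(n_1, n_x), np.zeros((n_1, 1))

Z1 = W1 @ X + b1
A1 = np.tanh(Z1)
print(Z1.shape, A1.shape)       # (4, 10) (4, 10) -> (n^[1], m)
#+end_src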
