Convolutional Neural Networks Coursera Week 3 Quiz Answers
Quiz - Detection Algorithms
1. You are building a 3-class object classification and localization algorithm. The classes are: pedestrian (c=1), car (c=2), motorcycle (c=3). What should y be for the image below? Remember that "?" means "don't care", which means that the neural network loss function won't care what the neural network gives for that component of the output. Recall y = [Pc, bx, by, bh, bw, C1, C2, C3].
- y = [0,?,?,?,?,?,?,?]
- y = [1,?,?,?,?,?,?,?]
- y = [1,?,?,?,?,0,0,0]
- y = [?,?,?,?,?,?,?,?]
2. You are working on a factory automation task. Your system will see a can of soft drink coming down a conveyor belt, and you want it to take a picture and decide whether (i) there is a soft-drink can in the image, and if so (ii) its bounding box. Since the soft-drink can is round, the bounding box is always square, and the soft-drink can always appears the same size in the image. There is at most one soft-drink can in each image. Here are some typical images in your training set:
- Logistic unit, bx, by, bh (since bw = bh)
- Logistic unit (for classifying if there is a soft-drink can in the image)
- Logistic unit, bx and by
- Logistic unit, bx, by, bh, bw
3. When building a neural network that inputs a picture of a person's face and outputs N landmarks on the face (assume that the input image contains exactly one face), we need two coordinates for each landmark, thus we need 2N output units. True/False?
- True
- False
4. When training one of the object detection systems described in the lectures, each image must have zero or exactly one bounding box. True/False?
- False
- True
5. What is the IoU between these two boxes? The upper-left box is 2x2, and the lower-right box is 2x3. The overlapping region is 1x1.
- 1/9
- 1/10
- None of the above
- 1/6
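The IoU arithmetic in question 5 can be checked with a short sketch. The box coordinates below are an assumption, chosen only so that a 2x2 box and a 2x3 box share a 1x1 region; `iou` is an illustrative helper, not code from the course.

```python
from fractions import Fraction


def iou(box_a, box_b):
    """Intersection over union for boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at 0 when the boxes are disjoint.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # The overlap is counted once in the union.
    union = area_a + area_b - inter
    return Fraction(inter, union)


# Hypothetical coordinates: a 2x2 box and a 2x3 box overlapping in a 1x1 region.
upper_left = (0, 0, 2, 2)
lower_right = (1, 1, 3, 4)
result = iou(upper_left, lower_right)
print(result)  # 1/9
```

The union is 4 + 6 - 1 = 9, so the ratio is 1/9; using `Fraction` keeps the result exact rather than printing a rounded decimal.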
6. Suppose you run non-max suppression on the predicted boxes below. The parameters you use for non-max suppression are that boxes with probability ≤ 0.4 are discarded, and the IoU threshold for deciding if two boxes overlap is 0.5. How many boxes will remain after non-max suppression?
- 5
- 4
- 7
- 6
- 3
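The figure with the predicted boxes is not reproduced here, so the following is only a minimal sketch of the procedure on made-up detections. The thresholds (0.4 and 0.5) come from the question; the box coordinates, the class labels, the per-class suppression, and the use of `>=` for the IoU comparison are assumptions.

```python
def iou(a, b):
    """IoU for boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, score_threshold=0.4, iou_threshold=0.5):
    """detections: list of (label, score, box). Returns the detections kept."""
    # Step 1: discard boxes whose probability is <= the score threshold.
    candidates = [d for d in detections if d[1] > score_threshold]
    # Step 2: visit the remaining boxes from highest to lowest score.
    candidates.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for label, score, box in candidates:
        # Step 3: drop a box if it overlaps an already kept box of the same class.
        suppressed = any(
            k_label == label and iou(box, k_box) >= iou_threshold
            for k_label, _, k_box in kept
        )
        if not suppressed:
            kept.append((label, score, box))
    return kept


# Hypothetical detections, not the boxes from the quiz figure.
detections = [
    ("car", 0.9, (0, 0, 10, 10)),
    ("car", 0.6, (1, 1, 11, 11)),         # overlaps the 0.9 car box (IoU ~0.68)
    ("car", 0.3, (20, 20, 30, 30)),       # below the score threshold
    ("pedestrian", 0.7, (0, 0, 10, 10)),  # different class, so not suppressed
    ("car", 0.5, (20, 0, 30, 10)),        # no overlap with any kept car box
]
kept = non_max_suppression(detections)
print(len(kept))  # 3
```

Suppression runs separately for each class here, so a pedestrian box is never removed by a car box even when the two coincide.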
7. Suppose you are using YOLO on a 19x19 grid, on a detection problem with 20 classes, and with 5 anchor boxes. During training, for each image you will need to construct an output volume y as the target value for the neural network; this corresponds to the last layer of the neural network. (y may include some “?”, or “don’t cares”). What is the dimension of this output volume?
- 19x19x(5×25)
- 19x19x(25×20)
- 19x19x(5×20)
- 19x19x(20×25)
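The size of the target volume in question 7 can be worked out by counting the entries per anchor box. This sketch assumes the layout recalled in question 1 (Pc, four box coordinates, then one entry per class); `yolo_target_shape` is an illustrative name.

```python
def yolo_target_shape(grid_size, num_classes, num_anchors):
    """Shape of the YOLO target volume y for one image."""
    # Each anchor box carries Pc, (bx, by, bh, bw), and one score per class.
    per_anchor = 1 + 4 + num_classes
    return (grid_size, grid_size, num_anchors * per_anchor)


shape = yolo_target_shape(grid_size=19, num_classes=20, num_anchors=5)
print(shape)  # (19, 19, 125)
```

With 20 classes each anchor box needs 25 numbers, and 5 anchor boxes per grid cell give 5 × 25 = 125 values per cell.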
8. Semantic segmentation can only be applied to classify pixels of images in a binary way as 1 or 0, according to whether they belong to a certain class or not. True/False?
- False
- True
9. Using the concept of Transpose Convolution, fill in the values of X, Y and Z below.
(padding = 1, stride = 2)
- x=0, y=2, z=-1
- x=0, y=-1, z=-7
- x=0, y=-1, z=-4
- x=0, y=2, z=-7
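The figure for question 9 is not reproduced here, so X, Y and Z cannot be recomputed from this page. As a rough sketch of the mechanics only: each input value scales a copy of the filter, the copies are placed `stride` cells apart on a padded canvas, overlapping cells are summed, and the padding border is cropped. The input, the filter, and the explicit `out_size` argument are assumptions made for illustration.

```python
def transpose_conv2d(inp, kernel, stride=2, padding=1, out_size=4):
    """Transposed convolution on square 2-D lists, returning out_size x out_size."""
    k = len(kernel)
    canvas_size = out_size + 2 * padding
    canvas = [[0] * canvas_size for _ in range(canvas_size)]
    for i, row in enumerate(inp):
        for j, value in enumerate(row):
            # Top-left corner of this input value's filter copy on the canvas.
            top, left = i * stride, j * stride
            for di in range(k):
                for dj in range(k):
                    r, c = top + di, left + dj
                    if r < canvas_size and c < canvas_size:
                        # Overlapping contributions are summed.
                        canvas[r][c] += value * kernel[di][dj]
    # Crop the padding border to obtain the output.
    return [r[padding:padding + out_size] for r in canvas[padding:padding + out_size]]


# Hypothetical values, not the ones from the quiz figure.
inp = [[2, 1],
       [3, 2]]
kernel = [[1, 2, 1],
          [2, 0, 1],
          [0, 2, 1]]
out = transpose_conv2d(inp, kernel, stride=2, padding=1, out_size=4)
for row in out:
    print(row)
# [0, 4, 0, 1]
# [10, 7, 6, 3]
# [0, 7, 0, 2]
# [6, 3, 4, 2]
```

The output size is passed explicitly because input size, stride, and padding alone do not determine it uniquely for a transposed convolution.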
10. When using the U-Net architecture with an input h x w x c, where c denotes the number of channels, the output will always have the shape h x w x c. True/False?
- True
- False