Coursera Deep Learning 1: Neural Networks and Deep Learning, Week 3 Problems

1

Which of the following are true? (Check all that apply.)

2

The tanh activation usually works better than sigmoid activation function for hidden units because the mean of its output is closer to zero, and so it centers the data better for the next layer. True/False?

True
Correct
Yes. As seen in lecture, the output of tanh is between -1 and 1, so it centers the data, which makes learning simpler for the next layer.

False
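A small sketch of the centering claim, assuming zero-mean standard normal pre-activations (the sample size and seed are arbitrary choices, not from the quiz):

```python
import numpy as np

rng = np.random.default_rng(0)       # arbitrary seed, for repeatability
z = rng.standard_normal(10_000)      # assumed zero-mean pre-activations

tanh_out = np.tanh(z)
sigmoid_out = 1.0 / (1.0 + np.exp(-z))

# tanh is an odd function, so zero-mean inputs give outputs centered near 0;
# sigmoid outputs are centered near 0.5 instead.
print("mean of tanh output:   ", tanh_out.mean())
print("mean of sigmoid output:", sigmoid_out.mean())
```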

3

Which of these is a correct vectorized implementation of forward propagation for layer l, where 1≤l≤L?
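For reference, the vectorized forward step for a layer is commonly written as Z[l] = W[l] A[l-1] + b[l], followed by A[l] = g[l](Z[l]). A minimal NumPy sketch of that step; the function name `forward_layer` and the layer sizes are hypothetical, chosen only for illustration:

```python
import numpy as np

def forward_layer(A_prev, W, b, g):
    """One vectorized forward step: Z = W A_prev + b, A = g(Z)."""
    Z = W @ A_prev + b   # b has shape (n_l, 1) and broadcasts across the m columns
    A = g(Z)
    return Z, A

rng = np.random.default_rng(0)
A_prev = rng.standard_normal((3, 5))     # assumed: 3 units in layer l-1, m = 5 examples
W = rng.standard_normal((4, 3)) * 0.01   # assumed: 4 units in layer l
b = np.zeros((4, 1))

Z, A = forward_layer(A_prev, W, b, np.tanh)
print(Z.shape, A.shape)
```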

4

You are building a binary classifier for recognizing cucumbers (y=1) vs. watermelons (y=0). Which one of these activation functions would you recommend using for the output layer?

ReLU

Leaky ReLU

sigmoid
Correct
Yes. Sigmoid outputs a value between 0 and 1 which makes it a very good choice for binary classification. You can classify as 0 if the output is less than 0.5 and classify as 1 if the output is more than 0.5. It can be done with tanh as well but it is less convenient as the output is between -1 and 1.

tanh
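The thresholding rule from the explanation can be sketched as follows; the pre-activation values are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z_out = np.array([[-3.0, -0.2, 0.0, 0.4, 5.0]])  # made-up output-layer pre-activations
a_out = sigmoid(z_out)                            # every value lies strictly in (0, 1)

# Predict class 1 (cucumber) when the output exceeds 0.5, otherwise class 0 (watermelon).
y_pred = (a_out > 0.5).astype(int)
print(a_out)
print(y_pred)
```

An output of exactly 0.5 (z = 0) is assigned to class 0 here; which side the tie falls on is a convention.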

5

Consider the following code:

A = np.random.randn(4,3)
B = np.sum(A, axis = 1, keepdims = True)

What will be B.shape? (If you're not sure, feel free to run this in Python to find out.)

(1, 3)

(4, 1)
Correct
Yes, we use (keepdims = True) to make sure that B.shape is (4,1) and not (4, ). It makes our code more rigorous.

(, 3)

(4, )
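Running the snippet with and without keepdims shows the difference. The variable `C` is added here only for comparison:

```python
import numpy as np

A = np.random.randn(4, 3)
B = np.sum(A, axis=1, keepdims=True)  # summed axis is kept as a dimension of size 1
C = np.sum(A, axis=1)                 # summed axis is dropped, giving a rank-1 array

print(B.shape)  # (4, 1)
print(C.shape)  # (4,)
```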

6

Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements is true?

Each neuron in the first hidden layer will perform the same computation. So even after multiple iterations of gradient descent each neuron in the layer will be computing the same thing as other neurons.
Correct

Each neuron in the first hidden layer will perform the same computation in the first iteration. But after one iteration of gradient descent they will learn to compute different things because we have "broken symmetry".

Each neuron in the first hidden layer will compute the same thing, but neurons in different layers will compute different things, thus we have accomplished "symmetry breaking" as described in lecture.

The first hidden layer's neurons will perform different computations from each other even in the first iteration; their parameters will thus keep evolving in their own way.
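A minimal sketch of the symmetry problem, assuming a hypothetical 2-3-1 network with sigmoid units, made-up data, and plain gradient descent on the cross-entropy loss. With all-zero initialization, every hidden unit receives the same update, so the rows of W1 stay identical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 8))             # 2 features, 8 examples (made up)
Y = np.array([[1, 1, 1, 0, 1, 1, 0, 1]])    # made-up labels

# All-zero initialization of weights and biases
W1, b1 = np.zeros((3, 2)), np.zeros((3, 1))
W2, b2 = np.zeros((1, 3)), np.zeros((1, 1))
lr, m = 0.5, X.shape[1]

for _ in range(100):
    # Forward pass
    A1 = sigmoid(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    # Backward pass (sigmoid output with cross-entropy loss)
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.mean(axis=1, keepdims=True)
    dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.mean(axis=1, keepdims=True)
    # Gradient descent update
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(W1)                        # every row holds the same values
print(np.allclose(W1, W1[0]))    # all hidden units still share one set of weights
```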

7

Logistic regression's weights w should be initialized randomly rather than to all zeros, because if you initialize to all zeros, then logistic regression will fail to learn a useful decision boundary because it will fail to "break symmetry". True/False?

True

False
Correct
Yes. Logistic regression doesn't have a hidden layer. If you initialize the weights to zeros, the first example x fed into the logistic regression gives z = 0 (a sigmoid output of 0.5), but the derivatives of logistic regression depend on the input x (because there's no hidden layer), which is not zero. So at the second iteration, the weight values follow x's distribution and are different from each other if x is not a constant vector.
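A short sketch of one gradient step from zero weights, using made-up data and an arbitrary learning rate of 0.1. The gradient dw = X (A - Y)^T / m depends on X, so the two weights already differ after the first update:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[1.0,  2.0, -1.0, 0.5],
              [0.0, -1.0,  3.0, 2.0]])   # 2 features, 4 examples (made up)
Y = np.array([[1, 0, 1, 1]])             # made-up labels

w = np.zeros((2, 1))                     # zero initialization
b = 0.0

A = sigmoid(w.T @ X + b)                 # every prediction is 0.5 when w = 0 and b = 0
dw = X @ (A - Y).T / X.shape[1]          # gradient of the cross-entropy cost w.r.t. w
w = w - 0.1 * dw                         # one gradient descent step

print(dw.ravel())   # [ 0.1875 -0.75  ]
print(w.ravel())    # the two weights are no longer equal
```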

8

You have built a network using the tanh activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(..,..)*1000. What will happen?

This will cause the inputs of the tanh to also be very large, causing the gradients to also become large. You therefore have to set α to be very small to prevent divergence; this will slow down learning.

This will cause the inputs of the tanh to also be very large, causing the units to be "highly activated" and thus speed up learning compared to if the weights had to start from small values.

It doesn't matter. So long as you initialize the weights randomly, gradient descent is not affected by whether the weights are large or small.

This will cause the inputs of the tanh to also be very large, thus causing gradients to be close to zero. The optimization algorithm will thus become slow.
Correct
Yes. tanh becomes flat for large values, which drives its gradient close to zero. This slows down the optimization algorithm.
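The saturation effect can be checked numerically. This sketch assumes a hypothetical layer with 3 inputs and 4 hidden units, and compares a small weight scale (0.01) with the scale of 1000 from the question, using the derivative 1 - tanh(z)^2:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))      # 3 features, 5 examples (made up)

mean_grad = {}
for scale in (0.01, 1000):
    W = rng.standard_normal((4, 3)) * scale
    Z = W @ X                        # pre-activations grow with the weight scale
    grad = 1 - np.tanh(Z) ** 2       # derivative of tanh evaluated at Z
    mean_grad[scale] = grad.mean()
    print(f"scale={scale}: mean |Z| = {np.abs(Z).mean():.3g}, "
          f"mean tanh'(Z) = {mean_grad[scale]:.3g}")
```

With the small scale the derivative stays near 1; with the large scale most units sit on the flat part of tanh, so the derivative is close to 0.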

9

Consider the following 1 hidden layer neural network:

Which of the following statements are True? (Check all that apply).

10

In the same network as the previous question, what are the dimensions of Z[1] and A[1]?

Z[1] and A[1] are (1,4)

Z[1] and A[1] are (4,2)
Correct

Z[1] and A[1] are (4,1)

Z[1] and A[1] are (4,m)