Getting Started with Machine Learning: Build Your First Neural Network in Python

Here's something that might surprise you: neural networks aren't that complicated! The term "neural network" gets used as a buzzword a lot, but in reality they're often much simpler than people imagine.

This post is intended for complete beginners and assumes ZERO prior knowledge of machine learning. We'll understand how neural networks work while implementing one from scratch in Python.

Let's get started!

1. Building Blocks: Neurons

First, we have to talk about neurons, the basic unit of a neural network. A neuron takes inputs, does some math with them, and produces one output. Here's what a 2-input neuron looks like:


3 things are happening here. First, each input is multiplied by a weight:

$$x_1 \rightarrow x_1 * w_1$$
$$x_2 \rightarrow x_2 * w_2$$

Next, all the weighted inputs are added together with a bias $b$:

$$(x_1 * w_1) + (x_2 * w_2) + b$$

Finally, the sum is passed through an activation function:

$$y = f(x_1 * w_1 + x_2 * w_2 + b)$$

The activation function is used to turn an unbounded input into an output that has a nice, predictable form. A commonly used activation function is the sigmoid function:

The sigmoid function only outputs numbers in the range $(0, 1)$. You can think of it as compressing $(-\infty, +\infty)$ to $(0, 1)$: big negative numbers become ~$0$, and big positive numbers become ~$1$.
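To make the "compressing" idea concrete, here is a minimal sketch that evaluates the sigmoid at a few sample points. The sample inputs are arbitrary values chosen for illustration, not part of the original example.

```python
import numpy as np

def sigmoid(x):
  # f(x) = 1 / (1 + e^(-x))
  return 1 / (1 + np.exp(-x))

# Sample inputs chosen only for illustration.
xs = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
ys = sigmoid(xs)

for x, y in zip(xs, ys):
  print("f(%6.1f) = %.5f" % (x, y))
```

Every output lands strictly between 0 and 1, with f(0) sitting exactly in the middle at 0.5.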

A Simple Example

Assume we have a 2-input neuron that uses the sigmoid activation function and has the following parameters:

$$w = [0, 1]$$
$$b = 4$$

$w = [0, 1]$ is just a way of writing $w_1 = 0, w_2 = 1$ in vector form. Now, let's give the neuron an input of $x = [2, 3]$. We'll use the dot product to write things more concisely:

$$\begin{aligned}
(w \cdot x) + b &= ((w_1 * x_1) + (w_2 * x_2)) + b \\
&= 0 * 2 + 1 * 3 + 4 \\
&= 7
\end{aligned}$$

$$y = f(w \cdot x + b) = f(7) = \boxed{0.999}$$

The neuron outputs $0.999$ given the inputs $x = [2, 3]$. That's it! This process of passing inputs forward to get an output is known as feedforward.

Coding a Neuron

Time to implement a neuron! We'll use NumPy, a popular and powerful computing library for Python, to help us do math:

import numpy as np

def sigmoid(x):
  # Our activation function: f(x) = 1 / (1 + e^(-x))
  return 1 / (1 + np.exp(-x))

class Neuron:
  def __init__(self, weights, bias):
    self.weights = weights
    self.bias = bias

  def feedforward(self, inputs):
    # Weight inputs, add bias, then use the activation function
    total = np.dot(self.weights, inputs) + self.bias
    return sigmoid(total)

weights = np.array([0, 1]) # w1 = 0, w2 = 1
bias = 4                   # b = 4
n = Neuron(weights, bias)

x = np.array([2, 3])       # x1 = 2, x2 = 3
print(n.feedforward(x))    # 0.9990889488055994

Recognize those numbers? That's the example we just did! We get the same answer of $0.999$.

2. Combining Neurons into a Neural Network

A neural network is nothing more than a bunch of neurons connected together. Here's what a simple neural network might look like:

This network has 2 inputs, a hidden layer with 2 neurons ($h_1$ and $h_2$), and an output layer with 1 neuron ($o_1$). Notice that the inputs for $o_1$ are the outputs from $h_1$ and $h_2$; that's what makes this a network.

A hidden layer is any layer between the input (first) layer and output (last) layer. There can be multiple hidden layers!

An Example: Feedforward

Let's use the network described above and assume all neurons have the same weights $w = [0, 1]$, the same bias $b = 0$, and the same sigmoid activation function. Let $h_1, h_2, o_1$ denote the outputs of the neurons they represent.

What happens if we pass in the input $x = [2, 3]$?

$$\begin{aligned}
h_1 = h_2 &= f(w \cdot x + b) \\
&= f((0 * 2) + (1 * 3) + 0) \\
&= f(3) \\
&= 0.9526
\end{aligned}$$

$$\begin{aligned}
o_1 &= f(w \cdot [h_1, h_2] + b) \\
&= f((0 * h_1) + (1 * h_2) + 0) \\
&= f(0.9526) \\
&= \boxed{0.7216}
\end{aligned}$$

The output of the neural network for input $x = [2, 3]$ is $0.7216$. Pretty simple, right?

A neural network can have any number of layers with any number of neurons in those layers. The basic idea stays the same: feed the input(s) forward through the neurons in the network to get the output(s) at the end. For simplicity, we'll keep using the network described above for the rest of this post.
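As a rough sketch of how the same idea extends to any number of layers, the snippet below stores each layer as a weight matrix and a bias vector and loops over them. The `feedforward` helper and the `(W, b)` layer representation are assumptions made for this illustration; the rest of the post sticks with one object per neuron.

```python
import numpy as np

def sigmoid(x):
  return 1 / (1 + np.exp(-x))

def feedforward(x, layers):
  # layers is a list of (W, b) pairs; row i of W holds neuron i's weights.
  out = x
  for W, b in layers:
    out = sigmoid(np.dot(W, out) + b)
  return out

# The 2-2-1 network from the example: every neuron has w = [0, 1], b = 0.
hidden = (np.array([[0, 1], [0, 1]]), np.array([0, 0]))
output = (np.array([[0, 1]]), np.array([0]))

x = np.array([2, 3])
print(feedforward(x, [hidden, output]))  # one-element array, roughly 0.7216
```

Adding another hidden layer is just a matter of appending one more `(W, b)` pair to the list.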

Coding a Neural Network: Feedforward

Let's implement feedforward for our neural network. As a reminder, the network has 2 inputs, a hidden layer with 2 neurons ($h_1$, $h_2$), and an output layer with 1 neuron ($o_1$):

import numpy as np

# ... code from previous section here

class OurNeuralNetwork:
  '''
  A neural network with:
    - 2 inputs
    - a hidden layer with 2 neurons (h1, h2)
    - an output layer with 1 neuron (o1)
  Each neuron has the same weights and bias:
    - w = [0, 1]
    - b = 0
  '''
  def __init__(self):
    weights = np.array([0, 1])
    bias = 0

    # The Neuron class here is from the previous section
    self.h1 = Neuron(weights, bias)
    self.h2 = Neuron(weights, bias)
    self.o1 = Neuron(weights, bias)

  def feedforward(self, x):
    out_h1 = self.h1.feedforward(x)
    out_h2 = self.h2.feedforward(x)

    # The inputs for o1 are the outputs from h1 and h2
    out_o1 = self.o1.feedforward(np.array([out_h1, out_h2]))

    return out_o1

network = OurNeuralNetwork()
x = np.array([2, 3])
print(network.feedforward(x)) # 0.7216325609518421

We got $0.7216$ again! Looks like it works.

3. Training a Neural Network, Part 1

Say we have the following measurements:

Name	Weight (lb)	Height (in)	Gender
Alice	133	65	F
Bob	160	72	M
Charlie	152	70	M
Diana	120	60	F

Let's train our network to predict someone's gender given their weight and height.

We'll represent Male with a $0$ and Female with a $1$, and we'll also shift the data to make it easier to use:

Name	Weight (minus 135)	Height (minus 66)	Gender
Alice	-2	-1	1
Bob	25	6	0
Charlie	17	4	0
Diana	-15	-6	1

I arbitrarily chose the shift amounts ($135$ and $66$) to make the numbers look nice. Normally, you'd shift by the mean.
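For comparison, here is a minimal sketch of what shifting by the mean would look like on the raw measurements from the first table. The variable names are made up for this illustration, and the rest of the post keeps using the 135 / 66 shifts.

```python
import numpy as np

# Raw measurements from the first table: [weight (lb), height (in)].
raw = np.array([
  [133, 65],  # Alice
  [160, 72],  # Bob
  [152, 70],  # Charlie
  [120, 60],  # Diana
])

means = raw.mean(axis=0)  # per-column mean: [141.25, 66.75]
centered = raw - means    # broadcasting subtracts each column's mean

print(means)
print(centered)
```

After this shift each column averages to zero, which is the property the hand-picked offsets only approximate.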

Loss

Before we train our network, we first need a way to quantify how "good" it's doing so that it can try to do "better". That's what the loss is.

We'll use the mean squared error (MSE) loss:

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^n (y_{true} - y_{pred})^2$$

Let's break this down:

$n$ is the number of samples, which is $4$ (Alice, Bob, Charlie, Diana).
$y$ represents the variable being predicted, which is Gender.
$y_{true}$ is the true value of the variable (the "correct answer"). For example, $y_{true}$ for Alice would be $1$ (Female).
$y_{pred}$ is the predicted value of the variable. It's whatever our network outputs.

$(y_{true} - y_{pred})^2$ is known as the squared error. Our loss function is simply taking the average over all squared errors (hence the name mean squared error). The better our predictions are, the lower our loss will be!

Better predictions = Lower loss.

Training a network = trying to minimize its loss.

An Example Loss Calculation

Let's say our network always outputs $0$; in other words, it's confident all humans are Male. What would our loss be?

Name	$y_{true}$	$y_{pred}$	$(y_{true} - y_{pred})^2$
Alice	1	0	1
Bob	0	0	0
Charlie	0	0	0
Diana	1	0	1

$$\text{MSE} = \frac{1}{4} (1 + 0 + 0 + 1) = \boxed{0.5}$$

Code: MSE Loss

Here's some code to calculate loss for us:

import numpy as np

def mse_loss(y_true, y_pred):
  # y_true and y_pred are numpy arrays of the same length.
  return ((y_true - y_pred) ** 2).mean()

y_true = np.array([1, 0, 0, 1])
y_pred = np.array([0, 0, 0, 0])

print(mse_loss(y_true, y_pred)) # 0.5

If you don't understand why this code works, read the NumPy quickstart on array operations.
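As a quick illustration of the array operations involved, the one-liner in `mse_loss` can be unrolled into its three elementwise steps. This is only a sketch for intuition; the intermediate variable names are not part of the original code.

```python
import numpy as np

y_true = np.array([1, 0, 0, 1])
y_pred = np.array([0, 0, 0, 0])

diff = y_true - y_pred  # elementwise subtraction: [1 0 0 1]
squared = diff ** 2     # elementwise square:      [1 0 0 1]
loss = squared.mean()   # average of the entries:  0.5

print(diff, squared, loss)
```

No explicit loop is needed because NumPy applies `-` and `**` to every element of the arrays at once.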

Nice. Onwards!

4. Training a Neural Network, Part 2

We now have a clear goal: minimize the loss of the neural network. We know we can change the network's weights and biases to influence its predictions, but how do we do so in a way that decreases loss?

This section uses a bit of multivariable calculus. If you're not comfortable with calculus, feel free to skip over the math parts.

For simplicity, let's pretend we only have Alice in our dataset:

Name	Weight (minus 135)	Height (minus 66)	Gender
Alice	-2	-1	1

Then the mean squared error loss is just Alice's squared error:

$$\begin{aligned}
\text{MSE} &= \frac{1}{1} \sum_{i=1}^1 (y_{true} - y_{pred})^2 \\
&= (y_{true} - y_{pred})^2 \\
&= (1 - y_{pred})^2
\end{aligned}$$

Another way to think about loss is as a function of weights and biases. Let's label each weight and bias in our network: $h_1$ uses weights $w_1, w_2$ and bias $b_1$; $h_2$ uses $w_3, w_4$ and $b_2$; $o_1$ uses $w_5, w_6$ and $b_3$.

Then, we can write loss as a multivariable function:

$$L(w_1, w_2, w_3, w_4, w_5, w_6, b_1, b_2, b_3)$$

Imagine we wanted to tweak $w_1$. How would loss $L$ change if we changed $w_1$? That's a question the partial derivative $\frac{\partial L}{\partial w_1}$ can answer. How do we calculate it?

Here's where the math starts to get more complex. Don't be discouraged! I recommend getting a pen and paper to follow along; it'll help you understand.

To start, let's rewrite the partial derivative in terms of $\frac{\partial y_{pred}}{\partial w_1}$ instead:

$$\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial y_{pred}} * \frac{\partial y_{pred}}{\partial w_1}$$

This works because of the Chain Rule.

We can calculate $\frac{\partial L}{\partial y_{pred}}$ because we computed $L = (1 - y_{pred})^2$ above:

$$\frac{\partial L}{\partial y_{pred}} = \frac{\partial (1 - y_{pred})^2}{\partial y_{pred}} = \boxed{-2(1 - y_{pred})}$$

Now, let's figure out what to do with $\frac{\partial y_{pred}}{\partial w_1}$. Just like before, let $h_1, h_2, o_1$ be the outputs of the neurons they represent. Then

$$y_{pred} = o_1 = f(w_5 h_1 + w_6 h_2 + b_3)$$

f is the sigmoid activation function, remember?

Since $w_1$ only affects $h_1$ (not $h_2$), we can write

$$\frac{\partial y_{pred}}{\partial w_1} = \frac{\partial y_{pred}}{\partial h_1} * \frac{\partial h_1}{\partial w_1}$$

$$\frac{\partial y_{pred}}{\partial h_1} = \boxed{w_5 * f'(w_5 h_1 + w_6 h_2 + b_3)}$$

More Chain Rule.

We do the same thing for $\frac{\partial h_1}{\partial w_1}$:

$$h_1 = f(w_1 x_1 + w_2 x_2 + b_1)$$

$$\frac{\partial h_1}{\partial w_1} = \boxed{x_1 * f'(w_1 x_1 + w_2 x_2 + b_1)}$$

You guessed it, Chain Rule.

$x_1$ here is weight, and $x_2$ is height. This is the second time we've seen $f'(x)$ (the derivative of the sigmoid function) now! Let's derive it:

$$f(x) = \frac{1}{1 + e^{-x}}$$

$$f'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = f(x) * (1 - f(x))$$

We'll use this nice form for $f'(x)$ later.
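For readers who want the intermediate steps of the sigmoid derivative, one way to fill them in is sketched below. It applies the chain rule to $(1 + e^{-x})^{-1}$ and then regroups the factors; this is a standard derivation, written out here as a convenience.

```latex
\begin{aligned}
f(x) &= (1 + e^{-x})^{-1} \\
f'(x) &= -(1 + e^{-x})^{-2} \cdot (-e^{-x}) = \frac{e^{-x}}{(1 + e^{-x})^2} \\
&= \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} \\
&= f(x) \cdot \left(1 - \frac{1}{1 + e^{-x}}\right) \\
&= f(x) * (1 - f(x))
\end{aligned}
```

The fourth line uses the identity $\frac{e^{-x}}{1 + e^{-x}} = 1 - \frac{1}{1 + e^{-x}}$.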

We're done! We've managed to break down $\frac{\partial L}{\partial w_1}$ into several parts we can calculate:

$$\boxed{\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial y_{pred}} * \frac{\partial y_{pred}}{\partial h_1} * \frac{\partial h_1}{\partial w_1}}$$

This system of calculating partial derivatives by working backwards is known as backpropagation, or "backprop".

Phew. That was a lot of symbols; it's alright if you're still a bit confused. Let's do an example to see this in action!

Example: Calculating the Partial Derivative

We're going to continue pretending only Alice is in our dataset:

Name	Weight (minus 135)	Height (minus 66)	Gender
Alice	-2	-1	1

Let's initialize all the weights to $1$ and all the biases to $0$. If we do a feedforward pass through the network, we get:

$$\begin{aligned}
h_1 &= f(w_1 x_1 + w_2 x_2 + b_1) \\
&= f(-2 + -1 + 0) \\
&= 0.0474
\end{aligned}$$

$$h_2 = f(w_3 x_1 + w_4 x_2 + b_2) = 0.0474$$

$$\begin{aligned}
o_1 &= f(w_5 h_1 + w_6 h_2 + b_3) \\
&= f(0.0474 + 0.0474 + 0) \\
&= 0.524
\end{aligned}$$

The network outputs $y_{pred} = 0.524$, which doesn't strongly favor Male ($0$) or Female ($1$). Let's calculate $\frac{\partial L}{\partial w_1}$:

$$\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial y_{pred}} * \frac{\partial y_{pred}}{\partial h_1} * \frac{\partial h_1}{\partial w_1}$$

$$\begin{aligned}
\frac{\partial L}{\partial y_{pred}} &= -2(1 - y_{pred}) \\
&= -2(1 - 0.524) \\
&= -0.952
\end{aligned}$$

$$\begin{aligned}
\frac{\partial y_{pred}}{\partial h_1} &= w_5 * f'(w_5 h_1 + w_6 h_2 + b_3) \\
&= 1 * f'(0.0474 + 0.0474 + 0) \\
&= f(0.0948) * (1 - f(0.0948)) \\
&= 0.249
\end{aligned}$$

$$\begin{aligned}
\frac{\partial h_1}{\partial w_1} &= x_1 * f'(w_1 x_1 + w_2 x_2 + b_1) \\
&= -2 * f'(-2 + -1 + 0) \\
&= -2 * f(-3) * (1 - f(-3)) \\
&= -0.0904
\end{aligned}$$

$$\begin{aligned}
\frac{\partial L}{\partial w_1} &= -0.952 * 0.249 * -0.0904 \\
&= \boxed{0.0214}
\end{aligned}$$

Reminder: we derived $f'(x) = f(x) * (1 - f(x))$ for our sigmoid activation function earlier.

We did it! This tells us that if we were to increase $w_1$, $L$ would increase a tiiiny bit as a result.
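One way to sanity-check a hand-derived gradient like this is to compare it with a finite-difference estimate. The sketch below assumes the same setup as the example (only Alice, all weights 1, all biases 0); the `loss_for_w1` helper and the step size `eps` are illustrative choices, not part of the original post.

```python
import numpy as np

def sigmoid(x):
  return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
  fx = sigmoid(x)
  return fx * (1 - fx)

def loss_for_w1(w1, x=(-2.0, -1.0), y_true=1.0):
  # All other weights are 1 and all biases are 0, as in the example.
  h1 = sigmoid(w1 * x[0] + 1 * x[1] + 0)
  h2 = sigmoid(1 * x[0] + 1 * x[1] + 0)
  o1 = sigmoid(1 * h1 + 1 * h2 + 0)
  return (y_true - o1) ** 2

# Analytic gradient via the chain rule, evaluated at w1 = 1.
x1, x2 = -2.0, -1.0
sum_h1 = 1 * x1 + 1 * x2
h1 = h2 = sigmoid(sum_h1)
sum_o1 = h1 + h2
y_pred = sigmoid(sum_o1)
d_L_d_ypred = -2 * (1 - y_pred)
d_ypred_d_h1 = 1 * deriv_sigmoid(sum_o1)
d_h1_d_w1 = x1 * deriv_sigmoid(sum_h1)
analytic = d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w1

# Central finite difference: (L(w1 + eps) - L(w1 - eps)) / (2 * eps).
eps = 1e-6  # arbitrary small step, chosen for illustration
numeric = (loss_for_w1(1 + eps) - loss_for_w1(1 - eps)) / (2 * eps)

print("analytic: %.4f, numeric: %.4f" % (analytic, numeric))
```

The two estimates should agree closely, both near 0.021. At full precision the product comes out slightly above the 0.0214 obtained by hand, because the hand calculation multiplies rounded intermediate values.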

Training: Stochastic Gradient Descent

We have all the tools we need to train a neural network now! We'll use an optimization algorithm called stochastic gradient descent (SGD) that tells us how to change our weights and biases to minimize loss. It's basically just this update equation:

$$w_1 \leftarrow w_1 - \eta \frac{\partial L}{\partial w_1}$$

$\eta$ is a constant called the learning rate that controls how fast we train. All we're doing is subtracting $\eta \frac{\partial L}{\partial w_1}$ from $w_1$:

If $\frac{\partial L}{\partial w_1}$ is positive, $w_1$ will decrease, which makes $L$ decrease.
If $\frac{\partial L}{\partial w_1}$ is negative, $w_1$ will increase, which makes $L$ decrease.

If we do this for every weight and bias in the network, the loss will slowly decrease and our network will improve.
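To see the update equation at work in isolation, here is a small sketch that applies it to a single weight a few times. It assumes the earlier setup (only Alice, all other weights fixed at 1, all biases at 0) and a learning rate of 0.1; updating only `w1` is a simplification made for this illustration.

```python
import numpy as np

def sigmoid(x):
  return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
  fx = sigmoid(x)
  return fx * (1 - fx)

x1, x2, y_true = -2.0, -1.0, 1.0  # Alice
w1 = 1.0                          # other weights stay at 1, biases at 0
learn_rate = 0.1                  # eta; an illustrative value
losses = []

for step in range(5):
  # Feedforward
  sum_h1 = w1 * x1 + 1 * x2
  h1 = sigmoid(sum_h1)
  h2 = sigmoid(1 * x1 + 1 * x2)
  sum_o1 = h1 + h2
  y_pred = sigmoid(sum_o1)
  losses.append((y_true - y_pred) ** 2)

  # dL/dw1 via the chain rule
  grad = -2 * (y_true - y_pred) * deriv_sigmoid(sum_o1) * x1 * deriv_sigmoid(sum_h1)

  # Update equation: w1 <- w1 - eta * dL/dw1
  w1 -= learn_rate * grad

print(w1)
print(losses)
```

Because the gradient is positive here, each step nudges `w1` downward and the recorded loss shrinks a little every time.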

Our training process will look like this:

1. Choose one sample from our dataset. This is what makes it stochastic gradient descent: we only operate on one sample at a time.
2. Calculate all the partial derivatives of loss with respect to weights or biases (e.g. $\frac{\partial L}{\partial w_1}$, $\frac{\partial L}{\partial w_2}$, etc).
3. Use the update equation to update each weight and bias.
4. Go back to step 1.

Let's see it in action!

Code: A Complete Neural Network

It's finally time to implement a complete neural network:

Name	Weight (minus 135)	Height (minus 66)	Gender
Alice	-2	-1	1
Bob	25	6	0
Charlie	17	4	0
Diana	-15	-6	1

import numpy as np

def sigmoid(x):
  # Sigmoid activation function: f(x) = 1 / (1 + e^(-x))
  return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
  # Derivative of sigmoid: f'(x) = f(x) * (1 - f(x))
  fx = sigmoid(x)
  return fx * (1 - fx)

def mse_loss(y_true, y_pred):
  # y_true and y_pred are numpy arrays of the same length.
  return ((y_true - y_pred) ** 2).mean()

class OurNeuralNetwork:
  '''
  A neural network with:
    - 2 inputs
    - a hidden layer with 2 neurons (h1, h2)
    - an output layer with 1 neuron (o1)

  *** DISCLAIMER ***:
  The code below is intended to be simple and educational, NOT optimal.
  Real neural net code looks nothing like this. DO NOT use this code.
  Instead, read/run it to understand how this specific network works.
  '''
  def __init__(self):
    # Weights
    self.w1 = np.random.normal()
    self.w2 = np.random.normal()
    self.w3 = np.random.normal()
    self.w4 = np.random.normal()
    self.w5 = np.random.normal()
    self.w6 = np.random.normal()

    # Biases
    self.b1 = np.random.normal()
    self.b2 = np.random.normal()
    self.b3 = np.random.normal()

  def feedforward(self, x):
    # x is a numpy array with 2 elements.
    h1 = sigmoid(self.w1 * x[0] + self.w2 * x[1] + self.b1)
    h2 = sigmoid(self.w3 * x[0] + self.w4 * x[1] + self.b2)
    o1 = sigmoid(self.w5 * h1 + self.w6 * h2 + self.b3)
    return o1

  def train(self, data, all_y_trues):
    '''
    - data is a (n x 2) numpy array, n = # of samples in the dataset.
    - all_y_trues is a numpy array with n elements.
      Elements in all_y_trues correspond to those in data.
    '''
    learn_rate = 0.1
    epochs = 1000 # number of times to loop through the entire dataset

    for epoch in range(epochs):
      for x, y_true in zip(data, all_y_trues):
        # --- Do a feedforward (we'll need these values later)
        sum_h1 = self.w1 * x[0] + self.w2 * x[1] + self.b1
        h1 = sigmoid(sum_h1)

        sum_h2 = self.w3 * x[0] + self.w4 * x[1] + self.b2
        h2 = sigmoid(sum_h2)

        sum_o1 = self.w5 * h1 + self.w6 * h2 + self.b3
        o1 = sigmoid(sum_o1)
        y_pred = o1

        # --- Calculate partial derivatives.
        # --- Naming: d_L_d_w1 represents "partial L / partial w1"
        d_L_d_ypred = -2 * (y_true - y_pred)

        # Neuron o1
        d_ypred_d_w5 = h1 * deriv_sigmoid(sum_o1)
        d_ypred_d_w6 = h2 * deriv_sigmoid(sum_o1)
        d_ypred_d_b3 = deriv_sigmoid(sum_o1)

        d_ypred_d_h1 = self.w5 * deriv_sigmoid(sum_o1)
        d_ypred_d_h2 = self.w6 * deriv_sigmoid(sum_o1)

        # Neuron h1
        d_h1_d_w1 = x[0] * deriv_sigmoid(sum_h1)
        d_h1_d_w2 = x[1] * deriv_sigmoid(sum_h1)
        d_h1_d_b1 = deriv_sigmoid(sum_h1)

        # Neuron h2
        d_h2_d_w3 = x[0] * deriv_sigmoid(sum_h2)
        d_h2_d_w4 = x[1] * deriv_sigmoid(sum_h2)
        d_h2_d_b2 = deriv_sigmoid(sum_h2)

        # --- Update weights and biases
        # Neuron h1
        self.w1 -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w1
        self.w2 -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w2
        self.b1 -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_b1

        # Neuron h2
        self.w3 -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_w3
        self.w4 -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_w4
        self.b2 -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_b2

        # Neuron o1
        self.w5 -= learn_rate * d_L_d_ypred * d_ypred_d_w5
        self.w6 -= learn_rate * d_L_d_ypred * d_ypred_d_w6
        self.b3 -= learn_rate * d_L_d_ypred * d_ypred_d_b3

      # --- Calculate total loss at the end of each epoch
      if epoch % 10 == 0:
        y_preds = np.apply_along_axis(self.feedforward, 1, data)
        loss = mse_loss(all_y_trues, y_preds)
        print("Epoch %d loss: %.3f" % (epoch, loss))

# Define dataset
data = np.array([
  [-2, -1],  # Alice
  [25, 6],   # Bob
  [17, 4],   # Charlie
  [-15, -6], # Diana
])
all_y_trues = np.array([
  1, # Alice
  0, # Bob
  0, # Charlie
  1, # Diana
])

# Train our neural network!
network = OurNeuralNetwork()
network.train(data, all_y_trues)

You can run / play with this code yourself. It's also available on GitHub.

Our loss steadily decreases as the network learns:

We can now use the network to predict genders:

# Make some predictions
emily = np.array([-7, -3]) # 128 pounds, 63 inches
frank = np.array([20, 2])  # 155 pounds, 68 inches
print("Emily: %.3f" % network.feedforward(emily)) # 0.951 - F
print("Frank: %.3f" % network.feedforward(frank)) # 0.039 - M

Now What?

You made it! A quick recap of what we did:

Introduced neurons, the building blocks of neural networks.
Used the sigmoid activation function in our neurons.
Saw that neural networks are just neurons connected together.
Created a dataset with Weight and Height as inputs (or features) and Gender as the output (or label).
Learned about loss functions and the mean squared error (MSE) loss.
Realized that training a network is just minimizing its loss.
Used backpropagation to calculate partial derivatives.
Used stochastic gradient descent (SGD) to train our network.

There's still much more to do:

Experiment with bigger / better neural networks using proper machine learning libraries like Tensorflow, Keras, and PyTorch.
Tinker with a neural network in your browser.
Discover other activation functions besides sigmoid.
Discover other optimizers besides SGD.
Learn about Convolutional Neural Networks, which revolutionized the field of Computer Vision.
Learn about Recurrent Neural Networks, often used for Natural Language Processing (NLP).

I may write about these topics or similar ones in the future, so subscribe if you want to get notified about new posts.

Thanks for reading!