4

How do I correctly implement backpropagation for the MNIST dataset? I'm using Michael Nielsen's machine learning book as the reference for my code (mine is essentially the same): http://neuralnetworksanddeeplearning.com/chap1.html

The code in question:

def backpropagate(self, image, image_value):

    # declare two new numpy arrays for the updated weights & biases
    new_biases = [np.zeros(bias.shape) for bias in self.biases]
    new_weights = [np.zeros(weight_matrix.shape) for weight_matrix in self.weights]

    # -------- feed forward --------
    # store all the activations in a list
    activations = [image]

    # declare empty list that will contain all the z vectors
    zs = []
    for bias, weight in zip(self.biases, self.weights):
        print(bias.shape)
        print(weight.shape)
        print(image.shape)
        z = np.dot(weight, image) + bias
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)

    # -------- backward pass --------
    # transpose() returns the numpy array with the rows as columns and columns as rows
    delta = self.cost_derivative(activations[-1], image_value) * sigmoid_prime(zs[-1])
    new_biases[-1] = delta
    new_weights[-1] = np.dot(delta, activations[-2].transpose())

    # l = 1 means the last layer of neurons, l = 2 is the second-last, etc.
    # this takes advantage of Python's ability to use negative indices in lists
    for l in range(2, self.num_layers):
        z = zs[-1]
        sp = sigmoid_prime(z)
        delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
        new_biases[-l] = delta
        new_weights[-l] = np.dot(delta, activations[-l-1].transpose())
    return (new_biases, new_weights)

My algorithm only gets through the first round of backpropagation before hitting this error:

File "D:/Programming/Python/DPUDS/DPUDS_Projects/Fall_2017/MNIST/network.py", line 97, in stochastic_gradient_descent 
    self.update_mini_batch(mini_batch, learning_rate) 
    File "D:/Programming/Python/DPUDS/DPUDS_Projects/Fall_2017/MNIST/network.py", line 117, in update_mini_batch 
    delta_biases, delta_weights = self.backpropagate(image, image_value) 
    File "D:/Programming/Python/DPUDS/DPUDS_Projects/Fall_2017/MNIST/network.py", line 160, in backpropagate 
    z = np.dot(weight, activation) + bias 
ValueError: shapes (30,50000) and (784,1) not aligned: 50000 (dim 1) != 784 (dim 0) 

I get why it's an error: the number of columns in the weight matrix doesn't match the number of rows in the pixel image, so the matrix multiplication can't be performed. Here's where I'm confused: there are 30 neurons used in the backpropagation, each with 50,000 images being evaluated. My understanding is that each of the 50,000 should have 784 weights, one per pixel. But when I modify the code accordingly:

count = 0
for bias, weight in zip(self.biases, self.weights):
    print(bias.shape)
    print(weight[count].shape)
    print(image.shape)
    z = np.dot(weight[count], image) + bias
    zs.append(z)
    activation = sigmoid(z)
    activations.append(activation)
    count += 1

I still get a similar error:

ValueError: shapes (50000,) and (784,1) not aligned: 50000 (dim 0) != 784 (dim 0) 

I'm just really confused by all the linear algebra involved, and I think I'm missing something about the structure of the weight matrices. Any help would be greatly appreciated.

Answers

1

It looks like the problem is in your changes to the original code.

I've downloaded the example from the link you provided, and it works without any errors (screenshot of the training run omitted).

Here is the complete source code I used:

import cPickle 
import gzip 
import numpy as np 
import random 

def load_data(): 
    """Return the MNIST data as a tuple containing the training data, 
    the validation data, and the test data. 
    The ``training_data`` is returned as a tuple with two entries. 
    The first entry contains the actual training images. This is a 
    numpy ndarray with 50,000 entries. Each entry is, in turn, a 
    numpy ndarray with 784 values, representing the 28 * 28 = 784 
    pixels in a single MNIST image. 
    The second entry in the ``training_data`` tuple is a numpy ndarray 
    containing 50,000 entries. Those entries are just the digit 
    values (0...9) for the corresponding images contained in the first 
    entry of the tuple. 
    The ``validation_data`` and ``test_data`` are similar, except 
    each contains only 10,000 images. 
    This is a nice data format, but for use in neural networks it's 
    helpful to modify the format of the ``training_data`` a little. 
    That's done in the wrapper function ``load_data_wrapper()``, see 
    below. 
    """ 
    f = gzip.open('../data/mnist.pkl.gz', 'rb') 
    training_data, validation_data, test_data = cPickle.load(f) 
    f.close() 
    return (training_data, validation_data, test_data) 

def load_data_wrapper(): 
    """Return a tuple containing ``(training_data, validation_data, 
    test_data)``. Based on ``load_data``, but the format is more 
    convenient for use in our implementation of neural networks. 
    In particular, ``training_data`` is a list containing 50,000 
    2-tuples ``(x, y)``. ``x`` is a 784-dimensional numpy.ndarray 
    containing the input image. ``y`` is a 10-dimensional 
    numpy.ndarray representing the unit vector corresponding to the 
    correct digit for ``x``. 
    ``validation_data`` and ``test_data`` are lists containing 10,000 
    2-tuples ``(x, y)``. In each case, ``x`` is a 784-dimensional 
    numpy.ndarry containing the input image, and ``y`` is the 
    corresponding classification, i.e., the digit values (integers) 
    corresponding to ``x``. 
    Obviously, this means we're using slightly different formats for 
    the training data and the validation/test data. These formats 
    turn out to be the most convenient for use in our neural network 
    code.""" 
    tr_d, va_d, te_d = load_data() 
    training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]] 
    training_results = [vectorized_result(y) for y in tr_d[1]] 
    training_data = zip(training_inputs, training_results) 
    validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]] 
    validation_data = zip(validation_inputs, va_d[1]) 
    test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]] 
    test_data = zip(test_inputs, te_d[1]) 
    return (training_data, validation_data, test_data) 

def vectorized_result(j): 
    """Return a 10-dimensional unit vector with a 1.0 in the jth 
    position and zeroes elsewhere. This is used to convert a digit 
    (0...9) into a corresponding desired output from the neural 
    network.""" 
    e = np.zeros((10, 1)) 
    e[j] = 1.0 
    return e 

class Network(object): 

    def __init__(self, sizes):
        """The list ``sizes`` contains the number of neurons in the
        respective layers of the network. For example, if the list
        was [2, 3, 1] then it would be a three-layer network, with the
        first layer containing 2 neurons, the second layer 3 neurons,
        and the third layer 1 neuron. The biases and weights for the
        network are initialized randomly, using a Gaussian
        distribution with mean 0, and variance 1. Note that the first
        layer is assumed to be an input layer, and by convention we
        won't set any biases for those neurons, since biases are only
        ever used in computing the outputs from later layers."""
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, a):
        """Return the output of the network if ``a`` is input."""
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a)+b)
        return a

    def SGD(self, training_data, epochs, mini_batch_size, eta,
            test_data=None):
        """Train the neural network using mini-batch stochastic
        gradient descent. The ``training_data`` is a list of tuples
        ``(x, y)`` representing the training inputs and the desired
        outputs. The other non-optional parameters are
        self-explanatory. If ``test_data`` is provided then the
        network will be evaluated against the test data after each
        epoch, and partial progress printed out. This is useful for
        tracking progress, but slows things down substantially."""
        if test_data: n_test = len(test_data)
        n = len(training_data)
        for j in xrange(epochs):
            random.shuffle(training_data)
            mini_batches = [
                training_data[k:k+mini_batch_size]
                for k in xrange(0, n, mini_batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            if test_data:
                print "Epoch {0}: {1}/{2}".format(
                    j, self.evaluate(test_data), n_test)
            else:
                print "Epoch {0} complete".format(j)

    def update_mini_batch(self, mini_batch, eta):
        """Update the network's weights and biases by applying
        gradient descent using backpropagation to a single mini batch.
        The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta``
        is the learning rate."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        self.weights = [w-(eta/len(mini_batch))*nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb
                       for b, nb in zip(self.biases, nabla_b)]

    def backprop(self, x, y):
        """Return a tuple ``(nabla_b, nabla_w)`` representing the
        gradient for the cost function C_x. ``nabla_b`` and
        ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
        to ``self.biases`` and ``self.weights``."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward
        activation = x
        activations = [x] # list to store all the activations, layer by layer
        zs = [] # list to store all the z vectors, layer by layer
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation)+b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # backward pass
        delta = self.cost_derivative(activations[-1], y) * \
            sigmoid_prime(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        # Note that the variable l in the loop below is used a little
        # differently to the notation in Chapter 2 of the book. Here,
        # l = 1 means the last layer of neurons, l = 2 is the
        # second-last layer, and so on. It's a renumbering of the
        # scheme in the book, used here to take advantage of the fact
        # that Python can use negative indices in lists.
        for l in xrange(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
        return (nabla_b, nabla_w)

    def evaluate(self, test_data):
        """Return the number of test inputs for which the neural
        network outputs the correct result. Note that the neural
        network's output is assumed to be the index of whichever
        neuron in the final layer has the highest activation."""
        test_results = [(np.argmax(self.feedforward(x)), y)
                        for (x, y) in test_data]
        return sum(int(x == y) for (x, y) in test_results)

    def cost_derivative(self, output_activations, y):
        """Return the vector of partial derivatives \partial C_x /
        \partial a for the output activations."""
        return (output_activations-y)

#### Miscellaneous functions 
def sigmoid(z): 
    """The sigmoid function.""" 
    return 1.0/(1.0+np.exp(-z)) 

def sigmoid_prime(z): 
    """Derivative of the sigmoid function.""" 
    return sigmoid(z)*(1-sigmoid(z)) 

training_data, validation_data, test_data = load_data_wrapper() 
net = Network([784, 30, 10]) 
net.SGD(training_data, 30, 10, 3.0, test_data=test_data) 

Additional information:

That said, I would suggest using one of the existing frameworks, for example Keras, rather than reinventing the wheel.
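For illustration, here is a minimal sketch of the same 784-30-10 setup in Keras. This is my own illustrative snippet, not code from the book: it assumes a recent tf.keras install, and the optimizer and loss choices are just reasonable defaults, not an exact match for the book's quadratic cost and plain SGD with eta=3.0.

import tensorflow.keras as keras

# Load MNIST and flatten each 28x28 image into a 784-vector scaled to [0, 1]
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

# 784 -> 30 -> 10, mirroring Network([784, 30, 10])
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(30, activation="sigmoid"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# epochs=30, batch_size=10 mirrors net.SGD(training_data, 30, 10, ...)
model.fit(x_train, y_train, epochs=30, batch_size=10,
          validation_data=(x_test, y_test))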

Also, I checked it with Python 3.6 and it works (screenshot omitted).
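If you want to run the Python 2 listing above under Python 3 yourself, the changes are mechanical. Here is a sketch of the one function that actually differs, assuming the standard mnist.pkl.gz from the book's repo:

# Python 3 equivalents for the Python 2 constructs used above:
# cPickle -> pickle (with encoding='latin1' to read the Python 2 pickle),
# xrange -> range, print statements -> print() calls, and the zip(...)
# results in load_data_wrapper need wrapping in list(...) since zip is
# lazy in Python 3 (SGD calls len() and random.shuffle() on them).
import pickle
import gzip

def load_data():
    with gzip.open('../data/mnist.pkl.gz', 'rb') as f:
        training_data, validation_data, test_data = pickle.load(
            f, encoding='latin1')
    return (training_data, validation_data, test_data)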


Thanks. Also, the only reason I'm trying to get this code working is that I'm trying to understand the actual mechanics behind neural networks and why they work the way they do. I also have to be able to explain it to the data science club I'm part of. Once I get this working, I'll move on to an existing framework, probably TensorFlow. – Eli


OK, I looked over the code again, and it's almost exactly the same, aside from changed variable names and updates for Python 3 compatibility (your code is Python 2). I'm still getting this error on the line `delta = np.dot(self.weights[-l+1].transpose(), delta) * sp`: ValueError: operands could not be broadcast together with shapes (30,1) (10,1). Would you be willing to look at my code and see if I'm missing something less obvious? Repo: https://github.com/elijahanderson/DPUDS_Projects/tree/master/Fall_2017/MNIST – Eli


@Eli: I checked the code at your link, and it works correctly in my environment, at least with Python 2.7. After that, I checked the code with Python 3.6 (see the screenshot added to my answer) and it works fine there too. I'm not sure what exactly causes the error in your environment; it could be a wrong version or misconfiguration of some package. Could you try upgrading or reinstalling your packages? If that doesn't help, I'd suggest reinstalling your Python environment with all packages –

0

Kudos for digging deep into Nielsen's code. It's a great resource for building a thorough understanding of NN principles. Too many people leapfrog ahead to Keras without knowing what's happening under the hood.

Each training example doesn't get its own weights. Each feature does. If each example had its own weights, then each set of weights would simply overfit to its corresponding training example. Also, if you later used your trained network to run inference on a single test example, what would it do with 50,000 sets of weights when presented with just one handwritten digit? Instead, each of the 30 neurons in your hidden layer learns a set of 784 weights, one for each pixel, which offers high predictive accuracy when generalized to any handwritten digit.

Import network.py and instantiate a Network class like this, without modifying any code:

net = network.Network([784, 30, 10]) 

..which gives you a network with 784 input neurons, 30 hidden neurons, and 10 output neurons. Your weight matrices will have dimensions [30, 784] and [10, 30], respectively. When you feed the network an input array of dimensions [784, 1], the matrix multiplication that gave you the error is valid, because dim 1 of the weight matrix equals dim 0 of the input array (both are 784).
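If you want to convince yourself of those shapes, here is a quick check. It's a sketch that mirrors Nielsen's initialization scheme, not code from your repo:

import numpy as np

sizes = [784, 30, 10]
# same initialization scheme as Network.__init__
weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]
print([w.shape for w in weights])  # [(30, 784), (10, 30)]

x = np.random.randn(784, 1)        # a single image as a column vector
z = np.dot(weights[0], x)          # (30, 784) x (784, 1) -> (30, 1): aligned
print(z.shape)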

Your problem isn't your implementation of backprop, but rather setting up a network architecture that matches the shape of your input data. If memory serves, Nielsen leaves backprop as a black box in chapter 1 and doesn't get into it until chapter 2. Keep at it, and good luck!