I have a simple architecture that I learned from a Siraj Raval video about a single-layer perceptron in TensorFlow. I am trying to extend it to more layers, and I am having difficulty: a multilayer perceptron in TensorFlow is not behaving as expected.
The first example takes 2 inputs and produces 2 outputs; one set of weights and biases is applied, and the softmax function is then applied to the result.
The second example also has 2 inputs and 2 outputs, but with a hidden layer (2 units) in between, so there are two sets of weights and biases, and the softmax function is applied after each of them.
I tried to extend the simple case to the N-hidden-layer case, but with limited success: whenever I add extra layers, they seem to be ignored by the optimizer.
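For reference, the N-hidden-layer generalization I am aiming for looks roughly like this (a sketch of the pattern rather than my exact code; layer_sizes is a name used only for illustration, and x is the input placeholder defined in the snippets below):

# sketch: stack several layers, each a zero-initialized affine map followed by softmax
layer_sizes = [2, 2, 2]  # input width, hidden width(s), output width (illustrative)
activations = x
for i in range(len(layer_sizes) - 1):
    W_i = tf.Variable(tf.zeros([layer_sizes[i], layer_sizes[i + 1]]))
    b_i = tf.Variable(tf.zeros([layer_sizes[i + 1]]))
    activations = tf.nn.softmax(tf.add(tf.matmul(activations, W_i), b_i))
y = activations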
The inputs are of the form:
inputX = np.array([[ 2.10400000e+03, 3.00000000e+00],
                   [ 1.60000000e+03, 3.00000000e+00],
                   [ 2.40000000e+03, 3.00000000e+00],
                   [ 1.41600000e+03, 2.00000000e+00],
                   [ 3.00000000e+03, 4.00000000e+00],
                   [ 1.98500000e+03, 4.00000000e+00],
                   [ 1.53400000e+03, 3.00000000e+00],
                   [ 1.42700000e+03, 3.00000000e+00],
                   [ 1.38000000e+03, 3.00000000e+00],
                   [ 1.49400000e+03, 3.00000000e+00]])
and the output labels are of the form:
inputY = np.array([[1, 0],
                   [1, 0],
                   [1, 0],
                   [0, 1],
                   [0, 1],
                   [1, 0],
                   [0, 1],
                   [1, 0],
                   [1, 0],
                   [1, 0]])
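Both snippets below also rely on a few names defined at the top of my script. For completeness, that setup looks roughly like this (the hyperparameter values shown are illustrative stand-ins, not necessarily the exact ones I used):

import numpy as np
import tensorflow as tf

n_samples = inputX.shape[0]  # 10 training examples
alpha = 0.001                # Adam learning rate (stand-in value)
training_epochs = 2000       # stand-in value
display_step = 50            # stand-in value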
Here is a snippet of my code that executes correctly (the dependencies are numpy and tensorflow):
#input and output placeholder, feed data to x, feed labels to y_
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 2])
#first layer weights and biases
W = tf.Variable(tf.zeros([2,2]))
b = tf.Variable(tf.zeros([2]))
# vector form of x*W + b
y_values = tf.add(tf.matmul(x, W), b)
#activation function
y = tf.nn.softmax(y_values)
cost = tf.reduce_sum(tf.pow(y_ - y, 2))/(n_samples) #sum of squared errors
optimizer = tf.train.AdamOptimizer(alpha).minimize(cost)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(training_epochs):
    sess.run(optimizer, feed_dict = {x: inputX, y_:inputY})
    #log training
    if i % display_step == 0:
        cc = sess.run(cost, feed_dict = {x: inputX, y_:inputY})
        print("Training step:", '%04d' % (i), "cost=", "{:.9f}".format(cc))
print("Optimization Finished!")
training_cost = sess.run(cost, feed_dict = {x: inputX, y_: inputY})
print("Training cost = ", training_cost, "\nW=", sess.run(W), "\nb=", sess.run(b))
#check what it thinks when you give it the input data
print(sess.run(y, feed_dict = {x:inputX}))
The output I get:
W= [[ 0.00021142 -0.00021142]
    [ 0.00120122 -0.00120122]]
b= [ 0.00103542 -0.00103542]
label_predictions = [[ 0.71073025  0.28926972]
                     [ 0.66503692  0.33496314]
                     [ 0.73576927  0.2642307 ]
                     [ 0.64694035  0.35305965]
                     [ 0.78248388  0.21751612]
                     [ 0.70078063  0.2992194 ]
                     [ 0.65879178  0.34120819]
                     [ 0.6485498   0.3514502 ]
                     [ 0.64400673  0.3559933 ]
                     [ 0.65497971  0.34502029]]
Not great, so I wanted to try increasing the number of layers to see whether that would improve things.
I added an extra layer with new variables W2, b2, and hidden_layer:
#input and output placeholder, feed data to x, feed labels to y_
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 2])
#first layer weights and biases
W = tf.Variable(tf.zeros([2,2]))
b = tf.Variable(tf.zeros([2]))
#second layer weights and biases
W2 = tf.Variable(tf.zeros([2,2]))
b2 = tf.Variable(tf.zeros([2]))
#flow through first layer
hidden_layer = tf.add(tf.matmul(x, W), b)
hidden_layer = tf.nn.softmax(hidden_layer)
#flow through second layer
y_values = tf.add(tf.matmul(hidden_layer, W2), b2)
y = tf.nn.softmax(y_values)
cost = tf.reduce_sum(tf.pow(y_ - y, 2))/(n_samples) #sum of squared errors
optimizer = tf.train.AdamOptimizer(alpha).minimize(cost)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(training_epochs):
    sess.run(optimizer, feed_dict = {x: inputX, y_:inputY})
    #log training
    if i % display_step == 0:
        cc = sess.run(cost, feed_dict = {x: inputX, y_:inputY})
        print("Training step:", '%04d' % (i), "cost=", "{:.9f}".format(cc))
print("Optimization Finished!")
training_cost = sess.run(cost, feed_dict = {x: inputX, y_: inputY})
print("Training cost = ", training_cost, "\nW=", sess.run(W), "\nW2=", sess.run(W2),\
"\nb=", sess.run(b), "\nb2=", sess.run(b2))
#check what it thinks when you give it the input data
print(sess.run(y, feed_dict = {x:inputX}))
This tells me that my first layer's weights and biases are still all zeros, and that the predictions are now roughly fifty-fifty for every training example, which is far worse than before.
The output:
W= [[ 0.  0.]
    [ 0.  0.]]
W2= [[ 0.00199614 -0.00199614]
     [ 0.00199614 -0.00199614]]
b= [ 0.  0.]
b2= [ 0.00199614 -0.00199614]
label_predictions = [[ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]]
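To confirm that the first layer really receives no gradient signal, a direct check along these lines could be run after building the graph (a sketch using tf.gradients; grad_W and grad_b are names introduced only for this check):

# sketch: gradients of the cost with respect to the first-layer variables
grad_W, grad_b = tf.gradients(cost, [W, b])
print(sess.run([grad_W, grad_b], feed_dict = {x: inputX, y_: inputY}))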
Why is only one layer of weights and biases affected? And why doesn't adding a layer improve the model?