I have a simple architecture that I learned from a Siraj Raval video about a single-layer perceptron in TensorFlow. I am trying to extend it to more layers, and I am having difficulty: a multilayer perceptron in TensorFlow is not behaving as expected.
The first example takes 2 inputs and produces 2 outputs; one set of weights and biases is applied, and the softmax function is then applied to the result.
The second example also has 2 inputs and 2 outputs, but with a hidden layer (2 units) in between, so there are two sets of weights and biases, and the softmax function is applied after each of them.
I tried to extend the simple case to the N-hidden-layer case, but with limited success: whenever I add extra layers, they seem to be ignored by the optimizer.
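For reference, the N-hidden-layer generalization I am aiming for looks roughly like this (a sketch of the pattern rather than my exact code; layer_sizes is a name used only for illustration, and x is the input placeholder defined in the snippets below):

# sketch: stack several layers, each a zero-initialized affine map followed by softmax
layer_sizes = [2, 2, 2]  # input width, hidden width(s), output width (illustrative)
activations = x
for i in range(len(layer_sizes) - 1):
    W_i = tf.Variable(tf.zeros([layer_sizes[i], layer_sizes[i + 1]]))
    b_i = tf.Variable(tf.zeros([layer_sizes[i + 1]]))
    activations = tf.nn.softmax(tf.add(tf.matmul(activations, W_i), b_i))
y = activations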
The inputs are of the form:
inputX = np.array([[ 2.10400000e+03, 3.00000000e+00],
                   [ 1.60000000e+03, 3.00000000e+00],
                   [ 2.40000000e+03, 3.00000000e+00],
                   [ 1.41600000e+03, 2.00000000e+00],
                   [ 3.00000000e+03, 4.00000000e+00],
                   [ 1.98500000e+03, 4.00000000e+00],
                   [ 1.53400000e+03, 3.00000000e+00],
                   [ 1.42700000e+03, 3.00000000e+00],
                   [ 1.38000000e+03, 3.00000000e+00],
                   [ 1.49400000e+03, 3.00000000e+00]])
and the output labels are of the form:
inputY = np.array([[1, 0],
                   [1, 0],
                   [1, 0],
                   [0, 1],
                   [0, 1],
                   [1, 0],
                   [0, 1],
                   [1, 0],
                   [1, 0],
                   [1, 0]])
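Both snippets below also rely on a few names defined at the top of my script. For completeness, that setup looks roughly like this (the hyperparameter values shown are illustrative stand-ins, not necessarily the exact ones I used):

import numpy as np
import tensorflow as tf

n_samples = inputX.shape[0]  # 10 training examples
alpha = 0.001                # Adam learning rate (stand-in value)
training_epochs = 2000       # stand-in value
display_step = 50            # stand-in value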
Here is a snippet of my code that executes correctly (the dependencies are numpy and tensorflow):
#input and output placeholder, feed data to x, feed labels to y_
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 2])
#first layer weights and biases
W = tf.Variable(tf.zeros([2,2]))
b = tf.Variable(tf.zeros([2]))
# vector form of x*W + b
y_values = tf.add(tf.matmul(x, W), b)
#activation function
y = tf.nn.softmax(y_values)
cost = tf.reduce_sum(tf.pow(y_ - y, 2))/(n_samples) #sum of squared errors
optimizer = tf.train.AdamOptimizer(alpha).minimize(cost)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(training_epochs):
    sess.run(optimizer, feed_dict = {x: inputX, y_:inputY})
    #log training
    if i % display_step == 0:
        cc = sess.run(cost, feed_dict = {x: inputX, y_:inputY})
        print("Training step:", '%04d' % (i), "cost=", "{:.9f}".format(cc))
print("Optimization Finished!")
training_cost = sess.run(cost, feed_dict = {x: inputX, y_: inputY})
print("Training cost = ", training_cost, "\nW=", sess.run(W), "\nb=", sess.run(b))
#check what it thinks when you give it the input data
print(sess.run(y, feed_dict = {x:inputX}))
The output I get:
W= [[ 0.00021142 -0.00021142]
    [ 0.00120122 -0.00120122]]
b= [ 0.00103542 -0.00103542]
label_predictions = [[ 0.71073025  0.28926972]
                     [ 0.66503692  0.33496314]
                     [ 0.73576927  0.2642307 ]
                     [ 0.64694035  0.35305965]
                     [ 0.78248388  0.21751612]
                     [ 0.70078063  0.2992194 ]
                     [ 0.65879178  0.34120819]
                     [ 0.6485498   0.3514502 ]
                     [ 0.64400673  0.3559933 ]
                     [ 0.65497971  0.34502029]]
Not great, so I wanted to try increasing the number of layers to see whether that would improve things.
I added an extra layer with new variables W2, b2, and hidden_layer:
#input and output placeholder, feed data to x, feed labels to y_
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 2])
#first layer weights and biases
W = tf.Variable(tf.zeros([2,2]))
b = tf.Variable(tf.zeros([2]))
#second layer weights and biases
W2 = tf.Variable(tf.zeros([2,2]))
b2 = tf.Variable(tf.zeros([2]))
#flow through first layer
hidden_layer = tf.add(tf.matmul(x, W), b)
hidden_layer = tf.nn.softmax(hidden_layer)
#flow through second layer
y_values = tf.add(tf.matmul(hidden_layer, W2), b2)
y = tf.nn.softmax(y_values)
cost = tf.reduce_sum(tf.pow(y_ - y, 2))/(n_samples) #sum of squared errors
optimizer = tf.train.AdamOptimizer(alpha).minimize(cost)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(training_epochs):
    sess.run(optimizer, feed_dict = {x: inputX, y_:inputY})
    #log training
    if i % display_step == 0:
        cc = sess.run(cost, feed_dict = {x: inputX, y_:inputY})
        print("Training step:", '%04d' % (i), "cost=", "{:.9f}".format(cc))
print("Optimization Finished!")
training_cost = sess.run(cost, feed_dict = {x: inputX, y_: inputY})
print("Training cost = ", training_cost, "\nW=", sess.run(W), "\nW2=", sess.run(W2),\
"\nb=", sess.run(b), "\nb2=", sess.run(b2))
#check what it thinks when you give it the input data
print(sess.run(y, feed_dict = {x:inputX}))
This tells me that my first layer's weights and biases are still all zeros, and that the predictions are now roughly fifty-fifty for every training example, which is far worse than before.
The output:
W= [[ 0.  0.]
    [ 0.  0.]]
W2= [[ 0.00199614 -0.00199614]
     [ 0.00199614 -0.00199614]]
b= [ 0.  0.]
b2= [ 0.00199614 -0.00199614]
label_predictions = [[ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]]
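To confirm that the first layer really receives no gradient signal, a direct check along these lines could be run after building the graph (a sketch using tf.gradients; grad_W and grad_b are names introduced only for this check):

# sketch: gradients of the cost with respect to the first-layer variables
grad_W, grad_b = tf.gradients(cost, [W, b])
print(sess.run([grad_W, grad_b], feed_dict = {x: inputX, y_: inputY}))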
Why is only one layer of weights and biases affected? And why doesn't adding a layer improve the model?