Gradient descent is not working

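I am learning TensorFlow from the Stanford course "TensorFlow for Deep Learning Research". I took the code from the following address. While exploring TensorFlow, I changed

Y_predicted = X * w + b

to

Y_predicted = X * X * w + X * u + b

to check whether a non-linear curve fits better. Following the suggestion in the author's note (page 3), I added

Y_predicted = X * X * w + X * u + b

But after adding this line and running the code again, every error value comes out as nan. Can someone point out the problem and suggest a solution?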

""" Simple linear regression example in TensorFlow 
This program tries to predict the number of thefts from 
the number of fire in the city of Chicago 
Author: Chip Huyen 
Prepared for the class CS 20SI: "TensorFlow for Deep Learning Research" 
cs20si.stanford.edu 
""" 
import os 
os.environ['TF_CPP_MIN_LOG_LEVEL']='2' 

import numpy as np 
import matplotlib.pyplot as plt 
import tensorflow as tf 
import xlrd 

#import utils 

DATA_FILE = "slr05.xls" 

# Step 1: read in data from the .xls file 
book = xlrd.open_workbook(DATA_FILE, encoding_override="utf-8") 
sheet = book.sheet_by_index(0) 
data = np.asarray([sheet.row_values(i) for i in range(1, sheet.nrows)]) 
n_samples = sheet.nrows - 1 

# Step 2: create placeholders for input X (number of fire) and label Y (number of theft) 
X = tf.placeholder(tf.float32, name='X') 
Y = tf.placeholder(tf.float32, name='Y') 

# Step 3: create weight and bias, initialized to 0 
w = tf.Variable(0.0, name='weights') 
u = tf.Variable(0.0, name='weights2') 
b = tf.Variable(0.0, name='bias') 

# Step 4: build model to predict Y 
#Y_predicted = X * w + b 
Y_predicted = X *  X *  w +  X *  u +  b 

# Step 5: use the square error as the loss function 
loss = tf.square(Y - Y_predicted, name='loss') 
# loss = utils.huber_loss(Y, Y_predicted) 

# Step 6: using gradient descent with learning rate of 0.001 to minimize loss 
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss) 

with tf.Session() as sess: 
    # Step 7: initialize the necessary variables, in this case, w, u and b 
    sess.run(tf.global_variables_initializer()) 

    writer = tf.summary.FileWriter('./graphs/linear_reg', sess.graph) 

    # Step 8: train the model 
    for i in range(100): # train the model 100 epochs 
        total_loss = 0 
        for x, y in data: 
            # Session runs train_op and fetch values of loss 
            _, l = sess.run([optimizer, loss], feed_dict={X: x, Y: y}) 
            total_loss += l 
        print('Epoch {0}: {1}'.format(i, total_loss/n_samples)) 

    # close the writer when you're done using it 
    writer.close() 

    # Step 9: output the values of w, u and b 
    w, u, b = sess.run([w, u, b]) 

# plot the results 
X, Y = data.T[0], data.T[1] 
plt.plot(X, Y, 'bo', label='Real data') 
plt.plot(X, X * X * w + X * u + b, 'r', label='Predicted data') 
plt.legend() 
plt.show()

Source

2017-07-18 Maruf

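Whoops! Your learning rate seems to be too large. Try something like learning_rate=0.0000001 and it will converge. This is a common issue, especially when you introduce interaction features: keep in mind that the range of x**2 is much larger (if the original values lie in [-100, 100], their squares reach 10000), so a learning rate that works well for the linear model may be too large for the polynomial one. Read up on feature scaling. This picture gives a more intuitive explanation: [image]

Hope it helps!
Andres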
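To make the feature scaling advice concrete, here is a minimal NumPy sketch rather than the course's TensorFlow code. It assumes synthetic stand-in data (not slr05.xls) and a hypothetical helper, fit_quadratic_sgd, that performs the same per-sample squared-error updates as the question's training loop. With raw inputs and learning_rate=0.001 the parameters overflow, while standardized inputs stay finite at the same learning rate.

```python
import numpy as np

def fit_quadratic_sgd(x, y, learning_rate, epochs=100):
    """Per-sample gradient descent on y ~ w*x**2 + u*x + b with squared error."""
    w = u = b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(x, y):
            err = (w * xi * xi + u * xi + b) - yi
            # Gradients of err**2 with respect to w, u and b.
            w -= learning_rate * 2 * err * xi * xi
            u -= learning_rate * 2 * err * xi
            b -= learning_rate * 2 * err
    return w, u, b

# Synthetic stand-in data (an assumption, not the course's slr05.xls).
rng = np.random.default_rng(0)
x_raw = rng.uniform(0.0, 40.0, size=42)
y = 0.05 * x_raw**2 + 1.0 * x_raw + 10.0 + rng.normal(0.0, 5.0, size=42)

# Raw inputs: x**2 can reach about 1600, so each step overshoots and the
# parameters grow until they overflow to inf and then nan.
with np.errstate(over="ignore", invalid="ignore"):
    raw_params = fit_quadratic_sgd(x_raw, y, learning_rate=0.001)

# Standardized inputs (zero mean, unit variance): the same learning rate is stable.
x_scaled = (x_raw - x_raw.mean()) / x_raw.std()
scaled_params = fit_quadratic_sgd(x_scaled, y, learning_rate=0.001)

print("raw inputs:   ", raw_params)
print("scaled inputs:", scaled_params)
```

Note that the parameters fitted on scaled inputs describe the curve in terms of x_scaled, so new inputs need the same mean and standard deviation applied before prediction.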

Source

2017-07-18 22:13:10

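Now I understand what the problem was. Thanks for pointing it out. Setting the learning rate to 0.00000001 gives an error of 739, which is better than with the earlier linear basis function. After plotting, I got the following output: http://imgur.com/7RwnfvD Why are there multiple red lines? Is this caused by the basis function expansion (because the data is in a higher dimension)? – Maruf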

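Hi! That looks like a plotting problem... you should see a single parabola, the one corresponding to your optimized w, u and b parameters. You can try it out in WolframAlpha: https://www.wolframalpha.com/input/?i=2x%5E2%2B3x%2B4 Have fun with TF! –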
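One likely cause of the extra red lines, assuming the X column of the data is not sorted, is that plt.plot connects points in array order, so the curve zig-zags back and forth over itself. A minimal sketch of one way around this is to evaluate the fitted parabola on a sorted, evenly spaced grid. The helper names curve_for_plot and plot_fit, the demo inputs, and the parameter values (taken from the 2x^2+3x+4 example above) are all illustrative assumptions.

```python
import numpy as np

def curve_for_plot(x_values, w, u, b, num_points=200):
    """Evaluate w*x**2 + u*x + b on a sorted grid spanning the data range."""
    x_grid = np.linspace(np.min(x_values), np.max(x_values), num_points)
    return x_grid, w * x_grid**2 + u * x_grid + b

def plot_fit(x_values, y_values, w, u, b):
    """Scatter the data and draw the fitted parabola as one smooth line."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    x_grid, y_curve = curve_for_plot(x_values, w, u, b)
    plt.plot(x_values, y_values, 'bo', label='Real data')
    plt.plot(x_grid, y_curve, 'r', label='Predicted data')
    plt.legend()
    plt.show()

# Hypothetical unsorted inputs and parameters, for illustration only.
x_demo = np.array([6.2, 29.9, 2.0, 17.4, 11.3])
x_grid, y_curve = curve_for_plot(x_demo, w=2.0, u=3.0, b=4.0)
```

With the trained values from the question, the call would be plot_fit(data.T[0], data.T[1], w, u, b). Sorting X with np.argsort before plotting would also work, but the grid gives a smoother curve when there are few samples.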

I am the person who teaches this class. As @fr_andres said, your lr is probably too large. Let me know if that doesn't work.

Source

2017-07-18 22:17:04

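It is an honor to have your comment here. Yes, you are right, the lr was too large. One more thing, if you don't mind: can I use clipping to solve this kind of problem? – Maruf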
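The thread does not answer the clipping question, so the following is only a sketch of what gradient clipping does, written in NumPy under the same assumptions as the question's model (per-sample squared-error updates, synthetic stand-in data, a hypothetical helper named fit_quadratic_clipped). Each gradient component is clipped to [-clip, clip] before the update, which bounds the step size and keeps the parameters finite even with learning_rate=0.001 on unscaled inputs. In TensorFlow 1.x the comparable route would be optimizer.compute_gradients, tf.clip_by_value on each gradient, then optimizer.apply_gradients.

```python
import numpy as np

def fit_quadratic_clipped(x, y, learning_rate, clip=1.0, epochs=100):
    """Per-sample gradient descent with each gradient component clipped by value."""
    w = u = b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(x, y):
            err = (w * xi * xi + u * xi + b) - yi
            grads = np.array([2 * err * xi * xi, 2 * err * xi, 2 * err])
            # Limit every component to [-clip, clip] so no single step explodes.
            grads = np.clip(grads, -clip, clip)
            w -= learning_rate * grads[0]
            u -= learning_rate * grads[1]
            b -= learning_rate * grads[2]
    return w, u, b

# Synthetic stand-in data (an assumption, not the course's slr05.xls).
rng = np.random.default_rng(0)
x_raw = rng.uniform(0.0, 40.0, size=42)
y = 0.05 * x_raw**2 + 1.0 * x_raw + 10.0 + rng.normal(0.0, 5.0, size=42)

clipped_params = fit_quadratic_clipped(x_raw, y, learning_rate=0.001)
print("clipped:", clipped_params)
```

Clipping only prevents the blow-up. It does not fix the mismatch in feature ranges, so the fit may still converge slowly or oscillate, and a smaller learning rate or feature scaling remains the more direct remedy.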
