Polynomial regression using Tensorflow gives wrong answers
I have a set of toy data of the following form:
x - x**2 + x**3
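I'm trying to create a Python script that uses Tensorflow to predict the weights, which in this case should be [1, -1, 1]. However, when I run it, I come up with absurd answers.
Here is my code: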
# Optional; suppresses warnings about GPU
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
# Read the data
COLUMNS = ["url", "title_length", "article_length", "keywords", "shares"]
data = np.genfromtxt("OnlineNewsPopularityNonLinear.csv", delimiter=',', names=COLUMNS)
# Determine how many data points we're using and the order of the equation
number_of_records = data.size
equation_order = 3
# Set up the variables for weights and bias, but as matrices
w = tf.Variable(np.zeros([equation_order, 1]), dtype="float32", name="w")
b = tf.Variable(np.zeros([1]), dtype="float32", name="b")
content_info_temp = np.zeros([number_of_records, equation_order])
content_info = tf.placeholder("float32", shape=[number_of_records, equation_order])
actual_shares = tf.placeholder("float32")
# Input data should be a matrix of [number_of_records, equation_order] where
# each value has been raised to the appropriate power, according to
# our model. We'll need to call it
for i in range(equation_order):
    print(i)
    content_info_temp[:, i] = np.power(data["article_length"], (i+1))/np.max(np.power(data["article_length"], (i+1)))
# Create the prediction; it's still y = mx + b, but in this case
# m is a matrix of weights, and x is a matrix of values
predicted_shares = tf.add(tf.matmul(content_info, w), b)
# Loss is the same as before.
error = tf.reduce_mean(tf.square(predicted_shares - actual_shares))
# Create the optimizer
step_size = .001
optimizer = tf.train.GradientDescentOptimizer(step_size).minimize(error)
# Create the model
model = tf.global_variables_initializer()
# Create the session to run the algorithm
with tf.Session() as session:
    # Initialize everything
    session.run(model)
    # Run the algorithm
    for i in range(100000):
        # Just as before, we run the algorithm, but we're feeding in normalized matrices rather than single values
        #session.run(optimizer, feed_dict={content_info: input_data, actual_shares: data['shares']/np.max(data["shares"])})
        session.run(optimizer, feed_dict={content_info: content_info_temp, actual_shares: data['shares']/np.max(data['shares'])})
        # Display every 100 results
        if (i % 100 == 0):
            print(session.run(w))
            #print (session.run(predicted_shares - actual_shares))
    # Display the final result
    w_value = session.run(w)
    print("FINAL:")
    print(w_value)
    print(w_value[0]*np.max(data["article_length"]))
    print(w_value[1]*np.max(session.run(tf.pow(data["article_length"], 2))))
    print(w_value[2]*np.max(session.run(tf.pow(data["article_length"], 3))))
When I run it, I get:
Before de-normalization:
[[ 0.14678337]
[ 0.01708614]
[-0.01448759]]
After de-normalization:
[ 141.49916077]
[ 15878.08398438]
[-12978583.]
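A rough way to sanity-check the de-normalization step, sketched with NumPy instead of Tensorflow. It assumes synthetic data (the `x` range is made up, not taken from the CSV) and swaps gradient descent for closed-form least squares via `np.linalg.lstsq`. Under those assumptions, a weight learned on the scaled data maps back to the raw scale as `w_k * max(y) / max(x**k)`, which divides by the feature maximum rather than multiplying by it as the final print statements do.

```python
import numpy as np

# Synthetic stand-in for the CSV (assumption): y follows x - x**2 + x**3 exactly.
x = np.linspace(0.1, 10.0, 100)
y = x - x**2 + x**3

# Scale each power by its own maximum and the target by its maximum,
# mirroring the normalization used in the question.
powers = np.stack([x**k for k in (1, 2, 3)], axis=1)  # shape (100, 3)
feature_max = powers.max(axis=0)
target_max = y.max()
features = powers / feature_max
target = y / target_max

# Closed-form least squares with an intercept column (not gradient descent).
design = np.hstack([features, np.ones((x.size, 1))])
solution, *_ = np.linalg.lstsq(design, target, rcond=None)
w_norm, b_norm = solution[:3], solution[3]

# Undo both scalings to get coefficients on the raw scale.
w_raw = w_norm * target_max / feature_max
b_raw = b_norm * target_max

print("normalized weights:", w_norm)
print("raw weights:", w_raw, "raw bias:", b_raw)
```

Note that the normalized weights themselves are not expected to be [1, -1, 1]; only the rescaled ones are.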
In case it matters (I don't think it does), I'm using Tensorflow 1.2.
Any ideas? Thanks...
[MAJOR EDIT:
OK, based on the comments, I've modified the code as follows:
import numpy as np
import tensorflow as tf
# Read the data
COLUMNS = ["url", "title_length", "article_length", "keywords", "shares", "shares2"]
data = np.genfromtxt("OnlineNewsPopularityNonLinear.csv", delimiter=',', names=COLUMNS)
x_raw = data["article_length"]
x_data = np.zeros([3, 100])
x_data[0] = x_raw/np.max(x_raw)
x_data[1] = x_raw**2/np.max(x_raw**2)
x_data[2] = x_raw**3/np.max(x_raw**3)
print(x_data)
w_set = np.zeros([1, 3])
w_set[0] = np.array([1, -1, 1])
print(w_set)
#y_data = np.matmul(w_set, x_data)
y_data = np.zeros([1, 100])
y_data[0] = data["shares"]/np.max(data["shares"])
print(y_data)
w = tf.Variable(np.zeros([1, 3]), dtype="float32", name="w")
b = tf.Variable(np.zeros([1]), dtype="float32", name="b")
X = tf.placeholder("float32", shape=[3, 100])
Y = tf.placeholder("float32", shape=[1, 100])
Ypred = tf.add(tf.matmul(w, X), b)
error = tf.reduce_mean(tf.squared_difference(Ypred, Y))
optimizer = tf.train.GradientDescentOptimizer(.01).minimize(error)
init = tf.global_variables_initializer()
# Create the session to run the algorithm
with tf.Session() as session:
    session.run(init)
    # Run the algorithm
    for i in range(5000000):
        _, loss, Wcur = session.run([optimizer, error, w], feed_dict={X: x_data, Y: y_data})
        if (i % 10000 == 0):
            print(loss, Wcur)
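When I artificially generate the data from the proper weights (1, -1, 1), it works fine. When I use the "real" data (which you can find here: http://www.nicholaschase.com/OnlineNewsPopularityNonLinear.csv) it seems to flatten out at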
2.67402e-10 [[-0.00169705 0.00216109 0.99922621]]
2.67402e-10 [[-0.00169705 0.00216109 0.99922621]]
2.67402e-10 [[-0.00169705 0.00216109 0.99922621]]
2.67402e-10 [[-0.00169705 0.00216109 0.99922621]]
2.67402e-10 [[-0.00169705 0.00216109 0.99922621]]
2.67402e-10 [[-0.00169705 0.00216109 0.99922621]]
2.67402e-10 [[-0.00169705 0.00216109 0.99922621]]
2.67402e-10 [[-0.00169705 0.00216109 0.99922621]]
2.67402e-10 [[-0.00169705 0.00216109 0.99922621]]
2.67402e-10 [[-0.00169705 0.00216109 0.99922621]]
The "real" data was created with a spreadsheet, so it should be exactly accurate, shouldn't it?
Thanks...]
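One possible reason for a plateau like this, offered as a guess rather than a diagnosis: the columns x, x**2 and x**3 stay strongly correlated even after each is divided by its maximum, so the squared-error surface is nearly flat in some directions and plain gradient descent can sit at a tiny loss with weights that still look off. The sketch below uses made-up article lengths (an assumption, not the CSV) to measure that correlation and the condition number of the Gram matrix.

```python
import numpy as np

# Hypothetical inputs (assumption): 100 article lengths between 1 and 1000,
# normalized per column the same way as in the question.
x = np.linspace(1.0, 1000.0, 100)
features = np.stack([x**k / np.max(x**k) for k in (1, 2, 3)], axis=1)

# Pairwise correlation between the three normalized columns.
correlation = np.corrcoef(features, rowvar=False)

# Condition number of the Gram matrix; a large value means gradient descent
# converges very slowly along the weakest direction.
gram = features.T @ features / features.shape[0]
condition_number = np.linalg.cond(gram)

print(correlation.round(3))
print(condition_number)
```

If the condition number is large for the real data as well, a closed-form solver or an optimizer with per-parameter scaling may be worth comparing against.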
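Does the error get smaller or larger? – stackoverflowuser2010
Good question; when I try, the line: actual_shares = tf.placeholder("float32") gives me InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder_1' with dtype float. Which makes no sense. – NickChase
Maybe your input data is not formatted properly. You should double check it. – stackoverflowuser2010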
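A minimal sketch of how the input could be double-checked, assuming the CSV starts with a header line and has a text `url` column (both are assumptions about the real file). `np.genfromtxt` with a list of `names` does not skip the first line, and by default it turns values it cannot parse as floats into NaN, so a header row or a text column can quietly put NaNs into the arrays. The three-line sample below is invented for illustration.

```python
import io
import numpy as np

COLUMNS = ["url", "title_length", "article_length", "keywords", "shares"]

# Hypothetical stand-in for the real file: one header line and two data rows.
sample_csv = io.StringIO(
    "url,title_length,article_length,keywords,shares\n"
    "http://example.com/a,12,2,5,6\n"
    "http://example.com/b,9,3,4,21\n"
)

# Same call as in the question: the header line is parsed as a data row.
raw = np.genfromtxt(sample_csv, delimiter=",", names=COLUMNS)
header_is_nan = bool(np.isnan(raw["shares"][0]))

# Skipping the header leaves only the numeric rows.
sample_csv.seek(0)
clean = np.genfromtxt(sample_csv, delimiter=",", names=COLUMNS, skip_header=1)
nan_count = int(np.isnan(clean["shares"]).sum()
                + np.isnan(clean["article_length"]).sum())

print("rows without skip_header:", raw.size, "first shares is NaN:", header_is_nan)
print("rows with skip_header:", clean.size, "NaNs in used columns:", nan_count)
```

Running the same `np.isnan` checks on `data["article_length"]` and `data["shares"]` from the real file would show whether any NaNs reach the feed_dict.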