2013-05-02 152 views
4

Computing the root mean squared error of my scikit-learn predictions y_train_actual against the original values salaries: TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'

Problem: with mean_squared_error(y_train_actual, salaries) I get the error TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'. Passing list(salaries) instead of salaries as the second argument produces the same error.

With mean_squared_error(y_train_actual, y_valid_actual) I get the error Found array with dim 40663. Expected 244768

How can I convert my data to the array type that sklearn.metrics.mean_squared_error() expects?

Code

from sklearn.metrics import mean_squared_error 

y_train_actual = [ np.exp(float(row)) for row in y_train ] 
print mean_squared_error(y_train_actual, salaries) 

Error

TypeError         Traceback (most recent call last) 
<ipython-input-144-b6d4557ba9c5> in <module>() 
     3 y_valid_actual = [ np.exp(float(row)) for row in y_valid ] 
     4 
----> 5 print mean_squared_error(y_train_actual, salaries) 
     6 print mean_squared_error(y_train_actual, y_valid_actual) 

C:\Python27\lib\site-packages\sklearn\metrics\metrics.pyc in mean_squared_error(y_true, y_pred) 
    1462  """ 
    1463  y_true, y_pred = check_arrays(y_true, y_pred) 
-> 1464  return np.mean((y_pred - y_true) ** 2) 
    1465 
    1466 

TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray' 

Code

y_train_actual = [ np.exp(float(row)) for row in y_train ] 
y_valid_actual = [ np.exp(float(row)) for row in y_valid ] 

print mean_squared_error(y_train_actual, y_valid_actual) 

Error

ValueError        Traceback (most recent call last) 
<ipython-input-146-7fcd0367c6f1> in <module>() 
     4 
     5 #print mean_squared_error(y_train_actual, salaries) 
----> 6 print mean_squared_error(y_train_actual, y_valid_actual) 

C:\Python27\lib\site-packages\sklearn\metrics\metrics.pyc in mean_squared_error(y_true, y_pred) 
    1461 
    1462  """ 
-> 1463  y_true, y_pred = check_arrays(y_true, y_pred) 
    1464  return np.mean((y_pred - y_true) ** 2) 
    1465 

C:\Python27\lib\site-packages\sklearn\utils\validation.pyc in check_arrays(*arrays, **options) 
    191   if size != n_samples: 
    192    raise ValueError("Found array with dim %d. Expected %d" 
--> 193        % (size, n_samples)) 
    194 
    195   if not allow_lists or hasattr(array, "shape"): 

ValueError: Found array with dim 40663. Expected 244768 

Code

print type(y_train) 
print type(y_train_actual) 
print type(salaries) 

Result

<type 'list'> 
<type 'list'> 
<type 'tuple'> 

print y_train[:10]

[10.126631103850338, 10.308952660644293, 10.308952660644293, 10.221941283654663, 10.126631103850338, 10.126631103850338, 11.225243392518447, 9.9987977323404529, 10.043249494911286, 11.350406535472453]

print salaries[:10]

('25000', '30000', '30000', '27500', '25000', '25000', '75000', '22000', '23000', '85000')

print list(salaries)[:10]

['25000', '30000', '30000', '27500', '25000', '25000', '75000', '22000', '23000', '85000']

print len(y_train)

244768 

print len(salaries)

244768 

Can you add the 'shape' of y_train? My guess is that y_train_actual is a 'list' of 'ndarrays', which may be what breaks inside 'mean_squared_error()'. – fgb 2013-05-02 04:39:18


@fgb I get the error 'AttributeError: 'list' object has no attribute 'shape'' – Nyxynyx 2013-05-02 04:41:16


Right. Do you have any idea what the dimensions of y_train are? – fgb 2013-05-02 04:42:38
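For readers following the comment exchange above: a plain Python list has no .shape attribute, but its dimensions can still be inspected through NumPy. The sketch below is only an illustration. It reuses the first three values printed for y_train in the question and assumes the list holds plain floats, as that printout suggests.

```python
import numpy as np

# First three values from the question's printout of y_train.
y_train = [10.126631103850338, 10.308952660644293, 10.308952660644293]

# A list itself has no .shape attribute, which explains the AttributeError.
print(hasattr(y_train, "shape"))

# np.shape() accepts any array-like, including a plain list.
print(np.shape(y_train))

# Converting to an array also reveals the element type (dtype).
arr = np.asarray(y_train)
print(arr.shape, arr.dtype)
```

An object or string dtype at this step would be a hint that the elements are not plain numbers.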

Answer

9

The TypeError stems from salaries being a sequence of strings while y_train_actual is a list of floats. Those cannot be subtracted from each other.
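A minimal sketch of that fix, assuming the data looks like the samples printed in the question: convert both sequences to float arrays element by element before subtracting. The variable names salaries_arr, mse and rmse are illustrative, and the error is computed directly with NumPy here. Passing the two converted arrays to sklearn.metrics.mean_squared_error should yield the same mean squared error.

```python
import numpy as np

# Sample values copied from the question's printed output.
salaries = ('25000', '30000', '30000', '27500', '25000')
y_train = [10.126631103850338, 10.308952660644293, 10.308952660644293,
           10.221941283654663, 10.126631103850338]

# Convert every element: numeric strings become floats.
salaries_arr = np.asarray(salaries, dtype=float)

# Undo the log transform on the whole array at once.
y_train_actual = np.exp(np.asarray(y_train, dtype=float))

# Mean squared error, then its square root (RMSE).
mse = np.mean((y_train_actual - salaries_arr) ** 2)
rmse = np.sqrt(mse)
print(rmse)
```

Note that float() applied to the whole tuple or list, rather than to each element, raises a TypeError, which is why the conversion goes through np.asarray with dtype=float.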

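As for your second error, make sure both arrays have the same length; otherwise they cannot be subtracted element by element. Here y_train_actual has 244768 entries while y_valid_actual has 40663, so the two do not describe the same samples.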
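One way to catch the length mismatch early is to compare shapes before computing the error. The helper below is a hypothetical sketch, not part of scikit-learn, and the numbers are made up for illustration. It pairs training targets with training predictions and shows that mixing training and validation arrays of different lengths is rejected.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error for two equal-length 1-D sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Refuse to compare arrays that do not describe the same samples.
    if y_true.shape != y_pred.shape:
        raise ValueError("shape mismatch: %s vs %s"
                         % (y_true.shape, y_pred.shape))
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

# Made-up values: three training samples, two validation samples.
train_true = [25000.0, 30000.0, 27500.0]
train_pred = [24000.0, 31000.0, 27500.0]
valid_pred = [26000.0, 29000.0]

# Same length: training targets against training predictions.
print(rmse(train_true, train_pred))

# Different lengths: training against validation raises ValueError.
try:
    rmse(train_pred, valid_pred)
except ValueError as err:
    print(err)
```

Each set (training, validation) should be scored against its own true values, never against the other set.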


I tried your suggestion and got the error 'float() argument must be a string or a number' – Nyxynyx 2013-05-02 04:51:00


Did you use 'np.float()', which acts on 'numpy.ndarrays'? – fgb 2013-05-02 04:51:37


Yes, I used 'np.float()' instead of 'float()' – Nyxynyx 2013-05-02 04:52:21
