2017-01-16

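Python ValueError: non-broadcastable output operand with shape (124,1) doesn't match the broadcast shape (124,13)

I want to use MinMaxScaler from sklearn.preprocessing to normalize my training and test datasets. However, the scaler does not seem to accept my test dataset.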

import pandas as pd 
import numpy as np 

# Read in data. 
df_wine = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data', 
         header=None) 
df_wine.columns = ['Class label', 'Alcohol', 'Malic acid', 'Ash', 
        'Alcalinity of ash', 'Magnesium', 'Total phenols', 
        'Flavanoids', 'Nonflavanoid phenols', 'Proanthocyanins', 
        'Color intensity', 'Hue', 'OD280/OD315 of diluted wines', 
        'Proline'] 

# Split into train/test data. 
from sklearn.model_selection import train_test_split 
X = df_wine.iloc[:, 1:].values 
y = df_wine.iloc[:, 0].values 
X_train, y_train, X_test, y_test = train_test_split(X, y, test_size=0.3, 
                random_state = 0) 

# Normalize features using min-max scaling. 
from sklearn.preprocessing import MinMaxScaler 
mms = MinMaxScaler() 
X_train_norm = mms.fit_transform(X_train) 
X_test_norm = mms.transform(X_test) 

When I run this, I get a DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample. This is followed by ValueError: operands could not be broadcast together with shapes (124,) (13,) (124,).
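The three shapes in that message can be reproduced with plain NumPy. This is a minimal sketch, assuming that MinMaxScaler.transform in the scikit-learn release used here scaled its input in place, roughly as X *= scale_ (an assumption about library internals). The names scale and labels are hypothetical stand-ins, not the wine data. The third shape, (124,), is the in-place output buffer.

```python
import numpy as np

# Hypothetical stand-in for the per-feature scale a MinMaxScaler learns from 13 features.
scale = np.ones(13)

# Hypothetical stand-in for the 1-D array that ended up in X_test (124 class labels).
labels = np.zeros(124)

try:
    # In-place multiply: inputs (124,) and (13,) cannot broadcast,
    # and the output buffer (124,) is listed as the third shape.
    labels *= scale
    err_1d = None
except ValueError as exc:
    err_1d = str(exc)

print(err_1d)
```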

Reshaping the data still produces an error.

X_test_norm = mms.transform(X_test.reshape(-1, 1)) 

This reshape produces the error ValueError: non-broadcastable output operand with shape (124,1) doesn't match the broadcast shape (124,13)
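Reshaping cannot help, because the problem is not the number of dimensions but the number of columns: a (124, 1) array holds one feature, while the scaler was fitted on 13. The sketch below mimics this with NumPy, again assuming (not confirmed by the thread) that transform scaled its input in place; scale and column are hypothetical stand-ins.

```python
import numpy as np

scale = np.ones(13)          # hypothetical stand-in for a scaler fitted on 13 features
column = np.zeros((124, 1))  # 124 labels reshaped into a single column

# Out of place, (124, 1) and (13,) broadcast to (124, 13) without complaint...
broadcast_shape = (column * scale).shape

# ...but an in-place update would have to store that (124, 13) result
# back into a (124, 1) buffer, which NumPy refuses.
try:
    column *= scale
    err_col = None
except ValueError as exc:
    err_col = str(exc)

print(broadcast_shape)
print(err_col)
```

The silent out-of-place broadcast is the more dangerous case: it would pair every label with all 13 feature scales and still return an array, just a meaningless one.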

Any input on how to resolve this error would be helpful.


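When you get a shape error, the first thing to do is show the shapes of all the arrays involved in your question; in this case 'X_train' and 'X_test', and possibly more. – hpaulj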
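Following that advice, a quick shape dump exposes the problem immediately. This sketch uses a synthetic random array with the same shape as the wine data (178 samples, 13 features) instead of downloading it, and deliberately repeats the unpacking order from the question.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the wine data's shape: 178 samples, 13 features, 3 classes.
rng = np.random.default_rng(0)
X = rng.random((178, 13))
y = rng.integers(1, 4, size=178)

# Unpacking order copied from the question (wrong on purpose).
X_train, y_train, X_test, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Print every array's shape before calling the scaler.
for name, arr in [("X_train", X_train), ("y_train", y_train),
                  ("X_test", X_test), ("y_test", y_test)]:
    print(name, arr.shape)
```

The dump shows y_train as a 2-D (54, 13) matrix and X_test as a 1-D (124,) vector, which is the wrong way round for names with those prefixes.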

Answer

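The train/test partitions must be listed in the same order as the arrays passed to train_test_split(), because the returned arrays are unpacked in that order: for each input array you get its training part followed by its test part.

Here, because the order was written as X_train, y_train, X_test, y_test, the contents of y_train and X_test were swapped: y_train received the 54-row test feature matrix (len(y_train) = 54) and X_test received the 124 training labels (len(X_test) = 124). Passing that 1-D label array to mms.transform() causes the ValueError.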
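The ordering rule extends to any number of arrays: train_test_split returns one (train, test) pair per input, in input order. This small sketch passes a third array of row ids (sample_ids is a hypothetical name used only to track alignment) and checks that rows stay aligned across the splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features; row i is [2i, 2i + 1]
y = np.arange(10) % 2             # binary labels derived from the row index
sample_ids = np.arange(10)        # hypothetical row ids, used only to check alignment

# One (train, test) pair per input array, in the order the arrays were passed.
X_tr, X_te, y_tr, y_te, id_tr, id_te = train_test_split(
    X, y, sample_ids, test_size=3, random_state=0)

# Rows stay aligned across arrays: X row i starts with 2 * i, and y row i is i % 2.
assert (X_tr[:, 0] == 2 * id_tr).all()
assert (X_te[:, 0] == 2 * id_te).all()
assert (y_te == id_te % 2).all()
print(X_tr.shape, X_te.shape, y_tr.shape, y_te.shape)
```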

The fix for the code in the question is:

# Split into train/test data.
# train_test_split returns a (train, test) pair for each input array,
# in the same order the arrays were passed in: X first, then y.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# (or, passing y first)
# y_train, y_test, X_train, X_test = train_test_split(y, X, test_size=0.3, random_state=0)

# Normalize features using min-max scaling. 
from sklearn.preprocessing import MinMaxScaler 
mms = MinMaxScaler() 
X_train_norm = mms.fit_transform(X_train) 
X_test_norm = mms.transform(X_test) 

This produces:

X_train_norm[0] 
array([ 0.72043011, 0.20378151, 0.53763441, 0.30927835, 0.33695652, 
     0.54316547, 0.73700306, 0.25  , 0.40189873, 0.24068768, 
     0.48717949, 1.  , 0.5854251 ]) 

X_test_norm[0] 
array([ 0.72849462, 0.16386555, 0.47849462, 0.29896907, 0.52173913, 
     0.53956835, 0.74311927, 0.13461538, 0.37974684, 0.4364852 , 
     0.32478632, 0.70695971, 0.60566802]) 
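For reference, MinMaxScaler with its default feature_range=(0, 1) maps each column as x' = (x - min) / (max - min), where min and max come from the data passed to fit (here, the training split only). This sketch on made-up numbers checks that against a hand-computed version and shows that test values outside the training range land outside [0, 1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up data: 2 features; the test rows go beyond the training range.
X_train = np.array([[1.0, 10.0],
                    [3.0, 20.0],
                    [5.0, 30.0]])
X_test = np.array([[2.0, 40.0],
                   [0.0, 25.0]])

mms = MinMaxScaler()
X_train_norm = mms.fit_transform(X_train)  # learns per-column min and max from training data
X_test_norm = mms.transform(X_test)        # reuses the training min and max

# Hand-computed min-max scaling with the TRAINING statistics.
col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)
manual_test = (X_test - col_min) / (col_max - col_min)

print(np.allclose(X_test_norm, manual_test))
print(X_test_norm)
```

Fitting on the training split alone keeps test-set information out of the scaling; the trade-off is that transformed test values are not guaranteed to stay within [0, 1].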

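So he was fitting on a 13-feature set and transforming a 1-feature set, which explains the unusual error message. Shape errors are common in sklearn questions, but not usually ones involving a "non-broadcastable" output. – hpaulj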
