2017-06-14 91 views

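I'm a beginner in this field and I'm trying to model a dataset with logistic regression. Here is my code. It fails with "Found input variables with inconsistent numbers of samples: [100, 300]":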

import numpy as np 
from matplotlib import pyplot as plt 
import pandas as pnd 
from sklearn.preprocessing import StandardScaler 
from sklearn.model_selection import train_test_split 
from sklearn.linear_model import LogisticRegression 
from sklearn.metrics import confusion_matrix 

# Import the dataset 
data_set = pnd.read_csv("/Users/Siddharth/PycharmProjects/Deep_Learning/Classification Template/Social_Network_Ads.csv") 
X = data_set.iloc[:, [2,3]].values 
Y = data_set.iloc[:, 4].values 

# Splitting the set into training set and testing set 
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=0) 

# Scaling the variables 
scaler_x = StandardScaler() 
x_train = scaler_x.fit_transform(x_train) 
x_train = scaler_x.transform(x_test) 

# Fitting Logistic Regression to the training data 
classifier = LogisticRegression(random_state=0) 
classifier.fit(x_train, y_train) 

# Predicting the test set results 
y_prediction = classifier.predict(x_test) 

# Making the confusion matrix 
conMat = confusion_matrix(y_true=y_test, y_pred=y_prediction) 
print(conMat) 

The error is raised at classifier.fit(x_train, y_train). The full error is:

Traceback (most recent call last): 
  File "/Users/Siddharth/PycharmProjects/Deep_Learning/Logistic_regression.py", line 24, in <module> 
    classifier.fit(x_train, y_train) 
  File "/usr/local/lib/python3.6/site-packages/sklearn/linear_model/logistic.py", line 1173, in fit 
    order="C") 
  File "/usr/local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 531, in check_X_y 
    check_consistent_length(X, y) 
  File "/usr/local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 181, in check_consistent_length 
    " samples: %r" % [int(l) for l in lengths]) 
ValueError: Found input variables with inconsistent numbers of samples: [100, 300] 
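For reference, the check that fails here can be reproduced in isolation. The sketch below is illustrative only: it uses hypothetical all-zero arrays whose lengths (100 feature rows, 300 labels) are chosen to mirror the traceback, and shows that `LogisticRegression.fit` rejects inputs whose row counts differ before it looks at the data itself.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical arrays chosen only to mirror the traceback:
# 100 feature rows (like a test set) paired with 300 labels (like training labels).
X_mismatched = np.zeros((100, 2))
y_labels = np.zeros(300, dtype=int)

message = ""
try:
    LogisticRegression().fit(X_mismatched, y_labels)
except ValueError as err:
    # fit() validates X and y together and raises when their lengths differ
    message = str(err)

print(message)
```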

I don't understand why this happens. Any help would be greatly appreciated. Thanks!

Answer


It looks like you have a typo here. You probably want:

x_test = scaler_x.transform(x_test) 

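instead of x_train = scaler_x.transform(x_test). As written, that line overwrites x_train with the scaled test features, so the error is telling you that x_train (which now holds x_test) and y_train have different numbers of samples. With test_size=0.25, the split put 100 rows in the test set and 300 in the training set, which is exactly the [100, 300] in the message.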
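A minimal end-to-end sketch of the corrected script follows. Since the CSV path in the question is local to the asker's machine, it substitutes synthetic data from `make_classification` (an assumption: 400 rows, two numeric features, binary label, matching the 300/100 split implied by the error). The split, scaling, fit, and confusion matrix steps mirror the question, with the scaled test features assigned back to `x_test`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for Social_Network_Ads.csv (assumption: 400 rows,
# two numeric features, binary label); not the asker's real data.
X, Y = make_classification(n_samples=400, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# 75/25 split: 300 training rows, 100 test rows
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

scaler_x = StandardScaler()
x_train = scaler_x.fit_transform(x_train)  # learn mean/std from the training set only
x_test = scaler_x.transform(x_test)        # the fix: assign the result back to x_test

classifier = LogisticRegression(random_state=0)
classifier.fit(x_train, y_train)           # 300 rows and 300 labels, so lengths agree

y_prediction = classifier.predict(x_test)
conMat = confusion_matrix(y_true=y_test, y_pred=y_prediction)
print(conMat)
```

Fitting the scaler on the training set and only calling `transform` on the test set keeps test-set statistics out of training.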
