我想使用SVC分类器绘制学习曲线。数据集有点偏斜,大小约为150,1000,1000,1000和150。我遇到的问题与拟合估算:scikit-learn:使用SVC构建学习曲线
File "/Users/carrier24sg/.virtualenvs/ml/lib/python2.7/site-packages/sklearn/learning_curve.py", line 135, in learning_curve
for train, test in cv for n_train_samples in train_sizes_abs)
File "/Users/carrier24sg/.virtualenvs/ml/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 644, in __call__
self.dispatch(function, args, kwargs)
File "/Users/carrier24sg/.virtualenvs/ml/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 391, in dispatch
job = ImmediateApply(func, args, kwargs)
File "/Users/carrier24sg/.virtualenvs/ml/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 129, in __init__
self.results = func(*args, **kwargs)
File "/Users/carrier24sg/.virtualenvs/ml/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1233, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "/Users/carrier24sg/.virtualenvs/ml/lib/python2.7/site-packages/sklearn/svm/base.py", line 140, in fit
X = atleast2d_or_csr(X, dtype=np.float64, order='C')
File "/Users/carrier24sg/.virtualenvs/ml/lib/python2.7/site-packages/sklearn/svm/base.py", line 450, in _validate_targets
% len(cls))
ValueError: The number of classes has to be greater than one; got 1
我的代码
df = pd.read_csv('../resources/problem2_processed_validate.csv')
data, label = preprocess_text(df)
cv = StratifiedKFold(label, 10)
plt = plot_learning_curve(estimator=SVC(), title="Learning curve", X=data, y=label.values, cv
train_sizes, train_scores, test_scores = learning_curve(
estimator, data, y=label, cv=cv, train_sizes=np.linspace(.1, 1.0, 5))
即使我使用分层抽样,我仍然会碰到这个错误。我相信它是因为学习曲线代码在增加数据集大小时不会执行分层,而且我已经在一个步骤中获得了所有类似的类标签。
我应该如何解决?
谢谢!我会看看我能否贡献。 – goh 2014-10-08 09:05:05