2017-04-04

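As we know, logistic regression predicts 1 when the sigmoid of theta times X is greater than 0.5. I want to improve precision, so I would like to change the prediction function to predict 1 only when that value is greater than 0.7, or some other value above 0.5. How can I change this prediction threshold in sklearn?

If I had written the algorithm myself, I could do this easily. With the sklearn package, though, I don't know how.

Can anyone give me a hand?

To make the question clear, here is the prediction function written in Octave: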

p = sigmoid(X*theta); 

for i = 1:size(p, 1)
    if p(i) >= 0.6
        p(i) = 1;
    else
        p(i) = 0;
    endif
endfor

Answers

0

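The LogisticRegression estimator in sklearn has a predict_proba method, which outputs the probability that an input example belongs to each class. You can compare that probability against a threshold of your own choosing, in place of the default 0.5, to get the behavior you want.

An example: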

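from sklearn import linear_model
import numpy as np

np.random.seed(1337)  # Seed the random generator for reproducibility
X = np.random.random((10, 5))  # Create sample data: 10 examples, 5 features
Y = np.random.randint(2, size=10)  # Random 0/1 labels

lr = linear_model.LogisticRegression().fit(X, Y)

# Column 1 holds the probability that each example belongs to class 1
prob_example_is_one = lr.predict_proba(X)[:, 1]

my_theta_times_X = 0.7  # Our custom threshold on the predicted probability
# Boolean array: True where the example is predicted as class 1
predict_greater_than_theta = prob_example_is_one > my_theta_times_X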
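Since the goal in the question is higher precision, it can help to check what a given threshold actually does on held-out data. The following is a minimal sketch, not part of the original answer: the synthetic dataset from `make_classification`, the train/test split, and the three threshold values are all assumptions chosen for illustration. Raising the threshold can only keep or shrink the set of predicted positives, so recall never goes up; precision often improves, but that is not guaranteed.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic binary data; sizes, noise level and seeds are arbitrary choices.
X, y = make_classification(n_samples=2000, n_features=10, flip_y=0.1,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
prob_one = clf.predict_proba(X_test)[:, 1]  # probability of class 1

results = {}
for threshold in (0.5, 0.7, 0.9):
    y_pred = (prob_one > threshold).astype(int)
    results[threshold] = (
        precision_score(y_test, y_pred, zero_division=0),
        recall_score(y_test, y_pred, zero_division=0),
        int(y_pred.sum()),  # number of examples predicted as class 1
    )
    print(threshold, results[threshold])
```

Picking the threshold on a validation set rather than on the training data avoids tuning it to noise.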

Here is the predict_proba docstring:

Probability estimates. 

The returned estimates for all classes are ordered by the 
label of classes. 

For a multi_class problem, if multi_class is set to be "multinomial" 
the softmax function is used to find the predicted probability of 
each class. 
Else use a one-vs-rest approach, i.e calculate the probability 
of each class assuming it to be positive using the logistic function. 
and normalize these values across all the classes. 

Parameters 
---------- 
X : array-like, shape = [n_samples, n_features] 

Returns 
------- 
T : array-like, shape = [n_samples, n_classes] 
    Returns the probability of the sample for each class in the model, 
    where classes are ordered as they are in ``self.classes_``. 
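Because the binary probabilities come from the logistic function, a threshold on the probability corresponds to a threshold on the raw linear score (the "theta times X" of the question, plus the intercept). The sketch below illustrates that relationship for the binary case; the random data and the 0.7 threshold are assumptions made for illustration. It also looks up the positive class's column through `classes_` instead of hard-coding index 1, following the ordering described in the docstring.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Random binary data, made up for this illustration.
rng = np.random.RandomState(0)
X = rng.random_sample((200, 5))
y = (X[:, 0] + 0.3 * rng.standard_normal(200) > 0.5).astype(int)

clf = LogisticRegression().fit(X, y)

threshold = 0.7  # assumed probability threshold

# Columns of predict_proba follow clf.classes_, so find the column of label 1.
pos_col = list(clf.classes_).index(1)
by_probability = clf.predict_proba(X)[:, pos_col] > threshold

# sigmoid(s) > t is equivalent to s > log(t / (1 - t)), the logit of t.
score_cutoff = np.log(threshold / (1 - threshold))
by_score = clf.decision_function(X) > score_cutoff

print(np.array_equal(by_probability, by_score))
```

With the default threshold of 0.5 the cutoff on the score is 0, which is the rule `predict` applies in the binary case.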
0

This works for both binary and multiclass classification:

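from sklearn.linear_model import LogisticRegression
import numpy as np

# X = some training data
# y = labels for training data
# X_test = some test data

threshold = 0.7  # Custom probability threshold (example value)

clf = LogisticRegression()
clf.fit(X, y)

# One row per test example, one column per class in clf.classes_
predictions = clf.predict_proba(X_test)

# Pick the first class whose probability exceeds the threshold.
# Rows where no class exceeds it fall back to clf.classes_[0].
predictions = clf.classes_[np.argmax(predictions > threshold, axis=1)]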
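One thing to watch with the snippet above: when no class exceeds the threshold, `np.argmax` over an all-False row returns 0, so the example is silently assigned the first class. A possible variation, sketched here under assumptions that are not in the original answer (the iris dataset, a threshold of 0.9, and a hypothetical sentinel value `NO_PREDICTION`), takes the most probable class and marks low-confidence rows explicitly.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Iris is used only as a convenient three-class example dataset.
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

threshold = 0.9      # assumed probability threshold
NO_PREDICTION = -1   # hypothetical sentinel; must not collide with a real label

proba = clf.predict_proba(X)

best = np.argmax(proba, axis=1)             # column of the most probable class
confident = proba.max(axis=1) > threshold   # True where that class clears the bar

# Keep the most probable label where confident, otherwise flag the row.
predictions = np.where(confident, clf.classes_[best], NO_PREDICTION)

print(np.unique(predictions, return_counts=True))
```

Whether to abstain, fall back to the most probable class, or route such rows elsewhere depends on the application.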