2010-04-20 91 views
0

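Save a PyML.classifiers.multi.OneAgainstRest(SVM()) object?

I'm using PyML to build a multi-class linear support vector machine (SVM). After training the SVM, I would like to save the classifier so that later runs can use it right away without retraining. Unfortunately, the .save() function is not implemented for that classifier, and attempting to pickle it (with both the standard pickle and cPickle) produces the following error message: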

 
pickle.PicklingError: Can't pickle <type 'PySwigObject'>: it's not found as __builtin__.PySwigObject

Does anyone know a way around this, or an alternative library that does not have this problem? Thanks.
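For readers unfamiliar with this kind of error: pickle generally cannot serialize objects that wrap native (for example SWIG-generated) handles. The sketch below reproduces the general failure and shows one common workaround, which is to drop the native member in `__getstate__` and rebuild it in `__setstate__`. It is not PyML's API: `NativeHandle` and `Model` are hypothetical names, and a `threading.Lock` merely stands in for the SWIG pointer. Whether PyML's SVM objects can export and rebuild their native state this way is an assumption that would need checking against the library.

```python
import pickle
import threading


class NativeHandle(object):
    """Stand-in for a SWIG-wrapped pointer: it holds a lock, which pickle rejects."""

    def __init__(self):
        self._lock = threading.Lock()


class Model(object):
    """Hypothetical classifier: plain-Python weights plus a native handle."""

    def __init__(self, weights):
        self.weights = list(weights)
        self.handle = NativeHandle()

    def __getstate__(self):
        # Copy the instance dict and drop the member pickle cannot serialize.
        state = self.__dict__.copy()
        del state["handle"]
        return state

    def __setstate__(self, state):
        # Restore the plain data, then rebuild the native handle from scratch.
        self.__dict__.update(state)
        self.handle = NativeHandle()


def can_pickle(obj):
    """Report whether pickle.dumps succeeds for obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError):
        return False


handle_ok = can_pickle(NativeHandle())
restored = pickle.loads(pickle.dumps(Model([0.5, -1.25])))
```

The bare handle cannot be pickled, while `Model` survives a round trip because only its plain-Python state is written out.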

Edit/Update
I am now training and attempting to save the classifier with the following code:

 
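import os
from PyML.classifiers import multi
from PyML.classifiers.svm import SVM

# dataset_pyml, prefix and labels are defined elsewhere in the program
mc = multi.OneAgainstRest(SVM())
mc.train(dataset_pyml, saveSpace=False)
for i, classifier in enumerate(mc.classifiers):
    filename = os.path.join(prefix, labels[i] + ".svm")
    classifier.save(filename)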

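Note that I am now saving with PyML's own save mechanism rather than with pickling, and that I have passed "saveSpace=False" to the training function. However, I am still getting an error: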

 
ValueError: in order to save a dataset you need to train as: s.train(data, saveSpace = False) 

However, I am passing saveSpace=False... so, how can I save the classifier(s)?

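P.S.
The project I am using this in is pyimgattr, in case you would like a complete testable example... run the program with "./pyimgattr.py train"... that will get you this error. Also, a note on version information: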

 
[[email protected] /Volumes/Storage/classes/cse559/pyimgattr]$ python 
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin 
Type "help", "copyright", "credits" or "license" for more information. 
>>> import PyML 
>>> print PyML.__version__ 
0.7.0 

Answers

0

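Get a newer version of PyML. Since version 0.7.4 it is possible to save the OneAgainstRest classifier (with .save() and .load()); before that version, saving and loading the classifier is non-trivial and error-prone.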
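One way to tell which side of that boundary an installation falls on is to compare `PyML.__version__` against 0.7.4. The helper below is only a sketch: `version_tuple` and `supports_multiclass_save` are hypothetical names, and it assumes purely numeric dotted version strings such as "0.7.0". It does not import PyML, so the version string is passed in by the caller.

```python
def version_tuple(text):
    """Turn a dotted version string such as '0.7.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def supports_multiclass_save(version_text, minimum="0.7.4"):
    """True if the given version is at least the minimum named in the answer."""
    return version_tuple(version_text) >= version_tuple(minimum)


old_install = supports_multiclass_save("0.7.0")
new_install = supports_multiclass_save("0.7.4")
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as "0.7.10" sorting before "0.7.4".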

2

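On line 96 of multi.py, "self.classifiers[i].train(datai)" is called without passing "**args", so if you call "mc.train(data, saveSpace=False)", the saveSpace argument gets lost. That is why you get the error message when you try to save the classifiers inside your multi-class classifier individually. If you change this line to pass all arguments, you can save each classifier individually: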

#!/usr/bin/python 

import numpy 

from PyML.utils import misc 
from PyML.evaluators import assess 
from PyML.classifiers.svm import SVM, loadSVM 
from PyML.containers.labels import oneAgainstRest 
from PyML.classifiers.baseClassifiers import Classifier 
from PyML.containers.vectorDatasets import SparseDataSet 
from PyML.classifiers.composite import CompositeClassifier 

class OneAgainstRestFixed(CompositeClassifier):

    '''A one-against-the-rest multi-class classifier'''

    def train(self, data, **args):
        '''train k classifiers'''

        Classifier.train(self, data, **args)

        numClasses = self.labels.numClasses
        if numClasses <= 2:
            raise ValueError, 'Not a multi class problem'

        self.classifiers = [self.classifier.__class__(self.classifier)
                            for i in range(numClasses)]

        for i in range(numClasses):
            # make a copy of the data; this is done in case the classifier modifies the data
            datai = data.__class__(data, deepcopy=self.classifier.deepcopy)
            datai = oneAgainstRest(datai, data.labels.classLabels[i])

            # the fix: forward **args so that saveSpace reaches each binary classifier
            self.classifiers[i].train(datai, **args)

        self.log.trainingTime = self.getTrainingTime()

    def classify(self, data, i):

        r = numpy.zeros(self.labels.numClasses, numpy.float_)
        for j in range(self.labels.numClasses):
            r[j] = self.classifiers[j].decisionFunc(data, i)

        return numpy.argmax(r), numpy.max(r)

    def preproject(self, data):

        for i in range(self.labels.numClasses):
            self.classifiers[i].preproject(data)

    test = assess.test

train_data = """ 
0 1:1.0 2:0.0 3:0.0 4:0.0 
0 1:0.9 2:0.0 3:0.0 4:0.0 
1 1:0.0 2:1.0 3:0.0 4:0.0 
1 1:0.0 2:0.8 3:0.0 4:0.0 
2 1:0.0 2:0.0 3:1.0 4:0.0 
2 1:0.0 2:0.0 3:0.9 4:0.0 
3 1:0.0 2:0.0 3:0.0 4:1.0 
3 1:0.0 2:0.0 3:0.0 4:0.9 
""" 
file("foo_train.data", "w").write(train_data.lstrip()) 

test_data = """ 
0 1:1.1 2:0.0 3:0.0 4:0.0 
1 1:0.0 2:1.2 3:0.0 4:0.0 
2 1:0.0 2:0.0 3:0.6 4:0.0 
3 1:0.0 2:0.0 3:0.0 4:1.4 
""" 
file("foo_test.data", "w").write(test_data.lstrip()) 

train = SparseDataSet("foo_train.data") 
mc = OneAgainstRestFixed(SVM()) 
mc.train(train, saveSpace=False) 

test = SparseDataSet("foo_test.data") 
print [mc.classify(test, i) for i in range(4)] 

for i, classifier in enumerate(mc.classifiers): 
    classifier.save("foo.model.%d" % i) 

classifiers = [] 
for i in range(4): 
    classifiers.append(loadSVM("foo.model.%d" % i)) 

mcnew = OneAgainstRestFixed(SVM()) 
mcnew.labels = misc.Container() 
mcnew.labels.addAttributes(test.labels, ['numClasses', 'classLabels']) 
mcnew.classifiers = classifiers 
print [mcnew.classify(test, i) for i in range(4)] 
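The underlying bug can be reproduced without PyML. The following is a minimal sketch under stated assumptions: `Leaf`, `DroppingComposite`, `ForwardingComposite` and `try_save` are hypothetical names, and `Leaf` only imitates the behaviour described above (it refuses to save unless it was trained with saveSpace=False). It is not PyML's implementation.

```python
class Leaf(object):
    """Hypothetical binary classifier that keeps its data only if saveSpace is False."""

    def train(self, data, **args):
        # Default to the space-saving mode when the option is not supplied.
        save_space = args.get("saveSpace", True)
        self.data = None if save_space else list(data)

    def save(self):
        if self.data is None:
            raise ValueError("train with saveSpace=False before saving")
        return {"data": self.data}


class DroppingComposite(object):
    """Trains every part but forgets to forward the keyword arguments."""

    def __init__(self, count):
        self.parts = [Leaf() for _ in range(count)]

    def train(self, data, **args):
        for part in self.parts:
            part.train(data)  # bug: **args never reaches the part


class ForwardingComposite(DroppingComposite):
    """Same composite with the one-line fix."""

    def train(self, data, **args):
        for part in self.parts:
            part.train(data, **args)


def try_save(composite):
    """Return the saved states, or the error message if saving fails."""
    try:
        return [part.save() for part in composite.parts]
    except ValueError as error:
        return str(error)


broken = DroppingComposite(3)
broken.train([1, 2, 3], saveSpace=False)
fixed = ForwardingComposite(3)
fixed.train([1, 2, 3], saveSpace=False)
broken_result = try_save(broken)
fixed_result = try_save(fixed)
```

Both composites are called with saveSpace=False, yet only the forwarding one can save its parts, which mirrors the symptom reported in the question.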
+0

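@ephes, sorry, can you clarify a little? What should I pass to train: saveSpace=True or saveSpace=False? Also, how do I load the classifiers... if I load them one by one as you suggest, how do I put them back into a single multi-class classifier? – 2010-04-20 23:39:00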

+0

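saveSpace=False (weird stuff...). PyML's abstractions are really leaky. OK, I changed the example source to read the models back in, build a new multi-class classifier, and recompute the scores for the test data. – ephes 2010-04-21 11:45:48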
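The save, reload and reassemble round trip described in this comment can be sketched independently of PyML. Everything in the block is an assumption made for illustration: `LinearScorer` is a hypothetical binary scorer stored as JSON, and `classify` picks the class with the largest decision value in the same way as the classify method in the answer. The point is only that the reloaded scorers must be put back in the same class order for the predictions to match.

```python
import json
import os
import tempfile


class LinearScorer(object):
    """Hypothetical binary scorer: decision value is a dot product plus bias."""

    def __init__(self, weights, bias=0.0):
        self.weights = list(weights)
        self.bias = bias

    def decision(self, x):
        return sum(w * v for w, v in zip(self.weights, x)) + self.bias

    def save(self, path):
        with open(path, "w") as handle:
            json.dump({"weights": self.weights, "bias": self.bias}, handle)

    @classmethod
    def load(cls, path):
        with open(path) as handle:
            state = json.load(handle)
        return cls(state["weights"], state["bias"])


def classify(scorers, x):
    """Return (winning class index, winning score) over all binary scorers."""
    scores = [scorer.decision(x) for scorer in scorers]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


def round_trip(scorers, directory):
    """Save each scorer to its own file, then load them back in the same order."""
    paths = [os.path.join(directory, "model.%d.json" % i)
             for i in range(len(scorers))]
    for scorer, path in zip(scorers, paths):
        scorer.save(path)
    return [LinearScorer.load(path) for path in paths]


original = [LinearScorer([1, 0, 0]), LinearScorer([0, 1, 0]), LinearScorer([0, 0, 1])]
reloaded = round_trip(original, tempfile.mkdtemp())
samples = [[1.1, 0.0, 0.0], [0.0, 1.2, 0.0], [0.0, 0.0, 0.6]]
before = [classify(original, x) for x in samples]
after = [classify(reloaded, x) for x in samples]
```

Comparing `before` and `after` is a cheap regression check to run whenever a saved multi-class model is reassembled from its parts.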

+0

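Thanks. It looks like your OneAgainstRestFixed is exactly the same as the original OneAgainstRest, except that you use "self.classifiers[i].train(datai, **args)", whereas the original accidentally omits the "**args" parameter. Things are now saving, but loading is not working properly. I will create a follow-up question for that. – 2010-04-22 02:17:58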