2016-04-30 140 views

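How can I use a model downloaded from BigML for local prediction?

I used bigml.com to build a decision tree model for the iris dataset. I downloaded this decision tree model as PMML and would like to use it for prediction on my local machine. The PMML model from BigML: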

<?xml version="1.0" encoding="utf-8"?> 
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 
    <Header description="Generated by BigML"/> 
    <DataDictionary> 
     <DataField dataType="double" displayName="Sepal length" name="000001" optype="continuous"/> 
     <DataField dataType="double" displayName="Sepal width" name="000002" optype="continuous"/> 
     <DataField dataType="double" displayName="Petal length" name="000003" optype="continuous"/> 
     <DataField dataType="double" displayName="Petal width" name="000004" optype="continuous"/> 
     <DataField dataType="string" displayName="Species" name="000005" optype="categorical"> 
      <Value value="Iris-setosa"/> 
      <Value value="Iris-versicolor"/> 
      <Value value="Iris-virginica"/> 
     </DataField> 
    </DataDictionary> 
    <TreeModel algorithmName="mtree" functionName="classification" modelName=""> 
     <MiningSchema> 
      <MiningField name="000001"/> 
      <MiningField name="000002"/> 
      <MiningField name="000003"/> 
      <MiningField name="000004"/> 
      <MiningField name="000005" usageType="target"/> 
     </MiningSchema> 
     <Node recordCount="150" score="Iris-setosa"> 
      <True/> 
      <ScoreDistribution recordCount="50" value="Iris-setosa"/> 
      <ScoreDistribution recordCount="50" value="Iris-versicolor"/> 
      <ScoreDistribution recordCount="50" value="Iris-virginica"/> 
      <Node recordCount="100" score="Iris-versicolor"> 
       <SimplePredicate field="000003" operator="greaterThan" value="2.45"/> 
       <ScoreDistribution recordCount="50" value="Iris-versicolor"/> 
       <ScoreDistribution recordCount="50" value="Iris-virginica"/> 
       <Node recordCount="46" score="Iris-virginica"> 
        <SimplePredicate field="000004" operator="greaterThan" value="1.75"/> 
        <ScoreDistribution recordCount="45" value="Iris-virginica"/> 
        <ScoreDistribution recordCount="1" value="Iris-versicolor"/> 
        <Node recordCount="43" score="Iris-virginica"> 
         <SimplePredicate field="000003" operator="greaterThan" value="4.85"/> 
         <ScoreDistribution recordCount="43" value="Iris-virginica"/> 
        </Node> 
        <Node recordCount="3" score="Iris-virginica"> 
         <SimplePredicate field="000003" operator="lessOrEqual" value="4.85"/> 
         <ScoreDistribution recordCount="2" value="Iris-virginica"/> 
         <ScoreDistribution recordCount="1" value="Iris-versicolor"/> 
         <Node recordCount="1" score="Iris-versicolor"> 
          <SimplePredicate field="000002" operator="greaterThan" value="3.1"/> 
          <ScoreDistribution recordCount="1" value="Iris-versicolor"/> 
         </Node> 
         <Node recordCount="2" score="Iris-virginica"> 
          <SimplePredicate field="000002" operator="lessOrEqual" value="3.1"/> 
          <ScoreDistribution recordCount="2" value="Iris-virginica"/> 
         </Node> 
        </Node> 
       </Node> 
       <Node recordCount="54" score="Iris-versicolor"> 
        <SimplePredicate field="000004" operator="lessOrEqual" value="1.75"/> 
        <ScoreDistribution recordCount="49" value="Iris-versicolor"/> 
        <ScoreDistribution recordCount="5" value="Iris-virginica"/> 
        <Node recordCount="6" score="Iris-virginica"> 
         <SimplePredicate field="000003" operator="greaterThan" value="4.95"/> 
         <ScoreDistribution recordCount="4" value="Iris-virginica"/> 
         <ScoreDistribution recordCount="2" value="Iris-versicolor"/> 
         <Node recordCount="3" score="Iris-versicolor"> 
          <SimplePredicate field="000004" operator="greaterThan" value="1.55"/> 
          <ScoreDistribution recordCount="2" value="Iris-versicolor"/> 
          <ScoreDistribution recordCount="1" value="Iris-virginica"/> 
          <Node recordCount="1" score="Iris-virginica"> 
           <SimplePredicate field="000003" operator="greaterThan" value="5.45"/> 
           <ScoreDistribution recordCount="1" value="Iris-virginica"/> 
          </Node> 
          <Node recordCount="2" score="Iris-versicolor"> 
           <SimplePredicate field="000003" operator="lessOrEqual" value="5.45"/> 
           <ScoreDistribution recordCount="2" value="Iris-versicolor"/> 
          </Node> 
         </Node> 
         <Node recordCount="3" score="Iris-virginica"> 
          <SimplePredicate field="000004" operator="lessOrEqual" value="1.55"/> 
          <ScoreDistribution recordCount="3" value="Iris-virginica"/> 
         </Node> 
        </Node> 
        <Node recordCount="48" score="Iris-versicolor"> 
         <SimplePredicate field="000003" operator="lessOrEqual" value="4.95"/> 
         <ScoreDistribution recordCount="47" value="Iris-versicolor"/> 
         <ScoreDistribution recordCount="1" value="Iris-virginica"/> 
         <Node recordCount="1" score="Iris-virginica"> 
          <SimplePredicate field="000004" operator="greaterThan" value="1.65"/> 
          <ScoreDistribution recordCount="1" value="Iris-virginica"/> 
         </Node> 
         <Node recordCount="47" score="Iris-versicolor"> 
          <SimplePredicate field="000004" operator="lessOrEqual" value="1.65"/> 
          <ScoreDistribution recordCount="47" value="Iris-versicolor"/> 
         </Node> 
        </Node> 
       </Node> 
      </Node> 
      <Node recordCount="50" score="Iris-setosa"> 
       <SimplePredicate field="000003" operator="lessOrEqual" value="2.45"/> 
       <ScoreDistribution recordCount="50" value="Iris-setosa"/> 
      </Node> 
     </Node> 
    </TreeModel> 
</PMML> 

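I generally use R for machine learning and would like to load this model and use it for prediction on my own system. R has a pmml package, but it does not seem possible to use it for prediction. Is there another way to use this PMML model for prediction in R? If not, can the PMML model be used from another language or tool, such as Python or Weka? If so, how do I go about it (code needed)?

The same model exported from BigML as Python code: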

def predict_species(sepal_width=None, 
        petal_length=None, 
        petal_width=None): 
    """ Predictor for Species from 

     This is perhaps the best known database to be found in the pattern recognition literature. Fisher's paper is a classic 
     in the field and is referenced frequently to this day. (See Duda & Hart, for example.) The data set contains 3 classes 
     of 50 instances each, where each class refers to a type of iris plant. 
     Source 
     Iris Data Set[*] 
     Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository[*]. Irvine, CA: University of California, School of Information and Computer Science. 

     [*]Iris Data Set: http://archive.ics.uci.edu/ml/datasets/Iris 
     [*]UCI Machine Learning Repository: http://archive.ics.uci.edu/ml 
    """ 
    if (petal_length is None):
        return u'Iris-setosa'
    if (petal_length > 2.45):
        if (petal_width is None):
            return u'Iris-versicolor'
        if (petal_width > 1.75):
            if (petal_length > 4.85):
                return u'Iris-virginica'
            if (petal_length <= 4.85):
                if (sepal_width is None):
                    return u'Iris-virginica'
                if (sepal_width > 3.1):
                    return u'Iris-versicolor'
                if (sepal_width <= 3.1):
                    return u'Iris-virginica'
        if (petal_width <= 1.75):
            if (petal_length > 4.95):
                if (petal_width > 1.55):
                    if (petal_length > 5.45):
                        return u'Iris-virginica'
                    if (petal_length <= 5.45):
                        return u'Iris-versicolor'
                if (petal_width <= 1.55):
                    return u'Iris-virginica'
            if (petal_length <= 4.95):
                if (petal_width > 1.65):
                    return u'Iris-virginica'
                if (petal_width <= 1.65):
                    return u'Iris-versicolor'
    if (petal_length <= 2.45):
        return u'Iris-setosa'

Answer


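The simplest way to make local predictions with BigML models (including ensembles, clusters, anomaly detectors, etc.) is to download the model directly through API calls.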

For example, with BigML's Python Bindings, for a classification or regression model you would do something like this:

from bigml.model import Model 
model = Model('model/570f4b6e84622c5ed10095a9') 
model.predict({'feature_1': 1, 'feature_2': 2}) 

To find the closest centroid using a local cluster:

from bigml.cluster import Cluster 
cluster = Cluster('cluster/572500b849c4a15c9d00451f') 
cluster.centroid({'feature_1': 1, 'feature_2': 2}) 

To score a new data point using a local anomaly detector:

from bigml.anomaly import Anomaly 
anomaly_detector = Anomaly('anomaly/570f4c333bbd21090101e79f') 
anomaly_detector.anomaly_score({'feature_1': 1, 'feature_2': 2}) 

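The classes above (Model, Cluster and Anomaly) download the JSON PML code that defines each model and turn it into a local function (in this case, in Python). Since you probably don't want to use R to implement a real-world application, it is best to make predictions in the language your application will be written in: Python, Node.js, Java, etc. BigML provides open-source bindings for all of them.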
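If you would rather work from the downloaded PMML file itself, the same idea (turning a model description into a local function) can be sketched with nothing but Python's standard library. This is a minimal sketch under stated assumptions, not a general PMML evaluator: it only understands the `True` and `SimplePredicate` elements and the `greaterThan` / `lessOrEqual` operators that appear in the file from the question, and when no child node matches (for example because an input is missing) it returns the score of the last node reached, which mirrors the exported Python function in the question rather than any strategy declared in the PMML. The embedded XML is an abbreviated excerpt of that file (first two levels of splits only), and the names `PMML_EXCERPT`, `matches`, `predict` and `predict_by_display_name` are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# PMML 4.2 namespace, as declared in the file from the question.
NS = {"p": "http://www.dmg.org/PMML-4_2"}

# Abbreviated excerpt of the question's PMML: only the first two levels of splits.
PMML_EXCERPT = """
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <DataDictionary>
    <DataField dataType="double" displayName="Petal length" name="000003" optype="continuous"/>
    <DataField dataType="double" displayName="Petal width" name="000004" optype="continuous"/>
  </DataDictionary>
  <TreeModel algorithmName="mtree" functionName="classification" modelName="">
    <Node recordCount="150" score="Iris-setosa">
      <True/>
      <Node recordCount="100" score="Iris-versicolor">
        <SimplePredicate field="000003" operator="greaterThan" value="2.45"/>
        <Node recordCount="46" score="Iris-virginica">
          <SimplePredicate field="000004" operator="greaterThan" value="1.75"/>
        </Node>
        <Node recordCount="54" score="Iris-versicolor">
          <SimplePredicate field="000004" operator="lessOrEqual" value="1.75"/>
        </Node>
      </Node>
      <Node recordCount="50" score="Iris-setosa">
        <SimplePredicate field="000003" operator="lessOrEqual" value="2.45"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>
"""

# Only the two operators used in the question's file are supported.
OPERATORS = {
    "greaterThan": lambda a, b: a > b,
    "lessOrEqual": lambda a, b: a <= b,
}


def matches(node, record):
    """Return True if the node's own predicate holds for the record."""
    if node.find("p:True", NS) is not None:
        return True
    pred = node.find("p:SimplePredicate", NS)
    if pred is None:
        raise ValueError("unsupported predicate type in this sketch")
    value = record.get(pred.get("field"))
    if value is None:
        # Missing input: no child can match, so the parent's score is used.
        return False
    return OPERATORS[pred.get("operator")](value, float(pred.get("value")))


def predict(node, record):
    """Descend into the first matching child; otherwise return this node's score."""
    for child in node.findall("p:Node", NS):
        if matches(child, record):
            return predict(child, record)
    return node.get("score")


root = ET.fromstring(PMML_EXCERPT)
top_node = root.find("p:TreeModel/p:Node", NS)
# Map human-readable names ("Petal length") to BigML field ids ("000003").
field_ids = {
    f.get("displayName"): f.get("name")
    for f in root.findall("p:DataDictionary/p:DataField", NS)
}


def predict_by_display_name(inputs):
    """Predict from a dict keyed by display name, e.g. {"Petal length": 1.4}."""
    record = {field_ids[name]: value for name, value in inputs.items()}
    return predict(top_node, record)


print(predict_by_display_name({"Petal length": 1.4, "Petal width": 0.2}))
print(predict_by_display_name({"Petal length": 5.5, "Petal width": 2.0}))
```

To run this against the full downloaded file instead of the excerpt, replace `ET.fromstring(PMML_EXCERPT)` with `ET.parse(path).getroot()`, where `path` is wherever you saved the PMML. The deeper branches of the full tree are then followed as well, so predictions near the split thresholds can differ from what the excerpt gives.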
