How do I call a scikit-learn classifier from Java?

31

You cannot use Jython, because scikit-learn relies heavily on NumPy and SciPy, which have many compiled C and Fortran extensions and therefore cannot work in Jython.

The easiest ways to use scikit-learn in a Java environment would be to:

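Expose the classifier as an HTTP/JSON service using a microframework such as flask, bottle, or cornice, and call it from Java using an HTTP client library.
Write a command-line wrapper application in Python that reads data on stdin and writes predictions to stdout in some format such as CSV or JSON (or some lower-level binary representation), and call the Python program from Java, for instance using Apache Commons Exec.
Make the Python program output the raw numerical parameters learned at fit time (typically as an array of floating-point values) and re-implement the predict function in Java (for a linear model, prediction is typically just a thresholded dot product).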
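To make the command-line wrapper option more concrete, here is a minimal sketch of the Python side, assuming one JSON feature vector per input line and one JSON prediction per output line. The `ThresholdModel` class and the `serve` function are hypothetical names invented for this illustration; in a real wrapper you would load your own fitted scikit-learn estimator instead of the stand-in.

```python
import json
import sys


class ThresholdModel:
    """Hypothetical stand-in for a fitted scikit-learn classifier.

    It only mimics the predict(rows) -> labels shape of the real API.
    """

    def predict(self, rows):
        return [1 if sum(row) > 1.0 else 0 for row in rows]


def serve(model, stdin=sys.stdin, stdout=sys.stdout):
    """Read one JSON feature vector per line, write one JSON prediction per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        features = json.loads(line)
        label = model.predict([features])[0]
        # NumPy scalars are not JSON serializable; .item() yields a plain Python value.
        if hasattr(label, "item"):
            label = label.item()
        stdout.write(json.dumps({"prediction": label}) + "\n")
        stdout.flush()  # flush so the Java parent process sees each answer right away
```

Calling `serve(model)` at the bottom of the script turns it into a filter that the Java process can start once and then talk to over its stdin and stdout, which avoids paying the Python start-up cost for every prediction.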

If you also need to re-implement feature extraction in Java, the last approach will be a lot more work.
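For the parameter-export approach, the sketch below shows what "a thresholded dot product" amounts to for a binary linear classifier. It is written in Python only to pin down the logic that would be ported to Java. The function names and the JSON layout are assumptions made for this example; with scikit-learn the inputs would typically come from `clf.coef_[0]`, `clf.intercept_[0]` and `clf.classes_`.

```python
import json


def export_linear_params(coef, intercept, classes):
    """Serialize the learned parameters of a binary linear classifier as JSON.

    With scikit-learn these would typically be clf.coef_[0].tolist(),
    float(clf.intercept_[0]) and clf.classes_.tolist().
    """
    return json.dumps(
        {"coef": list(coef), "intercept": intercept, "classes": list(classes)}
    )


def predict_linear(params, features):
    """Thresholded dot product: the part that would be re-implemented in Java."""
    if len(features) != len(params["coef"]):
        raise ValueError("feature vector length does not match the coefficients")
    score = params["intercept"] + sum(
        w * x for w, x in zip(params["coef"], features)
    )
    # A positive score maps to the second class, a non-positive one to the first.
    return params["classes"][1] if score > 0 else params["classes"][0]


# Hypothetical parameter values, for illustration only.
params = json.loads(export_linear_params([2.0, -1.0], -0.5, ["ham", "spam"]))
```

The Java side then only needs a JSON parser and a loop over two arrays. Any feature extraction applied before fitting still has to be reproduced exactly, which is where most of the extra work comes from.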

Finally, you can use a Java library such as Weka or Mahout that implements the algorithms you need, instead of trying to use scikit-learn from Java.

Source

2012-10-05 09:05:29 ogrisel

+2

One of my colleagues just suggested Jepp... would that work for this? –

+0

Perhaps, I did not know about Jepp. It does indeed look suited to the task. – ogrisel

+0

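For a web app, I'd personally prefer the HTTP exposure approach. @user939259 could then use a pool of classifiers for various apps and scale it more easily (sizing the pool according to demand). I'd consider Jepp only for a desktop application. As much as I like Python, unless scikit-learn performs significantly better than Weka or Mahout, I'd go for a single-language solution. Having more than one language/framework should be considered technical debt. – rbanffy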

13

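There is the JPMML project for this purpose. First, you can serialize a scikit-learn model to PMML (which is XML internally) using the sklearn2pmml library directly from Python, or dump the model in Python first and then convert it using jpmml-sklearn in Java or from the command line provided by that library. Next, you can load the PMML file, deserialize it, and execute the loaded model using jpmml-evaluator in your Java code.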

This approach does not work with all scikit-learn models, but it works with many of them.

Source

2016-08-10 16:31:57

+0

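How do you make sure that the feature transformation part stays consistent between the one done in Python for training and the one done in Java (using PMML)? –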

1

Here is some code for the JPMML solution:

--PYTHON PART--

# Imports needed by this snippet. df is assumed to be a pandas DataFrame loaded beforehand.
from decimal import Decimal

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import LabelBinarizer
from sklearn_pandas import DataFrameMapper
from sklearn2pmml.pipeline import PMMLPipeline

# helper function to determine the string columns which have to be one-hot-encoded in order to apply an estimator.
def determine_categorical_columns(df):
    categorical_columns = []
    for name, dtype in df.dtypes.items():
        if dtype == 'object':
            val = df[name].iloc[0]
            # object columns that hold Decimal values are numeric, not categorical
            if not isinstance(val, Decimal):
                categorical_columns.append(name)
    return categorical_columns

categorical_columns = determine_categorical_columns(df)
other_columns = list(set(df.columns).difference(categorical_columns))

# construction of the transformers for our example
labelBinarizers = [(d, LabelBinarizer()) for d in categorical_columns]
nones = [(d, None) for d in other_columns]
transformators = labelBinarizers + nones

mapper = DataFrameMapper(transformators, df_out=True)
gbc = GradientBoostingClassifier()

# construction of the pipeline
lm = PMMLPipeline([
    ("mapper", mapper),
    ("estimator", gbc)
])

--JAVA PART--

//Initialisation. 
String pmmlFile = "ScikitLearnNew.pmml"; 
PMML pmml = org.jpmml.model.PMMLUtil.unmarshal(new FileInputStream(pmmlFile)); 
ModelEvaluatorFactory modelEvaluatorFactory = ModelEvaluatorFactory.newInstance(); 
MiningModelEvaluator evaluator = (MiningModelEvaluator) modelEvaluatorFactory.newModelEvaluator(pmml); 

//Determine which features are required as input 
HashMap<String, Field> inputFieldMap = new HashMap<String, Field>();
for (int i = 0; i < evaluator.getInputFields().size();i++) { 
    InputField curInputField = evaluator.getInputFields().get(i); 
    String fieldName = curInputField.getName().getValue(); 
    inputFieldMap.put(fieldName.toLowerCase(),curInputField.getField()); 
} 


//prediction 

HashMap<String,String> argsMap = new HashMap<String,String>(); 
//... fill argsMap with input 

Map<FieldName, ?> res; 
// here we keep only features that are required by the model 
Map<FieldName,String> args = new HashMap<FieldName, String>(); 
Iterator<String> iter = argsMap.keySet().iterator(); 
while (iter.hasNext()) { 
    String key = iter.next(); 
    Field f = inputFieldMap.get(key); 
    if (f != null) {
        FieldName name = f.getName();
        String value = argsMap.get(key);
        args.put(name, value);
    }
} 
//the model is applied to input, a probability distribution is obtained 
res = evaluator.evaluate(args); 
SegmentResult segmentResult = (SegmentResult) res; 
Object targetValue = segmentResult.getTargetValue(); 
ProbabilityDistribution probabilityDistribution = (ProbabilityDistribution) targetValue;

Source

2018-02-16 13:05:25 Volokh

1

You can use a porter. I have tested sklearn-porter (https://github.com/nok/sklearn-porter), and it works well for Java.

My code is the following:

import pandas as pd
from sklearn import tree
from sklearn_porter import Porter

# .values replaces DataFrame.as_matrix(), which was removed in pandas 1.0
train_dataset = pd.read_csv('./result2.csv').values

# first 90 rows for training, the rest for testing; columns 0-7 are features, column 8 onward is the label
X_train = train_dataset[:90, :8]
Y_train = train_dataset[:90, 8:]

X_test = train_dataset[90:, :8]
Y_test = train_dataset[90:, 8:]

print(X_train.shape)
print(Y_train.shape)


clf = tree.DecisionTreeClassifier()
clf = clf.fit(X_train, Y_train)

# transpile the fitted tree to Java source code
porter = Porter(clf, language='java')
output = porter.export(embed_data=True)
print(output)

In my case I'm using a DecisionTreeClassifier, and the output of

print(output)

is the following code, printed as text in the console:

class DecisionTreeClassifier {

    private static int findMax(int[] nums) {
        int index = 0;
        for (int i = 0; i < nums.length; i++) {
            index = nums[i] > nums[index] ? i : index;
        }
        return index;
    }


    public static int predict(double[] features) {
        int[] classes = new int[2];

        if (features[5] <= 51.5) {
            if (features[6] <= 21.0) {

                // HUGE amount of ifs..........

            }
        }

        return findMax(classes);
    }

    public static void main(String[] args) {
        if (args.length == 8) {

            // Features:
            double[] features = new double[args.length];
            for (int i = 0, l = args.length; i < l; i++) {
                features[i] = Double.parseDouble(args[i]);
            }

            // Prediction:
            int prediction = DecisionTreeClassifier.predict(features);
            System.out.println(prediction);

        }
    }
}

Source

2018-03-04 16:26:02 gustavoresque
