2013-03-17 63 views
1

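Weka: training data from a MySQL database

This is my second post about using Weka (the first post is here). I successfully used TextDirectoryLoader to feed Weka the training data and the sample test data, and that works well. Now I want to move this into production, so the data to be classified has to be retrieved from a MySQL table. Here is how I am doing it: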

TextDirectoryLoader loader = new TextDirectoryLoader(); 
    loader.setDirectory(new File("c:/Users/Yehia A.Salam/Desktop/dd/training-data")); 
    Instances dataRaw = loader.getDataSet(); 

    StringToWordVector filter = new StringToWordVector(); 
    filter.setInputFormat(dataRaw); 
    Instances dataTraining = Filter.useFilter(dataRaw, filter); 

    // Create test data instances[this works, but the sample data now needs to come frm the db instead, see below] 
    //loader.setDirectory(new File("c:/Users/Yehia A.Salam/Desktop/dd/test-data")); 
    //dataRaw = loader.getDataSet(); 
    //Instances dataTest = Filter.useFilter(dataRaw, filter); 

    InstanceQuery query = new InstanceQuery(); 
    query.setUsername("myusername"); 
    query.setPassword("mypassword"); 
    String sql = "SELECT d.desc FROM deals d WHERE d.j48 = 1"; 
    query.setQuery(sql); 
    Instances dataTest = Filter.useFilter(query.retrieveInstances(), filter); 

    // Classify 
    J48 model = new J48(); 
    model.buildClassifier(dataTraining); 

    for (int i = 0; i < dataTest.numInstances(); i++) { 
      dataTest.instance(i).setClassMissing(); 
      double cls = model.classifyInstance(dataTest.instance(i)); 
      dataTest.instance(i).setClassValue(cls); 
      System.out.println(cls + " -> " + dataTest.instance(i).classAttribute().value((int) cls)); 

    } 

Unfortunately this does not work. Weka stops unexpectedly at this line:

Instances dataTest = Filter.useFilter(query.retrieveInstances(), filter); 

So I guess my question is how to convert this part

// Create test data instances[this works, but the sample data now needs to come frm the db instead, see below] 
//loader.setDirectory(new File("c:/Users/Yehia A.Salam/Desktop/dd/test-data")); 
//dataRaw = loader.getDataSet(); 
//Instances dataTest = Filter.useFilter(dataRaw, filter); 

to SQL-based data:

InstanceQuery query = new InstanceQuery(); 
query.setUsername("myusername"); 
query.setPassword("mypassword"); 
String sql = "SELECT d.desc FROM deals d WHERE d.j48 = 1"; 
query.setQuery(sql); 
Instances dataTest = Filter.useFilter(query.retrieveInstances(), filter); 

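Note that there is no problem with the database connection, and I do get the correct number of instances.

Any help is appreciated, I am very close.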

+1

What is the stack trace when Weka stops "unexpectedly"? Did you inspect the output of 'query.retrieveInstances()'? – 2013-03-20 10:46:12

+0

Are you sure about your SQL: 'SELECT d.desc FROM deals d WHERE d.j48 = 1'? I would expect something like 'SELECT d.desc FROM deal AS d WHERE d.j48 = 1'. – 2013-03-21 09:39:11

+0

@JanEglinger I tried adding AS, but no luck. I checked query.retrieveInstances() for errors, and the result is o = (java.lang.ArrayIndexOutOfBoundsException) java.lang.ArrayIndexOutOfBoundsException: 1 – 2013-03-25 21:51:31

Answers

0

Your code uses the TextDirectoryLoader class, which is based on Arff Files from Text Collections. According to its help text:

"Loads all text files in a directory and 
uses the subdirectory names as class labels. 
The content of the text files will be stored in a String attribute, 
the filename can be stored as well." 

See the following code:

double[] newInst = new double[2]; 
newInst[0] = (double)data.attribute(0).addStringValue(files[i]); 
.... 
newInst[1] = (double)data.attribute(1).addStringValue(txtStr.toString()); 
data.add(new Instance(1.0, newInst)); 

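As you can see, this code expects to add two attribute values to the dataset for each instance. Your SQL, however, provides only one attribute.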

String sql = "SELECT d.desc FROM deals d WHERE d.j48 = 1"; 
(this covers only the newInst[1] part of the code above)

This may be the cause of your "(java.lang.ArrayIndexOutOfBoundsException)" problem: Weka cannot find the second attribute.
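The mismatch described in this answer can be illustrated without Weka at all. The following is only a sketch in plain Java, assuming the failure really does come from a second slot being read from a one-column row. The class `RowLayoutSketch`, its methods, and the use of `NaN` as a stand-in for a missing class value are all hypothetical and are not part of the Weka API.

```java
import java.util.Arrays;

// Plain-Java sketch with no Weka dependency. Every name here is hypothetical.
class RowLayoutSketch {

    // Assumed training layout: slot 0 and slot 1, mirroring the two-element
    // newInst array in the loader code quoted above.
    static final int TRAINING_WIDTH = 2;

    // Copies a row into the training layout. Slots the row does not supply
    // are filled with NaN, used here as a stand-in for "missing value".
    static double[] alignToTrainingLayout(double[] row) {
        if (row.length > TRAINING_WIDTH) {
            throw new IllegalArgumentException(
                "row has " + row.length + " values, expected at most " + TRAINING_WIDTH);
        }
        double[] aligned = new double[TRAINING_WIDTH];
        Arrays.fill(aligned, Double.NaN);
        System.arraycopy(row, 0, aligned, 0, row.length);
        return aligned;
    }

    // Returns true if reading row[index] throws ArrayIndexOutOfBoundsException.
    static boolean readFails(double[] row, int index) {
        try {
            double unused = row[index];
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // One value per row, as a single-column SELECT would return.
        double[] sqlRow = {0.0};
        System.out.println("raw row, slot 1 fails: " + readFails(sqlRow, 1));

        double[] aligned = alignToTrainingLayout(sqlRow);
        System.out.println("aligned row, slot 1 fails: " + readFails(aligned, 1));
        System.out.println("aligned slot 1 is missing: " + Double.isNaN(aligned[1]));
    }
}
```

If this diagnosis is right, the equivalent step in the real code would be to give the instances retrieved from MySQL the same attribute layout as the training data before passing them to the filter. How best to do that with Weka's own classes is not shown here.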

-1

I am very much a beginner myself, but in case it is useful: did you know there is a DatabaseLoader class and a DatabaseConverter interface?

+0

You should explain how these classes and interfaces solve the problem. – ChrisF 2013-07-18 12:57:05