I'm having a hard time wrapping my head around the Lucene library. Here's what I have so far: how do I use Lucene to extract n-grams?
public void shingleMe()
{
    try
    {
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
        FileReader reader = new FileReader("test.txt");
        ShingleAnalyzerWrapper shingleAnalyzer = new ShingleAnalyzerWrapper(analyzer, 2);
        shingleAnalyzer.setOutputUnigrams(false);

        TokenStream stream = shingleAnalyzer.tokenStream("contents", reader);
        CharTermAttribute charTermAttribute = stream.getAttribute(CharTermAttribute.class);

        while (stream.incrementToken())
        {
            System.out.println(charTermAttribute.toString());
        }
    }
    catch (FileNotFoundException e)
    {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    catch (IOException e)
    {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
It fails at stream.incrementToken(). My understanding is that ShingleAnalyzerWrapper wraps another analyzer to create a shingle analyzer object. From there, I convert it to a token stream, which is then parsed using the attribute filter. However, it always results in this exception:
Exception in thread "main" java.lang.AbstractMethodError: org.apache.lucene.analysis.TokenStream.incrementToken()Z
Any thoughts? Thanks in advance!
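For reference, here is what I expect the output to look like. This is a minimal plain-Java sketch (no Lucene, names are my own) of what word-level shingles of size 2 with unigrams disabled should produce; the tokenization itself is assumed to have already happened:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShingleDemo {
    // Build word-level shingles (token n-grams) of the given size,
    // joining adjacent tokens with a single space, and emitting no
    // unigrams - analogous to a shingle size of 2 with unigram output off.
    static List<String> shingles(List<String> tokens, int size) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("please", "divide", "this", "sentence");
        // Prints: [please divide, divide this, this sentence]
        System.out.println(shingles(tokens, 2));
    }
}
```

So for the input "please divide this sentence", I want the bigrams "please divide", "divide this", and "this sentence", one per line.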
Word or character n-grams? – Reactormonk 2012-04-01 12:35:08