2012-07-11 103 views
1

How do I use the Highlighter in pyLucene?

I have read some tutorials about highlighting search terms in Lucene and came up with a piece of code like this:

(...) 
query = parser.parse(query_string) 

for scoreDoc in searcher.search(query, 50).scoreDocs: 
    doc = searcher.doc(scoreDoc.doc) 
    filename = doc.get("filename") 
    print filename 
    found_paraghaph = fetch_from_my_text_library(filename) 

    stream = lucene.TokenSources.getTokenStream("contents", found_paraghaph, analyzer); 
    scorer = lucene.Scorer(query, "contents", lucene.CachingTokenFilter(stream)) 
    highligter = lucene.Highligter(scorer) 
    fragment = highligter.getBestFragment(analyzer, "contents", found_paraghaph) 
    print '>>>' + fragment 

But it all ends with an error:

Traceback (most recent call last): 
    File "./search.py", line 76, in <module> 
    scorer = lucene.Scorer(query, "contents", lucene.CachingTokenFilter(stream)) 
NotImplementedError: ('instantiating java class', <type 'Scorer'>) 

So I guess this part of Lucene is not implemented in pyLucene. Is there any other way to do this?

Answers

4

I ran into a similar error. I think the wrapper for this class has not been implemented yet in PyLucene v3.6.

You may want to try the following:

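# Assumes PyLucene 3.x, where the wrapped classes live in the flat `lucene` module.
# On PyLucene 4.x and later, import them from their Java-style packages instead
# (for example, StringReader from java.io).
from lucene import (Highlighter, QueryParser, QueryScorer, SimpleHTMLFormatter,
                    StandardAnalyzer, StringReader, Version)

# `searcher`, FIELD_CONTENTS, FIELD_PATH and QUERY_STRING are defined elsewhere.
analyzer = StandardAnalyzer(Version.LUCENE_CURRENT) 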

# Constructs a query parser. 
queryParser = QueryParser(Version.LUCENE_CURRENT, FIELD_CONTENTS, analyzer) 

# Create a query 
query = queryParser.parse(QUERY_STRING) 

topDocs = searcher.search(query, 50) 

# Get top hits 
scoreDocs = topDocs.scoreDocs 
print "%s total matching documents." % len(scoreDocs) 

# Wrap matched terms in the formatter's default tags (<B>...</B>).
formatter = SimpleHTMLFormatter()
highlighter = Highlighter(formatter, QueryScorer(query))

for scoreDoc in scoreDocs: 
    doc = searcher.doc(scoreDoc.doc) 
    text = doc.get(FIELD_CONTENTS) 
    ts = analyzer.tokenStream(FIELD_CONTENTS, StringReader(text)) 
    print doc.get(FIELD_PATH) 
    print highlighter.getBestFragments(ts, text, 3, "...") 
    print "" 

Note that we create a new token stream for each item in the search results.
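
To make it clearer what a call such as `getBestFragments(ts, text, 3, "...")` produces, here is a rough conceptual sketch in plain Python 3. It does not use PyLucene and is not Lucene's actual algorithm: it assumes sentence-based fragments and a simple hit count as the score, and the names `highlight` and `best_fragments` are made up for illustration. Only the `<B>...</B>` tags mirror the defaults of `SimpleHTMLFormatter`.

```python
import re

WORD = re.compile(r"\w+")


def highlight(fragment, terms):
    """Wrap every word found in `terms` in <B> tags; return (marked text, hit count)."""
    hits = 0

    def mark(match):
        nonlocal hits
        word = match.group(0)
        if word.lower() in terms:
            hits += 1
            return "<B>" + word + "</B>"
        return word

    return WORD.sub(mark, fragment), hits


def best_fragments(text, query_terms, max_fragments=3, separator="..."):
    """Return up to `max_fragments` highlighted sentences, joined by `separator`."""
    terms = {term.lower() for term in query_terms}
    # Assumption: one sentence per fragment (Lucene uses its own fragmenter).
    sentences = re.split(r"(?<=[.!?])\s+", text)
    scored = []
    for position, sentence in enumerate(sentences):
        marked, hits = highlight(sentence, terms)
        if hits:  # fragments without any match are dropped
            scored.append((hits, position, marked))
    # Most hits first; ties keep their original document order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return separator.join(marked for _, _, marked in scored[:max_fragments])


if __name__ == "__main__":
    sample = ("Lucene is a search library. "
              "PyLucene wraps Lucene for Python. "
              "Nothing to see here.")
    print(best_fragments(sample, ["lucene"], max_fragments=2))
```

As in the answer's loop, the text of each hit is processed separately, which is why the real code builds a fresh token stream per document.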

+1

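Thanks! It seems the most important part here is creating a 'QueryScorer' instead of a 'Scorer'. Now that I have looked it up in the Lucene documentation, 'Scorer' turns out to be an abstract class, which is why the error occurred. And the name 'NotImplementedError' is rather misleading here... – mik01aj 2012-09-26 18:10:21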
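
As an analogy only, the following plain Python 3 sketch shows the same kind of failure with Python's own `abc` module: an abstract base class cannot be instantiated, while a concrete subclass can. The classes here are hypothetical stand-ins that merely borrow the names `Scorer` and `QueryScorer`; they are not the Lucene classes, and plain Python raises `TypeError` here rather than the `NotImplementedError` seen in the traceback above.

```python
from abc import ABC, abstractmethod


class Scorer(ABC):
    """Hypothetical abstract base: declares the interface but cannot be instantiated."""

    @abstractmethod
    def score(self, token):
        """Return a relevance score for one token."""


class QueryScorer(Scorer):
    """Hypothetical concrete subclass: scores a token by membership in the query terms."""

    def __init__(self, terms):
        self.terms = {term.lower() for term in terms}

    def score(self, token):
        return 1.0 if token.lower() in self.terms else 0.0


def try_instantiate(cls, *args):
    """Return a new instance of `cls`, or the TypeError raised while creating it."""
    try:
        return cls(*args)
    except TypeError as exc:
        return exc


if __name__ == "__main__":
    print(type(try_instantiate(Scorer)).__name__)        # abstract class: creation fails
    print(try_instantiate(QueryScorer, ["lucene"]).score("Lucene"))
```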

+0

The code works well. One thing to mention: StringReader is imported from java.io, not from lucene. – vancexu 2014-04-22 05:24:43