How can I read the individual terms of my index with Lucene?

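I want to read every entry in my index: I want to read the individual terms in the index and print them to the console. (I don't want to inspect the contents with Luke.) Do I have to use the IndexReader class?

Can someone help me?

What I tried: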

iReader = IndexReader.open(directory); 

    int num = iReader.numDocs(); 
    for (int i = 0; i < num; i++) 
    { 
     if (! iReader.isDeleted(i)) 
     { 
      org.apache.lucene.document.Document d = iReader.document(i); 
      System.out.println("d=" +d.getField("title").tokenStreamValue()); 

     } 
    } 


    org.apache.lucene.document.Document doc = new org.apache.lucene.document.Document(); 

    // add all the documents 


    Field title = new Field(
       "title", 
       testDoc.title, 
       Field.Store.YES, 
       Field.Index.ANALYZED, 
       Field.TermVector.WITH_POSITIONS_OFFSETS); 

    doc.add(title); 


    Field content = new Field(
       "content", 
       testDoc.content, 
       Field.Store.YES, 
       Field.Index.ANALYZED, 
       Field.TermVector.WITH_POSITIONS_OFFSETS); 
    doc.add(content); 


    iWriter.addDocument(doc);

But d = null; where am I going wrong? I want to retrieve the terms of the title field that I indexed...

Thanks a lot.

2011-01-14 JackDaniels

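Again, I use Java, but the principle will be the same.

What you want to do is similar to enumerating term frequencies, except that you only care about the distinct terms of a field.

This example and this example on how to count term frequencies in a Lucene index should get you going.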
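
The idea behind enumerating the terms of a field can be sketched without Lucene. An inverted index keeps, per field, a sorted dictionary of distinct terms, and for each term the documents that contain it. The class below is a library-free toy model under that assumption; `ToyTermIndex` and its methods are hypothetical names, not Lucene's API, and the whitespace splitting is only a stand-in for a real analyzer. In Lucene 3.x the real enumeration would go through `IndexReader.terms(...)`, which returns a `TermEnum`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of an inverted index: field -> sorted term -> ids of documents containing it.
// Hypothetical class for illustration only; it is not part of Lucene.
class ToyTermIndex {
    private final Map<String, TreeMap<String, Set<Integer>>> postings = new TreeMap<>();

    // Crude "analysis": lowercase the text, split on whitespace, record each term.
    void add(int docId, String field, String text) {
        TreeMap<String, Set<Integer>> terms =
                postings.computeIfAbsent(field, f -> new TreeMap<>());
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Enumerate the distinct terms of one field with their document frequency.
    List<String> termsWithDocFreq(String field) {
        List<String> out = new ArrayList<>();
        TreeMap<String, Set<Integer>> terms =
                postings.getOrDefault(field, new TreeMap<>());
        for (Map.Entry<String, Set<Integer>> e : terms.entrySet()) {
            out.add(e.getKey() + " (docFreq=" + e.getValue().size() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        ToyTermIndex index = new ToyTermIndex();
        index.add(0, "title", "Lucene in Action");
        index.add(1, "title", "Lucene index internals");
        index.add(1, "content", "segments and postings");
        // Only the terms of the "title" field are listed, in sorted order.
        for (String line : index.termsWithDocFreq("title")) {
            System.out.println(line);
        }
    }
}
```

Note that the terms come from the per-field dictionary, not from the documents themselves, which is why walking the stored documents does not yield them.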

2011-01-14 15:15:47 Joel

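To inspect an index, use IndexReader. The class has a method document(int) that you can use to look up the individual documents the index contains. The document then gives you all the fields that were created for it.

From a field, you can get its value or its token stream (i.e. the strings that ended up in the index).

[EDIT] If you delete documents, the index will have holes. So you have to add a check: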

org.apache.lucene.document.Document d = iReader.document(i); 
if(d == null) continue; // <<-- You need this check 

System.out.println("d=" +d.getField("title").tokenStreamValue());
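
Why deletions leave holes can be shown with a small library-free model. In Lucene 3.x, `numDocs()` counts only live documents while `maxDoc()` is one greater than the largest document id, so a loop that checks `isDeleted(i)` should normally run up to `maxDoc()`. The class below mimics those three methods on a plain array; `DeletedDocsSketch`, `SLOTS` and `liveTitles` are hypothetical names used only for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of document ids with deletions; not Lucene's API.
class DeletedDocsSketch {
    // Slot i holds the title of document i; null marks a deleted document (a hole).
    static final String[] SLOTS = { "first title", null, "third title" };

    // Counterpart of maxDoc(): one greater than the largest id ever assigned.
    static int maxDoc() {
        return SLOTS.length;
    }

    // Counterpart of numDocs(): the number of live documents.
    static int numDocs() {
        int live = 0;
        for (String s : SLOTS) {
            if (s != null) live++;
        }
        return live;
    }

    static boolean isDeleted(int id) {
        return SLOTS[id] == null;
    }

    // Collect the titles of live documents, scanning ids in [0, bound).
    static List<String> liveTitles(int bound) {
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < bound; i++) {
            if (isDeleted(i)) continue; // skip the hole
            titles.add(SLOTS[i]);
        }
        return titles;
    }

    public static void main(String[] args) {
        // Bounding the loop by numDocs() stops too early and misses document 2.
        System.out.println(liveTitles(numDocs())); // [first title]
        // Bounding it by maxDoc() visits every id and skips only the hole.
        System.out.println(liveTitles(maxDoc()));  // [first title, third title]
    }
}
```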

2011-01-14 13:42:33

Aaron, thank you for your reply. I have edited the post above. Where is the error? – JackDaniels 2011-01-14 14:05:21

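I use Lucene.Net, but I think the logic is the same.

Exactly one of StringValue(), ReaderValue() and BinaryValue() must be set. The ones that are not in use will return null or throw an exception. In your case, try reading StringValue().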

2011-01-14 14:24:59 sisve

Simon, if I use StringValue(), it returns the value of doc.title. I want the tokens... – JackDaniels 2011-01-14 14:32:18

Just pass the document title to an analyzer and you will get the resulting tokens. – sisve 2011-01-14 14:42:49
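
What "passing the title to an analyzer" amounts to can be sketched as follows. This is a deliberately simplified stand-in, not Lucene's `Analyzer` class: it only lowercases the text and splits on anything that is not a letter or digit. `TitleTokens` and `analyze` are hypothetical names. With real Lucene code you would presumably reuse the same analyzer that was used at index time, since stop-word removal or stemming can change the resulting tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified stand-in for an analyzer; not part of Lucene.
class TitleTokens {
    // Turn a stored title back into tokens: lowercase, then split on non-alphanumerics.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) { // split() can yield an empty leading piece
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The stored value is the whole string; the tokens are derived from it.
        String storedTitle = "Reading a Lucene index!";
        System.out.println(analyze(storedTitle)); // [reading, a, lucene, index]
    }
}
```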
