2013-04-25

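Search an Apache Lucene index and count the results group-wise

I want to search a Lucene index, but I also want to filter the search. There are two fields, contents and category. Suppose I want to find the documents that contain "sport", and I also want to count how many of them fall into categories a and b. I am trying to do this with the code below. The problem is that with millions of records it becomes slow because of the loop. Please suggest another way to accomplish this.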

try {
    File indexDir = new File("file path");
    Directory directory = FSDirectory.open(indexDir);

    // Open the index read-only (Lucene 3.6 API).
    IndexSearcher searcher = new IndexSearcher(directory, true);
    int maxhits = 1000000;
    QueryParser parser1 = new QueryParser(Version.LUCENE_36, "contents",
            new StandardAnalyzer(Version.LUCENE_36));
    Query qu = parser1.parse("sport");

    TopDocs topDocs = searcher.search(qu, maxhits);
    ScoreDoc[] hits = topDocs.scoreDocs;
    int len = hits.length;

    JOptionPane.showMessageDialog(null, "found times " + len);

    int docId = 0;
    Document d;
    String category = "";
    int ctr = 0, ctr1 = 0;

    // Slow part: every matching stored document is loaded just to read its category.
    for (int i = 0; i < len; i++) {
        docId = hits[i].doc;
        d = searcher.doc(docId);
        category = d.get("category");
        if ("a".equals(category))
            ctr++;
        if ("b".equals(category))
            ctr1++;
    }

    JOptionPane.showMessageDialog(null, "word found in category a times " + ctr);
    JOptionPane.showMessageDialog(null, "word found in category b times " + ctr1);
} catch (Exception ex) {
    ex.printStackTrace();
}

Answer

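You can simply run one query for each category you are interested in and read its totalHits. Better still, use a TotalHitCountCollector, which only counts the matches, instead of obtaining a TopDocs instance: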

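// Caveat: "a" is in StandardAnalyzer's default English stop-word set, so the
// parser may drop the category:a clause; if it does, build that clause as a TermQuery.
Query query = parser1.parse("+sport +category:a");
TotalHitCountCollector collector = new TotalHitCountCollector();
searcher.search(query, collector);
ctr = collector.getTotalHits();

query = parser1.parse("+sport +category:b");
collector = new TotalHitCountCollector();
searcher.search(query, collector);
ctr1 = collector.getTotalHits();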
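
To see why counting through a query is cheaper than looping over stored documents, here is a minimal, library-free sketch of the idea. It is a simplified model and not Lucene's actual implementation: the class `CategoryCountSketch` and its methods `add` and `countBoth` are hypothetical names invented for this illustration. Each term maps to the set of document ids that contain it, and a count for "+sport +category:a" is just the size of the intersection of two such sets, so no document is ever loaded.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy inverted index; illustrates the idea only, not Lucene internals.
public class CategoryCountSketch {
    // Maps a term such as "contents:sport" to the ids of documents containing it.
    private final Map<String, BitSet> postings = new HashMap<>();

    // Indexes one document under its category and each of its content terms.
    public void add(int docId, String category, String... contentTerms) {
        mark("category:" + category, docId);
        for (String term : contentTerms) {
            mark("contents:" + term, docId);
        }
    }

    private void mark(String term, int docId) {
        postings.computeIfAbsent(term, k -> new BitSet()).set(docId);
    }

    // Counts documents containing both terms by intersecting their id sets.
    public int countBoth(String termA, String termB) {
        BitSet a = postings.get(termA);
        BitSet b = postings.get(termB);
        if (a == null || b == null) {
            return 0; // an unknown term matches nothing
        }
        BitSet both = (BitSet) a.clone(); // copy so the index itself is not modified
        both.and(b);
        return both.cardinality();
    }

    public static void main(String[] args) {
        CategoryCountSketch index = new CategoryCountSketch();
        index.add(0, "a", "sport", "news");
        index.add(1, "b", "sport");
        index.add(2, "a", "weather");
        index.add(3, "a", "sport");
        System.out.println(index.countBoth("contents:sport", "category:a")); // 2
        System.out.println(index.countBoth("contents:sport", "category:b")); // 1
    }
}
```

The loop in the question does work proportional to the number of stored documents it fetches, while a count-only query touches just the index structures for the terms involved.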