2016-06-08 65 views

Hierarchical scoring in Lucene: handling OR'd terms

I am trying to translate interest profiles into Lucene queries.

Given a title term and some expansion terms in JSON format, such as

{"title":"Donald Trump", "Expansion":[["republic","republican"],["democratic","democrat"],["campaign"]]}

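the corresponding Lucene query could look like the following (a BooleanQuery in which title terms get a boost factor of 3.0 and expansion terms a boost factor of 1.0):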

+(text:donald^3.0 text:trump^3.0 (text:democrat text:democratic) (text:republic text:republican) text:campaign)

According to IndexSearcher's explain() method, a matching document such as

I know people just want to find a way to be famous without taking any risks, republic republican Donald Trump Campaign.

has a score of 9.0:

3.0 = weight(text:donald^3.0 in 0) [TitleExpansionSimilarity], result of: 
    3.0 = score(doc=0,freq=1.0), product of: 
     3.0 = queryWeight, product of: 
     3.0 = boost 
     1.0 = idf(docFreq=201, maxDocs=201) 
     1.0 = queryNorm 
     1.0 = fieldWeight in 0, product of: 
     1.0 = tf(freq=1.0), with freq of: 
      1.0 = termFreq=1.0 
     1.0 = idf(docFreq=201, maxDocs=201) 
     1.0 = fieldNorm(doc=0) 
    3.0 = weight(text:trump^3.0 in 0) [TitleExpansionSimilarity], result of: 
    3.0 = score(doc=0,freq=1.0), product of: 
     3.0 = queryWeight, product of: 
     3.0 = boost 
     1.0 = idf(docFreq=201, maxDocs=201) 
     1.0 = queryNorm 
     1.0 = fieldWeight in 0, product of: 
     1.0 = tf(freq=1.0), with freq of: 
      1.0 = termFreq=1.0 
     1.0 = idf(docFreq=201, maxDocs=201) 
     1.0 = fieldNorm(doc=0) 
    2.0 = sum of: 
    1.0 = weight(text:republic in 0) [TitleExpansionSimilarity], result of: 
     1.0 = fieldWeight in 0, product of: 
     1.0 = tf(freq=1.0), with freq of: 
      1.0 = termFreq=1.0 
     1.0 = idf(docFreq=201, maxDocs=201) 
     1.0 = fieldNorm(doc=0) 
    1.0 = weight(text:republican in 0) [TitleExpansionSimilarity], result of: 
     1.0 = fieldWeight in 0, product of: 
     1.0 = tf(freq=1.0), with freq of: 
      1.0 = termFreq=1.0 
     1.0 = idf(docFreq=201, maxDocs=201) 
     1.0 = fieldNorm(doc=0) 
    1.0 = weight(text:campaign in 0) [TitleExpansionSimilarity], result of: 
    1.0 = fieldWeight in 0, product of: 
     1.0 = tf(freq=1.0), with freq of: 
     1.0 = termFreq=1.0 
     1.0 = idf(docFreq=201, maxDocs=201) 
     1.0 = fieldNorm(doc=0) 

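Is there any way to override Lucene's scoring function so that the boolean query (text:republic text:republican), i.e. the cluster ["republic","republican"], is scored as the maximum of the match weight of "republic" and the match weight of "republican", rather than their sum?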

1.0 = MAX(instead of sum) of: 
    1.0 = weight(text:republic in 0) [TitleExpansionSimilarity], result of: 
     1.0 = fieldWeight in 0, product of: 
     1.0 = tf(freq=1.0), with freq of: 
      1.0 = termFreq=1.0 
     1.0 = idf(docFreq=201, maxDocs=201) 
     1.0 = fieldNorm(doc=0) 
    1.0 = weight(text:republican in 0) [TitleExpansionSimilarity], result of: 
     1.0 = fieldWeight in 0, product of: 
     1.0 = tf(freq=1.0), with freq of: 
      1.0 = termFreq=1.0 
     1.0 = idf(docFreq=201, maxDocs=201) 
     1.0 = fieldNorm(doc=0) 
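
To make the intended arithmetic concrete, here is a small plain-Java sketch that does not use Lucene at all. The class and method names (`ClusterScoring`, `score`, `matchedWeights`) are hypothetical, and the weights are simply the per-term values from the explain() output above. It assumes non-negative weights and sums across clusters, taking either the sum or the maximum within each cluster.

```java
import java.util.Arrays;
import java.util.List;

public class ClusterScoring {
    // Per-term weights of the clauses that matched the example document:
    // donald, trump, the [republic, republican] cluster, and campaign.
    public static List<List<Double>> matchedWeights() {
        return Arrays.asList(
            Arrays.asList(3.0),        // text:donald^3.0
            Arrays.asList(3.0),        // text:trump^3.0
            Arrays.asList(1.0, 1.0),   // text:republic, text:republican
            Arrays.asList(1.0));       // text:campaign
    }

    // Sums cluster scores; inside a cluster, uses max or sum of term weights.
    // Assumes all weights are non-negative.
    public static double score(List<List<Double>> clusters, boolean maxWithinCluster) {
        double total = 0.0;
        for (List<Double> cluster : clusters) {
            double clusterScore = 0.0;
            for (double w : cluster) {
                clusterScore = maxWithinCluster
                    ? Math.max(clusterScore, w)
                    : clusterScore + w;
            }
            total += clusterScore;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(score(matchedWeights(), false)); // current behaviour: 9.0
        System.out.println(score(matchedWeights(), true));  // desired behaviour: 8.0
    }
}
```

With max-within-cluster scoring, the example document would score 8.0 instead of 9.0, because the ["republic","republican"] cluster contributes 1.0 rather than 2.0.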

Answer

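Not through Lucene's QueryParser syntax, but you can use a DisjunctionMaxQuery in place of the BooleanQuery. It combines its subqueries and scores a document with the maximum score among them, rather than the sum of the subquery scores.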
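
As a rough sketch of how the query from the question might be built programmatically, assuming Lucene 5.3 or later (where `BooleanQuery.Builder` and `BoostQuery` are available): the class name `ProfileQueryBuilder` and its helper methods are hypothetical, the field name `text` is taken from the question, and the tie-breaker of 0.0 is an assumption chosen so that only the best-matching term in a cluster counts.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.DisjunctionMaxQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class ProfileQueryBuilder {
    // Single-term query on the "text" field used in the question.
    static Query term(String value) {
        return new TermQuery(new Term("text", value));
    }

    // One expansion cluster: scored as the max of its terms, not the sum.
    static Query cluster(String... values) {
        List<Query> disjuncts = new ArrayList<>();
        for (String value : values) {
            disjuncts.add(term(value));
        }
        // Tie-breaker 0.0: non-maximal matches add nothing to the score.
        return new DisjunctionMaxQuery(disjuncts, 0.0f);
    }

    public static Query build() {
        BooleanQuery.Builder inner = new BooleanQuery.Builder();
        // Title terms, boosted to 3.0.
        inner.add(new BoostQuery(term("donald"), 3.0f), BooleanClause.Occur.SHOULD);
        inner.add(new BoostQuery(term("trump"), 3.0f), BooleanClause.Occur.SHOULD);
        // Expansion clusters, each scored by its best-matching term.
        inner.add(cluster("democrat", "democratic"), BooleanClause.Occur.SHOULD);
        inner.add(cluster("republic", "republican"), BooleanClause.Occur.SHOULD);
        inner.add(cluster("campaign"), BooleanClause.Occur.SHOULD);

        // Outer "+(...)" wrapper from the question's query string.
        return new BooleanQuery.Builder()
            .add(inner.build(), BooleanClause.Occur.MUST)
            .build();
    }
}
```

The outer BooleanQuery still sums its clauses, so the title terms and the separate clusters add up as before; only the terms inside a single cluster stop accumulating.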


Thanks for pointing this out, femtoRgon!