2017-02-18

How does Solr spellcheck collation work?

I am following the spellcheck example from the Solr documentation, and I used the following configuration:

<!-- a spellchecker built from a field of the main index --> 
<lst name="spellchecker"> 
    <str name="name">default</str> 
    <str name="field">name_spell</str> 
    <str name="classname">solr.DirectSolrSpellChecker</str> 
    <!-- the spellcheck distance measure used, the default is the internal levenshtein --> 
    <str name="distanceMeasure">internal</str> 
    <!-- minimum accuracy needed to be considered a valid spellcheck suggestion --> 
    <float name="accuracy">0.5</float> 
    <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 --> 
    <int name="maxEdits">2</int> 
    <!-- the minimum shared prefix when enumerating terms --> 
    <int name="minPrefix">1</int> 
    <!-- maximum number of inspections per result. --> 
    <int name="maxInspections">5</int> 
    <!-- minimum length of a query term to be considered for correction --> 
    <int name="minQueryLength">4</int> 
    <!-- maximum threshold of documents a query term can appear to be considered for correction --> 
    <float name="maxQueryFrequency">0.01</float> 
    <!-- uncomment this to require suggestions to occur in 1% of the documents --> 
    <!-- <float name="thresholdTokenFrequency">.01</float> --> 

</lst> 
<lst name="spellchecker"> 
    <str name="name">wordbreak</str> 
    <str name="classname">solr.WordBreakSolrSpellChecker</str>  
    <str name="field">name_spell</str> 
    <str name="combineWords">true</str> 
    <str name="breakWords">true</str> 
    <int name="maxChanges">10</int> 
</lst> 
</searchComponent> 

Request handler:

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy"> 
    <lst name="defaults"> 
     <str name="spellcheck.dictionary">default</str> 
     <str name="spellcheck.dictionary">wordbreak</str> 
     <str name="spellcheck">on</str> 
     <str name="spellcheck.extendedResults">true</str>  
     <str name="spellcheck.count">10</str> 
     <str name="spellcheck.alternativeTermCount">5</str> 
     <str name="spellcheck.maxResultsForSuggest">5</str>  
     <str name="spellcheck.collate">true</str> 
     <str name="spellcheck.collateExtendedResults">true</str> 
     <str name="spellcheck.maxCollationTries">10</str> 
     <str name="spellcheck.maxCollations">5</str>   
    </lst> 
    <arr name="last-components"> 
     <str>spellcheck_new</str> 
    </arr> 
    </requestHandler> 

Schema fields:

<field name="attribute_key" type="text" indexed="true" stored="true" multiValued="false" /> 
    <field name="spell_check_field" type="text_spell" indexed="true" stored="false" multiValued="true"/> 
    <copyField source="attribute_key" dest="spell_check_field" /> 
    <field name="name_spell" type="text_general" indexed="true" stored="false" multiValued="false"/> 
    <copyField source="attribute_key" dest="name_spell" /> 
    <field name="attribute_key_tag" type="tag" stored="false" omitTermFreqAndPositions="true" omitNorms="true" multiValued="true"/> 
    <copyField source="attribute_key" dest="attribute_key_tag" multiValued="true"/> 
    <field name="attribute_value" type="string" indexed="false" stored="true" multiValued="false" /> 
    <defaultSearchField>attribute_key</defaultSearchField> 

The suggestions work perfectly, but the collations array is empty for every query.

Example query:

http://localhost:8984/solr/spell_check/spell?spellcheck.q=nike%20shoes&spellcheck=true&spellcheck.collate=true&wt=json&spellcheck=true&spellcheck.extendedResults=true&spellcheck.collate=true 

Result:

{ 
"responseHeader": { 
"zkConnected": true, 
"status": 0, 
"QTime": 60 
}, 
"response": { 
"numFound": 0, 
"start": 0, 
"docs": [] 
}, 
"spellcheck": { 
"suggestions": [ 
"nike", 
{ 
"numFound": 6, 
"startOffset": 0, 
"endOffset": 4, 
"origFreq": 2, 
"suggestion": [ 
{ 
"word": "n i k e", 
"freq": 19 
}, 
{ 
"word": "nine", 
"freq": 1 
}, 
{ 
"word": "none", 
"freq": 29 
}, 
{ 
"word": "note", 
"freq": 5 
}, 
{ 
"word": "nicka", 
"freq": 2 
}, 
{ 
"word": "nino", 
"freq": 2 
} 
] 
}, 
"shoes", 
{ 
"numFound": 10, 
"startOffset": 5, 
"endOffset": 10, 
"origFreq": 0, 
"suggestion": [ 
{ 
"word": "shoe", 
"freq": 30 
}, 
{ 
"word": "shoe s", 
"freq": 30 
}, 
{ 
"word": "short", 
"freq": 30 
}, 
{ 
"word": "s h o e s", 
"freq": 4 
}, 
{ 
"word": "sheer", 
"freq": 15 
}, 
{ 
"word": "sheen", 
"freq": 4 
}, 
{ 
"word": "sheet", 
"freq": 3 
}, 
{ 
"word": "shower", 
"freq": 2 
}, 
{ 
"word": "shock", 
"freq": 1 
}, 
{ 
"word": "shred", 
"freq": 1 
} 
] 
} 
], 
"correctlySpelled": false, 
"collations": [] 
} 
} 

How do I set this up so that collations are returned?

+0

Did you solve this? I am facing the same issue: collations is always empty and correctlySpelled is always false. – userab

Answers

0

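Let's start with how the documentation defines spellcheck.collate:

Causes Solr to build a new query based on the best suggestion for each term in the submitted query.

Long story short, when you specify spellcheck.collate=true you are asking Solr to suggest a new query that you can re-run, which is more useful than combining the individual suggestions yourself. Let me show you an example.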

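  • Let's say you want to search for

initial audit

  • and, for whatever reason, it gets typed as

initila audti

  • With collate set to false, you get the following spellcheck suggestions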

<lst name="suggestions"> 
     <lst name="initila"> 
      <int name="numFound">5</int> 
      <int name="startOffset">1</int> 
      <int name="endOffset">8</int> 
      <arr name="suggestion"> 
       <str>initial</str> 
       <str>initi la</str> 
       <str>initiala</str> 
       <str>ini tila</str> 
       <str>initilal</str> 
      </arr> 
     </lst> 
     <lst name="audt"> 
      <int name="numFound">4</int> 
      <int name="startOffset">9</int> 
      <int name="endOffset">13</int> 
      <arr name="suggestion"> 
       <str>aud t</str> 
       <str>audit</str> 
       <str>au dt</str> 
       <str>audi</str> 
      </arr> 
     </lst> 
    </lst> 

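This means you get several suggestions for each word.

  • But if you turn collation on, Solr also suggests, if there is one, the query that should most likely be executed. It is not guaranteed to be the best one; treat it as a good guess that can help you.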

    <lst name="suggestions"> 
        <lst name="initila"> 
         <int name="numFound">5</int> 
         <int name="startOffset">1</int> 
         <int name="endOffset">8</int> 
         <arr name="suggestion"> 
          <str>initial</str> 
          <str>initi la</str> 
          <str>initiala</str> 
          <str>ini tila</str> 
          <str>initilal</str> 
         </arr> 
        </lst> 
        <lst name="audti"> 
         <int name="numFound">5</int> 
         <int name="startOffset">9</int> 
         <int name="endOffset">14</int> 
         <arr name="suggestion"> 
          <str>audit</str> 
          <str>audt i</str> 
          <str>auditi</str> 
          <str>au dti</str> 
          <str>audtis</str> 
         </arr> 
        </lst> 
        <lst name="collation"> 
         <str name="collationQuery">initial audit</str> 
         <int name="hits">1983</int> 
         <lst name="misspellingsAndCorrections"> 
          <str name="initila">initial</str> 
          <str name="audti">audit</str> 
         </lst> 
        </lst> 
    </lst> 
    

So the recommended query would be

initial audit

which is taken from here:

<str name="collationQuery">initial audit</str> 

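Note that a collation is only returned if the recommended query, run against the index, would actually find what you are looking for.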

+0

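You have explained how collation works, but could you also look at the actual problem, i.e. "the collations array is empty for every query"? Why is the collations array always empty? – userab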

+0

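One possibility is that the dictionary has not been built, but more likely the terms being searched do not reach the threshold required for suggestions to be returned. Have a look at this other post: https://stackoverflow.com/questions/6653186/solr-suggester-not-returning-any-results – xmorera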

+0

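I have built the dictionary, and the threshold is low as well. You can check my other answer. When no default field is specified, collation works with q but not with spellcheck.q. I am not sure why it behaves that way. – userab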

0

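Either of the following approaches solved my problem:

  1. In the requestHandler, add a default field as a child of the defaults list, i.e. <str name="df">name_spell</str>. Executing your query will now return collations. Either q or spellcheck.q can be used here.

OR

  2. Use q instead of spellcheck.q and specify the field in q, i.e. instead of spellcheck.q=nike%20shoes use q=name_spell:(nike%20shoes), and it will return collations.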
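
As a rough sketch of the first option, assuming the /spell handler from the question is otherwise left unchanged, the defaults list would gain a single df entry. The handler name, the name_spell field and the spellcheck_new component name are all taken from the question; only the df line is new.

```xml
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
     <!-- new: default field used when the query does not name a field -->
     <str name="df">name_spell</str>
     <str name="spellcheck.dictionary">default</str>
     <str name="spellcheck.dictionary">wordbreak</str>
     <str name="spellcheck">on</str>
     <str name="spellcheck.extendedResults">true</str>
     <str name="spellcheck.count">10</str>
     <str name="spellcheck.alternativeTermCount">5</str>
     <str name="spellcheck.maxResultsForSuggest">5</str>
     <str name="spellcheck.collate">true</str>
     <str name="spellcheck.collateExtendedResults">true</str>
     <str name="spellcheck.maxCollationTries">10</str>
     <str name="spellcheck.maxCollations">5</str>
    </lst>
    <arr name="last-components">
     <str>spellcheck_new</str>
    </arr>
</requestHandler>
```

After reloading the core or collection, the original request with spellcheck.q=nike%20shoes should then be able to return collations without a field prefix, as it did in my case.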