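How to index large content in Solr 5.2.1?

We have content larger than 32 KB, and we are unable to index it. How can we index large content in Solr 5.2.1?

Please see the logs below.

Rails log: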

RSolr::Error::Http: RSolr::Error::Http - 400 Bad Request Error: {'responseHeader'=>{'status'=>400,'QTime'=>13},'error'=>{'msg'=>'Exception writing document id Article 872cc4f7-8731-4049-b889-85a040edb543 to the index; possible analysis error.','code'=>400}}

Solr log:

INFO - 2015-11-04 15:00:30.772; [ collection] org.apache.solr.update.processor.LogUpdateProcessor; [collection] webapp=/solr path=/update params={wt=ruby} {} 0 27 

ERROR - 2015-11-04 15:00:30.779; [ collection] org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Exception writing document id Article 872cc4f7-8731-4049-b889-85a040edb543 to the index; possible analysis error.

...

Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="content_textv" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[60, 112, 62, 83, 109, 97, 108, 108, 32, 97, 110, 100, 32, 77, 101, 100, 105, 117, 109, 32, 83, 99, 97, 108, 101, 32, 69, 110, 116, 101]...'

Content field type:

<field name="content_textv" type="strings"/>

....

<fieldType name="strings" class="solr.StrField" multiValued="true" sortMissingLast="true"/>

What is the right way to index large content?

Source

2015-11-04 VtrKanna

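Can you provide the field type definition for 'content_textv' and some sample data? – YoungHobbit

content_textv is a string field @YoungHobbit – VtrKanna

Use solr.TextField instead of solr.StrField. A solr.StrField is not analyzed, so the whole value is indexed as a single term, and Lucene rejects any term whose UTF-8 encoding is longer than 32766 bytes. A solr.TextField is tokenized into many small terms, so the limit is no longer hit. Create a new field type like this: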

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Then you can use that field type for the field:

<field name="content_textv" type="text_general" indexed="true" stored="false" multiValued="true"/>
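
The byte prefix in the error message decodes to `<p>Small and Medium Scale Ente`, so the content appears to be HTML. If the markup itself should not be searchable, one possible variation is to strip it before tokenizing. The following is only a sketch and not part of the original answer: the type name `text_html` is hypothetical, and the filter chain is reduced to a minimum for illustration.

<!-- Hypothetical field type name; a sketch, assuming the content is HTML -->
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Remove HTML tags such as <p> before the text reaches the tokenizer -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

If you go this way, point the field at type="text_html" instead of type="text_general". Either way, existing documents need to be reindexed after the schema change.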

Source

2015-11-05 10:19:38

Thanks @Bhagwat Mane, we just changed the field type from strings to text_general and it works for us. Thank you very much. – VtrKanna
