2012-03-30 98 views

Solr hits an OOM error while indexing a large amount of data. I know the general advice is to split the index into shards, but that is in fact already the case: I am indexing into shards, and splitting further is not an option at this point. I want to understand what is happening, why this error occurs, and whether there is anything I can do about it other than splitting further or providing more memory.

I would be sad if RAM consumption were linear (or worse) in this scenario; I would much rather it were sublinear.

The scenario: I am indexing documents made of random strings (so the dictionary is very large). Each document has a couple of 20-30 character fields and one field of roughly 200-500 characters. The index in each shard is about 250-260 GB, and each Solr instance handling such an index has about 4 GB of memory. After a restart following the OOM the Solr heap dump looks much the same, so it is probably not related to indexing itself but to the Solr searcher. The biggest objects in the heap dump just before the OOM look like this:

<tree type="Heap walker - Biggest objects"> 
    <object leaf="false" class="org.apache.solr.core.SolrCore" objectId="0xf02c" type="instance" retainedBytes="120456864" retainedPercent="97.4"> 
    <outgoing leaf="false" class="org.apache.solr.search.SolrIndexSearcher" objectId="0xfb52" type="instance" retainedBytes="120383232" retainedPercent="97.3" referenceType="not specified" referenceName="[transitive reference]"> 
     <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x1018e" type="instance" retainedBytes="8161688" retainedPercent="6.6" referenceType="not specified" referenceName="[transitive reference]"/> 
     <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10185" type="instance" retainedBytes="8148072" retainedPercent="6.6" referenceType="not specified" referenceName="[transitive reference]"/> 
     <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10188" type="instance" retainedBytes="8138232" retainedPercent="6.6" referenceType="not specified" referenceName="[transitive reference]"/> 
     <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10186" type="instance" retainedBytes="8129160" retainedPercent="6.6" referenceType="not specified" referenceName="[transitive reference]"/> 
     <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10191" type="instance" retainedBytes="8124608" retainedPercent="6.6" referenceType="not specified" referenceName="[transitive reference]"/> 
     <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x1018a" type="instance" retainedBytes="8123144" retainedPercent="6.6" referenceType="not specified" referenceName="[transitive reference]"/> 

     <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10192" type="instance" retainedBytes="8100904" retainedPercent="6.5" referenceType="not specified" referenceName="[transitive reference]"/> 
     <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10190" type="instance" retainedBytes="8097984" retainedPercent="6.5" referenceType="not specified" referenceName="[transitive reference]"/> 
     <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x1018b" type="instance" retainedBytes="8096160" retainedPercent="6.5" referenceType="not specified" referenceName="[transitive reference]"/> 
     <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x1018d" type="instance" retainedBytes="8081656" retainedPercent="6.5" referenceType="not specified" referenceName="[transitive reference]"/> 
     <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10187" type="instance" retainedBytes="8042504" retainedPercent="6.5" referenceType="not specified" referenceName="[transitive reference]"/> 
     <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x1018c" type="instance" retainedBytes="8039336" retainedPercent="6.5" referenceType="not specified" referenceName="[transitive reference]"/> 
     <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10189" type="instance" retainedBytes="8036952" retainedPercent="6.5" referenceType="not specified" referenceName="[transitive reference]"/> 
     <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x1018f" type="instance" retainedBytes="7948568" retainedPercent="6.4" referenceType="not specified" referenceName="[transitive reference]"/> 
     <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10195" type="instance" retainedBytes="832448" retainedPercent="0.7" referenceType="not specified" referenceName="[transitive reference]"/> 

     <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10196" type="instance" retainedBytes="830584" retainedPercent="0.7" referenceType="not specified" referenceName="[transitive reference]"/> 
     <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10194" type="instance" retainedBytes="829232" retainedPercent="0.7" referenceType="not specified" referenceName="[transitive reference]"/> 
     <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10197" type="instance" retainedBytes="828808" retainedPercent="0.7" referenceType="not specified" referenceName="[transitive reference]"/> 
     <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10198" type="instance" retainedBytes="827312" retainedPercent="0.7" referenceType="not specified" referenceName="[transitive reference]"/> 
     <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10199" type="instance" retainedBytes="824736" retainedPercent="0.7" referenceType="not specified" referenceName="[transitive reference]"/> 
     <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x1019a" type="instance" retainedBytes="822608" retainedPercent="0.7" referenceType="not specified" referenceName="[transitive reference]"/> 
     <outgoing leaf="false" class="org.apache.lucene.index.ReadOnlySegmentReader" objectId="0x10193" type="instance" retainedBytes="783424" retainedPercent="0.6" referenceType="not specified" referenceName="[transitive reference]"/> 
     <cutoff objectCount="96" totalSizeBytes="534976" maximumSingleSizeBytes="87560"/> 
    </outgoing> 

    <cutoff objectCount="53" totalSizeBytes="73496" maximumSingleSizeBytes="40992"/> 
    </object> 
    <object leaf="false" class="org.mortbay.jetty.webapp.WebAppClassLoader" objectId="0xdf88" type="instance" retainedBytes="420208" retainedPercent="0.3"/> 
    <object leaf="false" class="org.apache.solr.core.SolrConfig" objectId="0xe5f5" type="instance" retainedBytes="184976" retainedPercent="0.1"/> 
..... 

The plain jmap heap summary looks like this:

Attaching to process ID 27000, please wait... 
Debugger attached successfully. 
Server compiler detected. 
JVM version is 20.5-b03 

using thread-local object allocation. 
Parallel GC with 2 thread(s) 

Heap Configuration: 
    MinHeapFreeRatio = 40 
    MaxHeapFreeRatio = 70 
    MaxHeapSize  = 268435456 (256.0MB) 
    NewSize   = 1310720 (1.25MB) 
    MaxNewSize  = 17592186044415 MB 
    OldSize   = 5439488 (5.1875MB) 
    NewRatio   = 2 
    SurvivorRatio = 8 
    PermSize   = 21757952 (20.75MB) 
    MaxPermSize  = 85983232 (82.0MB) 

Heap Usage: 
PS Young Generation 
Eden Space: 
    capacity = 31719424 (30.25MB) 
    used  = 17420488 (16.61347198486328MB) 
    free  = 14298936 (13.636528015136719MB) 
    54.92056854500258% used 
From Space: 
    capacity = 26673152 (25.4375MB) 
    used  = 10550856 (10.062080383300781MB) 
    free  = 16122296 (15.375419616699219MB) 
    39.55608995892199% used 
To Space: 
    capacity = 27000832 (25.75MB) 
    used  = 0 (0.0MB) 
    free  = 27000832 (25.75MB) 
    0.0% used 
PS Old Generation 
    capacity = 178978816 (170.6875MB) 
    used  = 168585552 (160.7757110595703MB) 
    free  = 10393264 (9.911788940429688MB) 
    94.19302002757689% used 
PS Perm Generation 
    capacity = 42008576 (40.0625MB) 
    used  = 41690016 (39.758697509765625MB) 
    free  = 318560 (0.303802490234375MB) 
    99.24167865152106% used 

I don't see anything here that gives me a clue about how to handle this, other than providing more memory, which in general is not a solution. I would like to know what is going on: why do the searcher and its ReadOnlySegmentReaders take up all the memory, do they really need it, and is there anything I can do about it?
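One knob from that Solr era that directly targets ReadOnlySegmentReader memory is the term-index divisor: Lucene keeps a sampled subset of the term dictionary (by default every 128th term) in RAM per segment, and with a huge dictionary of random strings that in-memory term index grows accordingly. The snippet below is a sketch only, assuming a Solr 1.4/3.x-style solrconfig.xml; verify the element names against your Solr version before using it:

```xml
<!-- solrconfig.xml - sketch, verify against your Solr version.
     Load only every 4th entry of the on-disk term index into RAM,
     cutting each segment reader's term-index footprint roughly 4x
     at the cost of somewhat slower term lookups. -->
<indexReaderFactory name="IndexReaderFactory"
                    class="solr.StandardIndexReaderFactory">
  <int name="setTermIndexDivisor">4</int>
</indexReaderFactory>
```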

Update: I ran a test with a small dictionary of about 1.5 million words (instead of random strings) and reached an index size of about 350 GB with no OOME, so this is not directly tied to index size; it probably has more to do with the size of the term dictionary (the number of unique terms). But I would still like to understand my limits and how to get around them.

What operating system are you using? Is it 64-bit or 32-bit? – Yavar 2012-03-30 05:42:57

Linux RH, 64-bit – ilfrin 2012-03-30 13:17:44


Answer

It depends on how you get documents indexed onto each "shard" of your server farm. There is no out-of-the-box support for distributed indexing, but your approach can be as simple as a round-robin technique: index each document to the next server in the circle. A simple hashing scheme would also work; the Solr wiki suggests uniqueId.hashCode() % numServers as an adequate hash function.
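The wiki's uniqueId.hashCode() % numServers suggestion needs one guard in Java: hashCode() can be negative, so the raw % can yield a negative shard index. A minimal sketch of such a router (the class name, shard count, and document ID are made up for illustration, not from the question):

```java
// Sketch of hash-based document routing across shards; class name,
// shard count, and document ID are illustrative only.
public class ShardRouter {
    private final int numServers;

    public ShardRouter(int numServers) {
        this.numServers = numServers;
    }

    // Solr-wiki style routing: uniqueId.hashCode() % numServers.
    // hashCode() may be negative, so normalize into [0, numServers).
    public int shardForId(String uniqueId) {
        int shard = uniqueId.hashCode() % numServers;
        return shard < 0 ? shard + numServers : shard;
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(4);
        // The same ID always routes to the same shard.
        System.out.println("doc-12345 -> shard " + router.shardForId("doc-12345"));
    }
}
```

Because the routing is a pure function of the ID, any indexer (e.g. a Hadoop task) can compute the target shard independently without coordination.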

Keep in mind that Solr does not compute global term/document frequencies. At a large scale, computing tf/idf at the shard level is unlikely to matter much; however, if your collection is heavily skewed across servers, relevance results may be affected. It is probably best to distribute documents to your shards randomly. Note: try indexing documents using hash codes instead of random strings.

I think you didn't understand my question. I am indexing shards; it is done in parallel using Hadoop. And given that I mentioned the index size of about 260 GB per shard at which the shards crash on this particular text corpus, what I know now is that this OOM is not directly tied to index size, because I indexed some other data (not random strings) that gave me a 360 GB index and the shards survived... Anyway, I guess you are answering a different question; thanks for the interest anyway ;) – ilfrin 2012-03-30 17:57:22