Lucene: creating documents in a while loop gets slower and slower

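I have an efficiency problem. I'm developing an enterprise application that is deployed as an EAR archive on a JBoss EAP 6.1 server. In a while loop I create new objects from entities and write them to a file. I fetch the entities in limited batches (with the help of an EJB DAO), for example 2000 per step. The problem is that I need to process millions of objects: the first million rows go smoothly, but the further the loop progresses, the slower it gets. Can anyone tell me why this gets slower as the loop progresses, and how I can make it run smoothly? Here are the key parts of the code: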

public void createFullIndex(int stepSize) {
    int logsNumber = systemLogDao.getSystemLogsNumber();
    int counter = 0;
    while (counter < logsNumber) {
        for (SystemLogEntity systemLogEntity : systemLogDao.getLimitedSystemLogs(counter, stepSize)) {
            addDocument(systemLogEntity);
        }
        counter = counter + stepSize;
    }
    commitIndex();
}

public void addDocument(SystemLogEntity systemLogEntity) {
    try {
        Document document = new Document();
        document.add(new NumericField("id", Field.Store.YES, true).setIntValue(systemLogEntity.getId()));
        document.add(new Field("resource", (systemLogEntity.getResource() == null ? "" : systemLogEntity
                .getResource().getResourceCode()), Field.Store.YES, Field.Index.ANALYZED));
        document.add(new Field("operationType", (systemLogEntity.getOperationType() == null ? "" : systemLogEntity
        document.add(new Field("comment",
                (systemLogEntity.getComment() == null ? "" : systemLogEntity.getComment()), Field.Store.YES,
                Field.Index.ANALYZED));
        indexWriter.addDocument(document);
    } catch (CorruptIndexException e) {
        LOGGER.error("Failed to add the following log to Lucene index:\n" + systemLogEntity.toString(), e);
    } catch (IOException e) {
        LOGGER.error("Failed to add the following log to Lucene index:\n" + systemLogEntity.toString(), e);
    }
}

I'd appreciate your help!

Source

2014-09-02 AjMeen

Have you looked at your heap statistics? – 2014-09-02 12:18:31

@HotLicks I thought about it, but to be honest I'm not sure how to do that. – AjMeen 2014-09-02 12:22:27
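
One way to look at heap statistics from inside the loop, as a minimal sketch: the standard `java.lang.management` API reports used and committed heap, and logging it once per batch shows whether memory keeps climbing. `HeapStats`, `describe`, and the loop bounds in `main` are hypothetical names for illustration, not part of the asker's code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical helper (not from the original post) for logging heap usage per batch. */
class HeapStats {

    private static final long MIB = 1024L * 1024L;

    /** Returns the heap currently in use, in MiB, as reported by the JVM. */
    static long usedHeapMiB() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed() / MIB;
    }

    /** Builds one log line with the number of processed rows and the heap figures. */
    static String describe(int processed) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return String.format("processed=%d usedMiB=%d committedMiB=%d",
                processed, heap.getUsed() / MIB, heap.getCommitted() / MIB);
    }

    public static void main(String[] args) {
        int stepSize = 2000; // batch size mentioned in the question
        for (int counter = 0; counter < 10000; counter += stepSize) {
            // ... fetch and index one batch here (omitted in this sketch) ...
            System.out.println(describe(counter + stepSize));
        }
    }
}
```

If the used figure keeps rising from batch to batch and never drops, something is holding on to the processed objects. External tools such as `jstat -gcutil <pid>` or VisualVM give the same picture without code changes.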

What is `indexWriter`? It looks like you are adding all of the documents to it, and it may be keeping references to them and holding them in memory. – 2014-09-02 12:41:05

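As far as I can see, you do not write your data to the file as soon as you get it. Instead, you try to build the full DOM object and then flush it to the file. That strategy works for a limited number of objects. In your case, where you have to deal with millions (as you said), you should not use DOM. Instead, you should create XML fragments as you receive the data and write them to the file. This will reduce your memory consumption and hopefully improve performance.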
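
A minimal sketch of the streaming approach described above, assuming the output really is XML (the question's code does not show that). It uses the JDK's StAX `XMLStreamWriter`, which emits each element as it is written instead of building a tree. `LogXmlStreamer` and the element names `logs` and `log` are hypothetical.

```java
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical example: writes log comments as XML without holding a DOM in memory. */
class LogXmlStreamer {

    /** Writes every comment as a <log> element as soon as it is iterated. */
    static void writeLogs(Iterable<String> comments, Writer out) throws XMLStreamException {
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartDocument();
        xml.writeStartElement("logs");
        for (String comment : comments) {
            xml.writeStartElement("log");
            // writeCharacters escapes markup characters such as '<' and '&'
            xml.writeCharacters(comment == null ? "" : comment);
            xml.writeEndElement();
        }
        xml.writeEndElement();
        xml.writeEndDocument();
        xml.flush();
        xml.close(); // closes the XML writer only, not the underlying Writer
    }

    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();
        writeLogs(Arrays.asList("login ok", "a < b"), out);
        System.out.println(out);
    }
}
```

In a real run the `Writer` would wrap a file stream, and the `Iterable` would be fed batch by batch from the DAO, so only the current batch is in memory at any time.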

Source

2014-09-02 12:24:51 AlexR

I think this is the suggestion with the biggest impact. Thanks! – AjMeen 2014-09-02 14:24:54

You're welcome. Good luck. – AlexR 2014-09-02 15:03:24

Logging should be easy. Appending to a text file with Guava looks like this:

File to = new File("C:/Logs/log.txt"); 
CharSequence from = "Your data as string\n"; 
Files.append(from, to, Charsets.UTF_8);

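I have a few remarks:

I don't know whether your log entities get garbage collected
It is not clear whether the file's contents are kept in memory
If the log is in XML format, the whole XML DOM may need to be parsed whenever a new element is added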
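
Regarding the second remark, a sketch using only the JDK rather than Guava: Guava's `Files.append` opens and closes the file on every call, so for millions of rows it may be cheaper to keep one buffered writer open per batch. Here `Files` is `java.nio.file.Files`, not the Guava class, and `LogAppender` and `appendAll` are hypothetical names.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

/** Hypothetical example: appends many lines through a single buffered writer. */
class LogAppender {

    /** Appends all lines to the target file, creating it if it does not exist. */
    static void appendAll(Path target, Iterable<String> lines) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        } // try-with-resources flushes and closes the writer
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("log", ".txt");
        appendAll(target, Arrays.asList("first entry", "second entry"));
        appendAll(target, Arrays.asList("third entry"));
        System.out.println(Files.readAllLines(target, StandardCharsets.UTF_8));
    }
}
```

Only the writer's buffer is held in memory; the lines already written live on disk.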

Source

2014-09-02 12:34:39 Margus

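I would try reusing the Document object. My own looping problem was related to garbage collection: my loop ran so fast that the GC could not reasonably keep up, and reusing objects solved all my problems. I haven't tried this with Document objects myself, but if it is possible, it may work for you.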

Source

2014-09-02 12:35:20 Kieveli

Thanks, that's a reasonable tip! +1 – AjMeen 2014-09-02 14:26:26
