Reading a large file in Java is too slow: GC overhead limit exceeded

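I have a large file (about 3 GB) that I read into an ArrayList. When I run the code below, it becomes very slow after a few minutes and CPU usage is high. A few minutes later the Eclipse console shows the error java.lang.OutOfMemoryError: GC overhead limit exceeded.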

OS: Windows 2008 R2,
4 CPUs,
32 GB memory
Java version "1.7.0_60"

My eclipse.ini:

-startup 
plugins/org.eclipse.equinox.launcher_1.3.0.v20130327-1440.jar 
--launcher.library 
plugins/org.eclipse.equinox.launcher.win32.win32.x86_64_1.1.200.v20140116-2212 
-product 
org.eclipse.epp.package.standard.product 
--launcher.defaultAction 
openFile 
#--launcher.XXMaxPermSize 
#256M 
-showsplash 
org.eclipse.platform 
#--launcher.XXMaxPermSize 
#256m 
--launcher.defaultAction 
openFile 
--launcher.appendVmargs 
-vmargs 
-Dosgi.requiredJavaVersion=1.6 
-Xms10G 
-Xmx10G 
-XX:+UseParallelGC 
-XX:ParallelGCThreads=24 
-XX:MaxGCPauseMillis=1000 
-XX:+UseAdaptiveSizePolicy

Java code:

BufferedInputStream bis = new BufferedInputStream(new FileInputStream(new File("/words/wordlist.dat")));
InputStreamReader isr = new InputStreamReader(bis, "utf-8");
BufferedReader in = new BufferedReader(isr, 1024*1024*512);

String strTemp = null;
long ind = 0;

while (((strTemp = in.readLine()) != null))
{
    matcher.reset(strTemp);

    if (strTemp.contains("$"))
    {
        al.add(strTemp);
        strTemp = null;
    }
    ind = ind + 1;
    if (ind % 100000 == 0)
    {
        System.out.println(ind + " 100,000 +");
    }
}
in.close();

My use case:

neural network 
java 
oracle 
solaris 
quick sort 
apple 
green fluorescent protein 
acm 
trs

Source

2016-02-27 pangjiale

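Can you elaborate on your use case? Why do you need the 3 GB file in memory? – Mahendra

Does the whole file really need to be loaded into memory? – Devavrata

You can temporarily avoid this error by setting '-XX:-UseGCOverheadLimit' in the Eclipse configuration: [disable-the-usegcoverheadlimit-in-centos](http://stackoverflow.com/questions/18934146/disable-the-usegcoverheadlimit-in-centos) – Mahendra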
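As far as I know, the -vmargs section of eclipse.ini configures the JVM that runs Eclipse itself, while a program launched from Eclipse gets its options from the VM arguments of its run configuration. A minimal sketch for checking which heap size and flags the program's own JVM actually received (the class and method names `JvmSettings`, `maxHeapBytes` and `jvmArguments` are made up for illustration):

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmSettings {

    // Maximum heap the current JVM will try to use, in bytes (reflects -Xmx).
    public static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Options passed to the current JVM, e.g. -Xmx10G or -XX:-UseGCOverheadLimit.
    public static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapBytes() / (1024 * 1024));
        System.out.println("JVM arguments: " + jvmArguments());
    }
}
```

If the printed heap is much smaller than the 10 GB set in eclipse.ini, the settings are probably not reaching the program and would have to be set on the run configuration instead.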

writing a program in Java to get statistics on how many times the keywords were found in the search word log list

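I suggest you do just that. Build a Map that counts the occurrences of your keywords, or simply count the occurrences of every word.

With Java 8 streams you can do this in a line or two, without ever loading the whole file into memory at once.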
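Since the question mentions Java 1.7.0_60, where streams are not available, the counting idea can be sketched with a plain BufferedReader loop. This is only an illustration under assumptions: the file is UTF-8 text with words separated by spaces, and the names `WordCounter` and `countWords` are invented here.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

public class WordCounter {

    // Reads the file line by line and counts each space-separated word.
    // Only the map of distinct words stays in memory, not the lines themselves.
    public static Map<String, Long> countWords(String path) throws IOException {
        Map<String, Long> counts = new HashMap<String, Long>();
        // try-with-resources is available from Java 7 onwards
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                for (String word : line.trim().split(" +")) {
                    if (word.isEmpty()) {
                        continue; // a blank line yields a single empty token
                    }
                    Long old = counts.get(word);
                    counts.put(word, old == null ? 1L : old + 1L);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // "/words/wordlist.dat" is the path used in the question
        String path = args.length > 0 ? args[0] : "/words/wordlist.dat";
        Map<String, Long> counts = countWords(path);
        System.out.println(counts.size() + " distinct words");
    }
}
```

Memory use then grows with the number of distinct words rather than with the size of the file, which is the point of counting instead of collecting lines in a list.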

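// Requires: java.nio.file.Files, java.nio.file.Paths, java.util.Map,
// java.util.stream.Collectors, java.util.stream.Stream
try (Stream<String> s = Files.lines(Paths.get("filename"))) {
    // Split each line on runs of spaces and count how often each word occurs.
    Map<String, Long> count = s.flatMap(line -> Stream.of(line.trim().split(" +")))
      .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
}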
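The snippet above counts single words, while the sample entries in the question include multi-word phrases such as "neural network". A possible variation, assuming each line of the log is one search query and the keywords are known in advance (neither is stated in the question), counts the lines that contain each keyword. The names `KeywordStats` and `countKeywords` are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class KeywordStats {

    // For each keyword, counts the log lines that contain it as a substring.
    // Keywords that never occur are simply absent from the returned map.
    public static Map<String, Long> countKeywords(Path log, List<String> keywords)
            throws IOException {
        // Files.lines reads lazily, so the file is never held in memory as a whole.
        try (Stream<String> lines = Files.lines(log, StandardCharsets.UTF_8)) {
            return lines
                .flatMap(line -> keywords.stream().filter(line::contains))
                .collect(Collectors.groupingBy(k -> k, Collectors.counting()));
        }
    }

    public static void main(String[] args) throws IOException {
        // The path is the one from the question; the keyword list is an example.
        List<String> keywords = Arrays.asList("neural network", "java", "oracle");
        Map<String, Long> stats = countKeywords(Paths.get("/words/wordlist.dat"), keywords);
        System.out.println(stats);
    }
}
```

Note that plain substring matching also counts "java" inside a query such as "javascript"; matching on word boundaries would need a stricter test.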

Source

2016-02-27 12:07:23
