Parallel file processing in Java

I have the following code:

import java.io.*; 
import java.util.concurrent.* ; 
public class Example{ 
public static void main(String args[]) { 
    try { 
     FileOutputStream fos = new FileOutputStream("1.dat"); 
     DataOutputStream dos = new DataOutputStream(fos); 

     for (int i = 0; i < 200000; i++) { 
      dos.writeInt(i); 
     } 
     dos.close();               // First sample file written

     FileOutputStream fos1 = new FileOutputStream("2.dat"); 
     DataOutputStream dos1 = new DataOutputStream(fos1); 

     for (int i = 200000; i < 400000; i++) { 
      dos1.writeInt(i); 
     } 
     dos1.close(); 

     Exampless.createArray(200000); //Create a shared array 
     Exampless ex1 = new Exampless("1.dat"); 
     Exampless ex2 = new Exampless("2.dat"); 
     ExecutorService executor = Executors.newFixedThreadPool(2); // Executed in parallel to count the number of matches in the two files
     long startTime = System.nanoTime(); 
     long endTime; 
     Future<Integer> future1 = executor.submit(ex1); 
     Future<Integer> future2 = executor.submit(ex2); 
     int count1 = future1.get(); 
     int count2 = future2.get(); 
     endTime = System.nanoTime(); 
     long duration = endTime - startTime; 
     System.out.println("duration with threads:"+duration); 
     executor.shutdown(); 
     System.out.println("Matches: " + (count1 + count2)); 

     startTime = System.nanoTime(); 
     ex1.call(); 
     ex2.call(); 
     endTime = System.nanoTime(); 
     duration = endTime - startTime; 
     System.out.println("duration without threads:"+duration); 

    } catch (Exception e) { 
     System.err.println("Error: " + e.getMessage()); 
    } 
} 
} 

class Exampless implements Callable<Integer> {

public static int[] arr = new int[20000]; // shared array, only read once the tasks start
public String _name;

public Exampless(String name) {
    this._name = name;
}

static void createArray(int z) {
    for (int i = z; i < z + 20000; i++) { // fill the shared array with z .. z + 19999
     arr[i - z] = i;
    }
}

@Override
public Integer call() {
    // try-with-resources closes the stream even if reading fails
    try (DataInputStream din = new DataInputStream(new FileInputStream(_name))) {
     int cnt = 0;
     for (int i = 0; i < 20000; i++) { // read the file and count the matches
      int c = din.readInt();
      if (c == arr[i]) {
       cnt++;
      }
     }
     return cnt;
    } catch (Exception e) {
     System.err.println("Error: " + e.getMessage());
    }
    return -1;
}

}

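I am trying to count the number of matches between the array and the two files. Although I run it on two threads, the code does not perform well, because: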

(time to read file 1 + file 2 on a single thread) < (time to read file 1 || file 2 on multiple threads).

Can anyone help me solve this? (I have a 2-core CPU, and the files are about 1.5 GB.)

Source

2012-07-31 Arpssss

@SurajChandran, most of the time. It really makes no difference. :) I am just running a test. – Arpssss 2012-07-31 16:33:24

The files are not 1.5 GB, only ~80K. – 2012-07-31 16:33:42

@KeithRandall, I was only giving an example. – Arpssss 2012-07-31 16:36:29

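In the first case you read the files sequentially, one after the other, byte by byte. This is as fast as disk I/O gets, as long as the files are not heavily fragmented. When you finish the first file, the disk/OS seeks to the beginning of the second file and continues reading the disk very efficiently.

In the second case you keep switching between the first and the second file, forcing the disk to seek from one place to another. This extra seek time (about 10 ms) is the root of your confusion.

Oh, and you do know that disk access is single-threaded and your task is I/O bound, so splitting it across multiple threads cannot help as long as you read from the same physical disk? Your approach can only be justified if:

each thread, apart from reading from a file, also performs some CPU-intensive or blocking operations that are slower than the I/O by an order of magnitude,
the files are on different physical drives (different partitions are not enough) or on certain RAID configurations,
you are using an SSD drive.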

Source

2012-07-31 16:32:59

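+1. This is a fundamental point that many people do not understand: performance only improves when you add more of the limiting reagent. – RedGreasel 2012-07-31 16:53:50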

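As Tomasz pointed out, you will not get any benefit from multithreading the reading of data from disk. You may see some speed-up if you multithread the checks, i.e. load the data from the files into arrays sequentially and then have threads perform the checks in parallel. But given the small size of your files (~80 KB) and the fact that you are only comparing ints, I doubt the performance gain would be worth the effort.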
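The "load sequentially, check in parallel" idea could look roughly like the sketch below. The class and method names (ParallelCheckSketch, countRange, countParallel) are made up for illustration, and the data array is assumed to have been filled beforehand by a single sequential read of the file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of the original code.
class ParallelCheckSketch {

    // Count the matching positions in [from, to); every task gets a disjoint range.
    static int countRange(int[] data, int[] expected, int from, int to) {
        int cnt = 0;
        for (int i = from; i < to; i++) {
            if (data[i] == expected[i]) {
                cnt++;
            }
        }
        return cnt;
    }

    // Assumes data was already loaded from disk sequentially and threads >= 1.
    static int countParallel(int[] data, int[] expected, int threads) {
        int n = Math.min(data.length, expected.length);
        int chunk = (n + threads - 1) / threads; // ceiling division
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> parts = new ArrayList<>();
            for (int from = 0; from < n; from += chunk) {
                final int lo = from;
                final int hi = Math.min(n, from + chunk);
                Callable<Integer> task = () -> countRange(data, expected, lo, hi);
                parts.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> part : parts) {
                total += part.get(); // wait for each partial count
            }
            return total;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Only the comparison is split across threads here; the disk is still read by one thread. For a plain int comparison the per-task overhead may well outweigh the gain, as noted above.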

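Something that will definitely improve execution speed is not using readInt(). Since you know you are comparing 20000 ints, you should read all 20000 ints for each file into an array (or at least read them in blocks), rather than calling readInt() 20000 times.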
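A minimal sketch of the bulk-read idea, assuming the file fits in memory and was written with DataOutputStream.writeInt (big-endian, 4 bytes per int). The class and method names (BulkReadSketch, decodeInts, countMatches, countMatchesInFile) are invented for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original code.
class BulkReadSketch {

    // Decode big-endian 4-byte ints; ByteBuffer is big-endian by default,
    // which matches what DataOutputStream.writeInt produces.
    static int[] decodeInts(byte[] raw) {
        IntBuffer ints = ByteBuffer.wrap(raw).asIntBuffer();
        int[] values = new int[ints.remaining()];
        ints.get(values);
        return values;
    }

    // Count the positions where the file data and the reference array agree.
    static int countMatches(int[] data, int[] expected) {
        int n = Math.min(data.length, expected.length);
        int cnt = 0;
        for (int i = 0; i < n; i++) {
            if (data[i] == expected[i]) {
                cnt++;
            }
        }
        return cnt;
    }

    // One bulk read for the whole file instead of one readInt() call per value.
    static int countMatchesInFile(String name, int[] expected) throws IOException {
        byte[] raw = Files.readAllBytes(Paths.get(name));
        return countMatches(decodeInts(raw), expected);
    }
}
```

With the code from the question, a call such as BulkReadSketch.countMatchesInFile("2.dat", Exampless.arr) would stand in for the readInt() loop. For files that are too large to hold in memory, reading in blocks or wrapping the FileInputStream in a BufferedInputStream is a smaller change with a similar goal.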

Source

2012-07-31 16:54:20 onit
