How do I read lines from an input file in parallel when each line is processed independently?

I have just started using OpenMP with C++. My serial C++ code looks like this:

#include <iostream> 
#include <string> 
#include <sstream> 
#include <vector> 
#include <fstream> 
#include <stdlib.h> 

int main(int argc, char* argv[]) {
    std::string line;
    std::ifstream inputfile(argv[1]);

    if (inputfile.is_open()) {
        while (std::getline(inputfile, line)) {
            // Line gets processed and written into an output file
        }
    }
}

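Because each line is processed independently of the others, I tried to parallelize this with OpenMP, since the input files are on the order of gigabytes. My guess is that I first need to get the number of lines in the input file and then parallelize the code as shown below. Can someone help me with this?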

#include <iostream> 
#include <string> 
#include <sstream> 
#include <vector> 
#include <fstream> 
#include <stdlib.h> 

#ifdef _OPENMP 
#include <omp.h> 
#endif 

int main(int argc, char* argv[]) {
    std::string line;
    std::ifstream inputfile(argv[1]);

    if (inputfile.is_open()) {
        // Calculate number of lines in file?
        int lines_in_file = 0; // placeholder: the line count is not computed yet
        // Set an output filename and open an ofstream
        #pragma omp parallel num_threads(8)
        {
            #pragma omp for schedule(dynamic, 1000)
            for (int i = 0; i < lines_in_file; i++) {
                // What do I do here? I cannot just read any line because it requires random access
            }
        }
    }
}

Edit:

Important points

Each line is processed independently
The order of the results does not matter

Source

2010-10-05 Legend

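You say each line is independent, but what about the order of the results? – aneccodeal 2010-10-05 01:37:49

@aneccodeal: The order does not matter either, because I will eventually insert this data into a database. – Legend 2010-10-05 01:38:20

Assuming all lines are (roughly) the same length, you don't need to count the lines (which is expensive; you would have to read the entire file!). You can determine the size of the file (seek to the end and see where the pointer is), split it into eight chunks by byte count, and then move each chunk pointer (except the first one) forward until it reaches a newline. – 2010-10-05 01:38:28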
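The chunking idea in the comment above could be sketched roughly as follows. This is only a minimal illustration, not code from the thread: the helper names `chunk_offsets` and `process_chunks` and the `process_line` callback are hypothetical, the file is assumed to use `\n` line endings, and `process_line` is assumed to be safe to call from several threads at once. Each chunk opens its own `std::ifstream`, so no stream is shared between threads.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns num_chunks + 1 byte offsets. Chunk k covers
// [offsets[k], offsets[k + 1]) and always begins at the start of a line.
// Returns an empty vector if the file cannot be opened.
std::vector<std::streamoff> chunk_offsets(const std::string& path, int num_chunks) {
    assert(num_chunks > 0);
    std::ifstream in(path, std::ios::binary);
    if (!in) return {};
    in.seekg(0, std::ios::end);               // seek to the end to learn the size
    const std::streamoff size = in.tellg();
    if (size < 0) return {};

    std::vector<std::streamoff> offsets(static_cast<std::size_t>(num_chunks) + 1, size);
    offsets[0] = 0;
    std::string skipped;
    for (int k = 1; k < num_chunks; ++k) {
        const std::streamoff guess = size / num_chunks * k;
        if (guess == 0) { offsets[k] = 0; continue; }
        in.clear();
        in.seekg(guess - 1);                  // start one byte early so a guess that
        std::getline(in, skipped);            // already sits on a line start is kept
        if (in.good()) offsets[k] = in.tellg();  // otherwise the chunk ends at EOF
    }
    return offsets;
}

// Hypothetical driver: processes every chunk in parallel and returns the
// number of lines seen in each chunk. process_line must be thread-safe.
template <typename LineFn>
std::vector<std::size_t> process_chunks(const std::string& path, int num_chunks,
                                        LineFn process_line) {
    const std::vector<std::streamoff> offsets = chunk_offsets(path, num_chunks);
    if (offsets.empty()) return {};
    std::vector<std::size_t> counts(static_cast<std::size_t>(num_chunks), 0);

    #pragma omp parallel for schedule(dynamic)
    for (int k = 0; k < num_chunks; ++k) {
        std::ifstream in(path, std::ios::binary);   // private stream per chunk
        in.seekg(offsets[k]);
        std::streamoff pos = offsets[k];
        std::string line;
        while (pos < offsets[k + 1] && std::getline(in, line)) {
            process_line(line);
            pos += static_cast<std::streamoff>(line.size()) + 1;  // +1 for '\n'
            ++counts[k];
        }
    }
    return counts;
}
```

A call such as `process_chunks(argv[1], 8, [](const std::string& line) { /* per-line work */ });` would mirror the eight chunks suggested in the comment. Without an OpenMP flag such as `-fopenmp` the pragma is ignored and the chunks are simply processed one after another.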

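Not a direct OpenMP answer, but what you are probably looking for is a MapReduce-style approach. Take a look at Hadoop: it is written in Java, but there is at least some C++ API.

In general, you want to process this amount of data on different machines rather than in multiple threads of the same process (virtual address space limits, lack of physical memory, swapping, and so on). Also, the kernel will have to bring the file in from disk sequentially anyway (which is what you want; otherwise the hard drive would have to do extra seeks for each of your threads).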
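If the work stays on one machine, one way to respect the point about sequential disk access is to let a single thread read the file in order and hand batches of lines to OpenMP for processing. The sketch below is only an illustration under that assumption; `process_in_batches`, `batch_size`, and `process_line` are hypothetical names, and `process_line` is assumed to be thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: one thread reads up to batch_size lines sequentially,
// then the batch is processed in parallel. Returns the total number of lines
// processed (0 if the file cannot be opened).
template <typename LineFn>
std::size_t process_in_batches(const std::string& path, std::size_t batch_size,
                               LineFn process_line) {
    assert(batch_size > 0);
    std::ifstream in(path);
    std::vector<std::string> batch;
    batch.reserve(batch_size);
    std::size_t total = 0;
    std::string line;
    bool more = static_cast<bool>(in);

    while (more) {
        // Sequential phase: only this thread touches the stream.
        batch.clear();
        while (batch.size() < batch_size && std::getline(in, line)) {
            batch.push_back(line);
        }
        more = (batch.size() == batch_size);  // a short batch means end of input

        // Parallel phase: the lines in the batch are independent of each other.
        const long n = static_cast<long>(batch.size());
        #pragma omp parallel for schedule(dynamic, 64)
        for (long i = 0; i < n; ++i) {
            process_line(batch[static_cast<std::size_t>(i)]);
        }
        total += batch.size();
    }
    return total;
}
```

The batch size bounds memory use, so the whole file never has to fit in RAM at once. Whether this helps depends on how expensive the per-line work is compared with the read itself; if the job is I/O-bound, the extra threads will mostly wait.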

Source

2010-10-05 01:39:55

Thanks for the explanation. What you said makes a lot of sense now. – Legend 2010-10-05 01:52:12
