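Reading from a file chunk by chunk, then splitting the text line by line

I am reading from a file into a buffer, then splitting the text I read into strings, where each piece of text ending in a newline forms a new string.

Here is my code: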

int ysize = 20000; 
char buffer2[ysize]; 
int flag = 0; 
string temp_str; 
vector<string> temp; 
while(fread(buffer2, ysize, 1, fp2)>0){ 
    //printf("%s", buffer2); 
    std::string str(buffer2); 
    //push the data into the vect 
    std::string::size_type pos = 0; 
    std::string::size_type prev = 0; 
    /*means the last read did not read a full sentence*/ 
    if (flag == 1) { 
     if (buffer[0] == '\n') { 
      //this means we have read the last sentence correctly, directly go to the next 
     } 
     else{ 
      if((pos = str.find("\n", prev)) != std::string::npos){ 
       temp_str+=str.substr(prev, pos - prev); 
       temp.push_back(temp_str); 
       prev = pos + 1; 
      } 
      while ((pos = str.find("\n", prev)) != std::string::npos) 
      { 
       temp.push_back(str.substr(prev, pos - prev)); 
       prev = pos + 1; 
      } 

      // To get the last substring (or only, if delimiter is not found) 
      temp.push_back(str.substr(prev)); 

      if (buffer2[19999] != '\n') { 
       //we did not finish reading that query 
       flag = 1; 
       temp_str = temp.back(); 
       temp.pop_back(); 
      } 
      else{ 
       flag = 0; 
      } 


     } 
    } 
    else{ 

     while ((pos = str.find("\n", prev)) != std::string::npos) 
     { 
      temp.push_back(str.substr(prev, pos - prev)); 
      prev = pos + 1; 
     } 

     // To get the last substring (or only, if delimiter is not found) 
     temp.push_back(str.substr(prev)); 

     if (buffer2[19999] != '\n') { 
      //we did not finish reading that query 
      flag = 1; 
      temp_str = temp.back(); 
      temp.pop_back(); 
     } 
     else{ 
      flag = 0; 
     }} 
}

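The problem is that this does not read the data correctly: it drops almost half of the text.

I don't know what I am missing here. My idea is to read the data chunk by chunk and then split each chunk line by line, which is what happens inside the while loop. I use the flag to handle the overflow case, where a line continues into the next chunk.

Source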

2017-03-20 user7631183

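Use [`while (std::getline(myFileStream, lineStr)) {...}`](http://en.cppreference.com/w/cpp/string/basic_string/getline) and trust your `std::ifstream` implementation to do sensible buffering. – BoBTFish

I did that, but the performance was terrible. I am trying to read the data in chunks to improve performance. When I tested it there was a significant difference, but splitting the string is a bit harder. – user7631183

I agree with BoBTFish, but maybe you could try `std::regex` or `std::stringstream`. –

First of all, note that fread does not magically create a null-terminated string, which means that std::string str(buffer2) causes undefined behavior. So you should do something like this: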

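size_t nread = 0; 
// element size 1, count ysize-1: fread then returns the number of bytes read 
while((nread = fread(buffer2, 1, ysize-1, fp2)) > 0){ 
    buffer2[nread] = 0; // null-terminate right after the last byte read 
    std::string str(buffer2); 
    ...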

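If you want to avoid implementing the buffering yourself, you can use fgets to read line by line; then you only have to worry about concatenating the pieces of lines that are longer than the read buffer.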
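A minimal sketch of the fgets approach, assuming a text file without embedded NUL bytes. The function name read_lines_fgets and the 64-byte buffer are illustrative choices, not part of the original code:

```cpp
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: reads every line of fp with fgets. Pieces of a line
// that is longer than the buffer are concatenated until '\n' is seen.
std::vector<std::string> read_lines_fgets(std::FILE* fp) {
    std::vector<std::string> lines;
    std::string current;   // pieces of the line currently being assembled
    char buf[64];          // deliberately small; any size >= 2 works
    bool pending = false;  // true while a line has started but not ended

    while (std::fgets(buf, sizeof buf, fp) != nullptr) {
        std::size_t len = std::strlen(buf);
        if (len > 0 && buf[len - 1] == '\n') {
            current.append(buf, len - 1);  // drop the trailing newline
            lines.push_back(current);
            current.clear();
            pending = false;
        } else {
            current.append(buf, len);      // line continues in the next call
            pending = true;
        }
    }
    if (pending) {
        lines.push_back(current);          // final line without a newline
    }
    return lines;
}
```

fgets always null-terminates what it stores, so strlen is safe here, which is not the case for a buffer filled by fread.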

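Apart from that, I spotted another problem: if the first character in the buffer is a newline and flag == 1, you skip the whole current buffer and go on to read the next one, if there is still data available. (I assume that by buffer[0] you actually mean buffer2[0].)

Source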
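If chunked reading is kept, one way to sidestep both the missing terminator and the flag bookkeeping is to carry the unfinished tail of each chunk over in a string. This is only a sketch under assumptions: read_lines_chunked, carry, and chunk_size are names made up for illustration, and chunk_size is assumed to be greater than zero.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: reads fp in chunks of chunk_size bytes and splits
// the data into lines, stitching together lines that span two chunks.
std::vector<std::string> read_lines_chunked(std::FILE* fp, std::size_t chunk_size) {
    std::vector<std::string> lines;
    std::vector<char> buffer(chunk_size);
    std::string carry;  // unfinished line left over from earlier chunks
    std::size_t nread = 0;

    // Element size 1 and count chunk_size: the return value is a byte count,
    // so a short final chunk is still processed.
    while ((nread = std::fread(buffer.data(), 1, buffer.size(), fp)) > 0) {
        std::size_t start = 0;
        for (std::size_t i = 0; i < nread; ++i) {
            if (buffer[i] == '\n') {
                // Works even when i == 0, i.e. the chunk starts with '\n':
                // the carried-over text is then emitted unchanged.
                carry.append(buffer.data() + start, i - start);
                lines.push_back(carry);
                carry.clear();
                start = i + 1;
            }
        }
        // Keep whatever follows the last newline for the next chunk.
        carry.append(buffer.data() + start, nread - start);
    }
    if (!carry.empty()) {
        lines.push_back(carry);  // last line without a trailing newline
    }
    return lines;
}
```

Because only the first nread bytes of the buffer are ever inspected, no null terminator is needed, and a chunk that begins with a newline is handled by the same loop as every other chunk.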

2017-03-20 14:33:37

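Thanks! Won't `buffer2[nread] = 0;` always remove the last character I read and replace it with 0? Also, fgets won't solve my problem: I am trying to read multiple lines at a time. – user7631183

No, because in C/C++ indices are 0-based, so when *nread* characters are read into the buffer they occupy *buffer[0] ... buffer[nread-1]*, and *buffer[nread] = 0* ensures null termination. As for *fgets*: yes, I know you want to read more than one line at a time, but *fgets* could save you the trouble of splitting the buffer afterwards. *fgets* also does some buffering, so most likely you won't lose performance by using it. –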
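The indexing point can be checked with a small sketch. Here memcpy stands in for fread, and the function names and the 16-byte buffer are hypothetical; the first function requires nread to be at most 15.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical demo: nread bytes land in buffer[0] .. buffer[nread - 1],
// so writing the terminator at buffer[nread] overwrites no data.
std::string terminate_and_wrap(const char* data, std::size_t nread) {
    char buffer[16];                   // assumed capacity; needs nread <= 15
    std::memcpy(buffer, data, nread);  // stand-in for fread
    buffer[nread] = 0;                 // first unused slot
    return std::string(buffer);
}

// Alternative that needs no terminator at all: pass the length explicitly.
std::string wrap_with_length(const char* data, std::size_t nread) {
    return std::string(data, nread);
}
```

Both functions return all nread characters; the second form also avoids having to reserve one byte of the buffer for the terminator.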
