What is the fastest way to tell whether two strings or binary files are different in C++?

I am writing unit tests and need to compare result files against golden files. What is the simplest way to do that?

So far I have (for a Linux environment):

int result = system("diff file1 file2");

The files are different if result != 0.

Source

2013-02-27 Victor Lyuboslavsky

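That sounds like a reasonable way to compare two files, yes. – 2013-02-27 17:42:45

There are various standard options for 'diff' that suppress its output. Use them if you call it through 'system'. – pmr 2013-02-27 17:45:19

You could use 'cmp' instead of 'diff'. – 2013-02-27 17:50:17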

If you want a pure C++ solution, I would do something like this:

#include <algorithm> 
#include <iterator> 
#include <string> 
#include <fstream> 

template<typename InputIterator1, typename InputIterator2>
bool
range_equal(InputIterator1 first1, InputIterator1 last1,
            InputIterator2 first2, InputIterator2 last2)
{
    while(first1 != last1 && first2 != last2)
    {
        if(*first1 != *first2) return false;
        ++first1;
        ++first2;
    }
    return (first1 == last1) && (first2 == last2);
}

bool compare_files(const std::string& filename1, const std::string& filename2) 
{ 
    std::ifstream file1(filename1); 
    std::ifstream file2(filename2); 

    std::istreambuf_iterator<char> begin1(file1); 
    std::istreambuf_iterator<char> begin2(file2); 

    std::istreambuf_iterator<char> end; 

    return range_equal(begin1, end, begin2, end); 
}

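It avoids reading the whole file into memory, and it stops as soon as the files differ (or at the end of a file). range_equal is there because std::equal does not take a pair of iterators for the second range, so it is not safe if the second range is shorter.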
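Since C++14, std::equal also has an overload that takes an end iterator for the second range, so on a newer compiler the hand-written helper may not be needed. The sketch below is one possible variant, not the answer's own code: the name compare_files_cxx14 is made up, and opening in binary mode and treating an unopenable file as "not equal" are assumptions added here.

```cpp
#include <algorithm>  // std::equal (four-iterator overload, C++14)
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical variant of compare_files, assuming a C++14 compiler.
bool compare_files_cxx14(const std::string& filename1, const std::string& filename2)
{
    // Binary mode avoids newline translation on platforms that perform it.
    std::ifstream file1(filename1, std::ios::binary);
    std::ifstream file2(filename2, std::ios::binary);
    if (!file1 || !file2) {
        return false; // assumption: a file that cannot be opened is never equal
    }

    std::istreambuf_iterator<char> begin1(file1);
    std::istreambuf_iterator<char> begin2(file2);
    std::istreambuf_iterator<char> end; // default-constructed: end of stream

    // With input iterators the lengths are not known up front, so this
    // overload walks both ranges and fails if one ends before the other.
    return std::equal(begin1, end, begin2, end);
}
```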

Source

2013-02-27 18:18:46

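Can you explain why you use an uninitialized iterator for 'end'? The OP mentioned binary files; would this need ['std::ios::binary'](http://stackoverflow.com/a/5420568/2436175)? P.S.: I would note that this is not the fastest approach, since even for large files it checks one byte at a time. As a simple solution, though, it looks excellent. – Antonio 2016-08-22 15:31:22

@Antonio An uninitialized (default-constructed) std::istreambuf_iterator is the end iterator. For performance, the code assumes that your stream is buffered (for example, in many implementations of 'std::ifstream' the underlying stream is buffered). – 2016-08-23 18:24:49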

This should work:

#include <string> 
#include <fstream> 
#include <streambuf> 
#include <iterator> 


bool equal_files(const std::string& a, const std::string& b)
{
    // Read the whole first file into a string.
    std::ifstream stream{a};
    std::string file1{std::istreambuf_iterator<char>(stream),
                      std::istreambuf_iterator<char>()};

    // Reuse the stream object for the second file.
    stream = std::ifstream{b};
    std::string file2{std::istreambuf_iterator<char>(stream),
                      std::istreambuf_iterator<char>()};

    return file1 == file2;
}

I doubt this is as fast as diff, but it avoids the call to system. It should be good enough for test cases, though.

Source

2013-02-27 17:48:15 pmr

You might want to include 'iterator'. – 2013-02-27 17:49:43

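One way to avoid reading both files is to precompute a hash of the golden file, for example md5. Then you only need to read the test file. Note that this may be slower than simply reading both files!

Alternatively, you can check in layers: look at the file sizes first. If they differ, the files are different, and you can skip the lengthy read-and-compare step.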
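The precomputed-hash idea could look roughly like the sketch below. The standard library has no md5, so a 64-bit FNV-1a hash is swapped in purely for illustration. It is not cryptographic, and two different files can in principle produce the same value. The function names fnv1a_file and matches_golden_hash are hypothetical.

```cpp
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>

// 64-bit FNV-1a over the raw bytes of a file (a stand-in for md5).
// Returns false if the file cannot be opened; otherwise stores the hash in out.
bool fnv1a_file(const std::string& path, std::uint64_t& out)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return false;
    }

    std::uint64_t h = 14695981039346656037ULL; // FNV offset basis
    for (std::istreambuf_iterator<char> it(in), end; it != end; ++it) {
        h ^= static_cast<unsigned char>(*it);
        h *= 1099511628211ULL;                 // FNV prime
    }
    out = h;
    return true;
}

// The test then needs only the stored golden hash and the result file.
bool matches_golden_hash(const std::string& result_path, std::uint64_t golden_hash)
{
    std::uint64_t h = 0;
    return fnv1a_file(result_path, h) && h == golden_hash;
}
```

The golden hash would be computed once, for instance when the golden file is checked in, and stored next to the test.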

Source

2013-02-27 17:50:21 gbjbaanb

This may be overkill, but you could build a table of SHA-256 hashes using boost/bimap and boost/scope_exit.

Here is a video showing how Stephan T. Lavavej does it (starting at 8:15): http://channel9.msdn.com/Series/C9-Lectures-Stephan-T-Lavavej-Advanced-STL/C9-Lectures-Stephan-T-Lavavej-Advanced-STL-5-of-n

For details on the algorithm: http://en.wikipedia.org/wiki/SHA-2

Source

2013-02-27 18:19:03 Lufi

Building on DaveS's answer, and checking the file size as the first step:

#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>

bool compare_files(const std::string& filename1, const std::string& filename2)
{
    std::ifstream file1(filename1, std::ifstream::ate | std::ifstream::binary); // open file at the end
    std::ifstream file2(filename2, std::ifstream::ate | std::ifstream::binary); // open file at the end
    if (!file1 || !file2) {
        return false; // a file that cannot be opened never compares equal
    }

    const std::ifstream::pos_type fileSize = file1.tellg();
    if (fileSize != file2.tellg()) {
        return false; // different file size
    }

    file1.seekg(0); // rewind
    file2.seekg(0); // rewind

    std::istreambuf_iterator<char> begin1(file1);
    std::istreambuf_iterator<char> begin2(file2);

    // The second argument is the end-of-range iterator; the sizes are already known to match.
    return std::equal(begin1, std::istreambuf_iterator<char>(), begin2);
}

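(I wonder whether, before rewinding, file1 could be used to create a more efficient end-of-stream iterator that, by knowing the length of the stream, would allow std::equal to process more bytes at a time.)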
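One way to process more bytes at a time, without relying on iterators at all, is to read both files in blocks and compare the blocks with std::memcmp. The following is a minimal sketch under stated assumptions: the name compare_files_chunked and the 64 KiB buffer size are arbitrary choices made for illustration, and whether this is actually faster than the iterator version depends on the platform and should be measured.

```cpp
#include <cstddef>
#include <cstring>   // std::memcmp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical block-wise comparison, not part of the original answer.
bool compare_files_chunked(const std::string& filename1, const std::string& filename2)
{
    std::ifstream file1(filename1, std::ios::binary);
    std::ifstream file2(filename2, std::ios::binary);
    if (!file1 || !file2) {
        return false; // assumption: a file that cannot be opened is never equal
    }

    const std::size_t kBufferSize = 64 * 1024; // arbitrary block size
    std::vector<char> buf1(kBufferSize), buf2(kBufferSize);

    for (;;) {
        file1.read(buf1.data(), static_cast<std::streamsize>(kBufferSize));
        file2.read(buf2.data(), static_cast<std::streamsize>(kBufferSize));
        const std::streamsize n1 = file1.gcount();
        const std::streamsize n2 = file2.gcount();

        if (n1 != n2) {
            return false; // one file ended before the other
        }
        if (n1 == 0) {
            return true;  // both files ended on a block boundary
        }
        if (std::memcmp(buf1.data(), buf2.data(), static_cast<std::size_t>(n1)) != 0) {
            return false; // the blocks differ
        }
        if (static_cast<std::size_t>(n1) < kBufferSize) {
            return true;  // short final block matched: both files are exhausted
        }
    }
}
```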

Source

2016-08-23 09:17:33 Antonio
