如何添加字符串到杜鹃过滤器？

我正在运行在杜鹃过滤器库中提供的示例的修改版本：https://github.com/efficient/cuckoofilter/blob/master/example/test.cc 如何添加字符串到杜鹃过滤器？

我想向杜鹃过滤器添加字符串。虽然字符串被添加，但是当我检查它是否存在于过滤器中时，它总是返回false。任何人都可以指出我的方法有什么问题吗？

size_t total_items = 1000000; 
CuckooFilter<string, 12> filter(total_items); 

// Insert items to this cuckoo filter 
string temp1 = "sample"; 
if (filter.Add(temp1) != cuckoofilter::Ok) { 
     cout<<"not added"<<endl; 
}  

// Check if previously inserted items are in the filter 
string temp2 = "sample"; 
assert(filter.Contain(temp2) == cuckoofilter::Ok);

断言应该是真的，但在这种情况下它是错误的。为什么？

来源

2016-11-10 cout_display_name

在https://github.com/efficient/cuckoofilter/blob/master/src/cuckoofilter.h#L65源快速浏览发现它使用一个函数

inline void GenerateIndexTagHash(const ItemType &item, size_t* index, uint32_t* tag) const 
{ 
    std::string hashed_key = HashUtil::SHA1Hash(
     (const char*) &item, 
     sizeof(item) 
    ); 

// ... rest is skipped for brevity 

}

以生成所述项目的初始索引和指纹（标签）。问题是它通过散列实际对象。为了简化，它这样做：

// Your filter.Add(temp1) inside does this 
HashUtil::SHA1Hash((const char*) &temp1, sizeof(temp1)); 

// Your filter.Contain(temp2) inside does this 
HashUtil::SHA1Hash((const char*) &temp2, sizeof(temp2));

基本上，它散列两个完全不同的对象，正如预期的，产生不同的散列和映射到不同的桶。

对于它在你的情况下工作，它需要调用HashUtil :: SHA1Hash（）在散列实际字符串数据的方法，那就是：

// It should do something like this 
HashUtil::SHA1Hash(
    temp1.c_str(), // <-- notice we pass a pointer to an actual character data rather than a pointer to an instance of a std::string() 
    temp1.length() 
); 


HashUtil::SHA1Hash(
    temp2.c_str(), // <-- notice we pass a pointer to an actual character data rather than a pointer to an instance of a std::string() 
    temp2.length() 
);

这应该回答你为什么？的问题。至于

任何人都可以指出我的方法有什么问题吗？

你的方法本身并没有什么不对，它只是不像你想象的那样工作，因为库不支持这样的用例。

来源

2016-11-29 19:01:37

感谢您的回答。我修改源使用SuperFastHash而不是SHA1/MD5，它的工作。 –

如何添加字符串到杜鹃过滤器？

回答

相关问题