Hash map is fast at insertion but slow at retrieval

I am using Google's hash map implementation, google::dense_hash_map.

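Mine is a clustering application, so I have to store the distances between pairs of clusters. Each cluster has a cluster ID of type long int, so the key has to be (long int id1, long int id2).

So I decided that I need a hash map inside a hash map for this to work.

This is my hash map structure for storing distances: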

google::dense_hash_map<long int, google::dense_hash_map<long int, double> > distanceHash;
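As a point of comparison, a pair of IDs can also be packed into a single key so that one flat table replaces the nested maps. The following is only a sketch under stated assumptions: it uses std::unordered_map as a stand-in for dense_hash_map, the names pairKey and FlatDistanceCache are hypothetical, and it assumes both IDs are non-negative and fit in 32 bits.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>

// Hypothetical helper: packs an unordered pair of cluster IDs into one key.
// Assumption: both IDs are non-negative and smaller than 2^32.
inline std::uint64_t pairKey(long id1, long id2) {
    if (id1 < id2) std::swap(id1, id2);  // normalize so (a, b) and (b, a) match
    return (static_cast<std::uint64_t>(id1) << 32) |
           static_cast<std::uint64_t>(id2);
}

// Hypothetical flat cache: one table, one hash lookup per query.
class FlatDistanceCache {
public:
    void insert(long id1, long id2, double distance) {
        table_[pairKey(id1, id2)] = distance;
    }

    // Returns -1 when the pair has not been stored, like getStoredDistance.
    double get(long id1, long id2) const {
        auto it = table_.find(pairKey(id1, id2));
        return it == table_.end() ? -1.0 : it->second;
    }

private:
    std::unordered_map<std::uint64_t, double> table_;
};
```

Whether a flat table is actually faster than the nested layout for this workload would need to be measured.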

These are the functions that insert a distance into the hash map and retrieve it:

template<class Point>
void CoverTree<Point>::insertDistance(long int id1, long int id2, long double distance)
{
    // Normalize the pair so that id1 >= id2 always holds.
    if(id1 < id2)
    {
        long temp = id1;
        id1 = id2;
        id2 = temp;
    }

    if(distanceHash.find(id1) == distanceHash.end())
    {
        // First distance for id1: create its inner map.
        google::dense_hash_map<long int, double> insideHash;
        insideHash.set_empty_key(-9999);
        insideHash[id2] = distance;
        distanceHash[id1] = insideHash;
    }
    else
    {
        // id1 already has an inner map: store the distance in it.
        (distanceHash[id1])[id2] = distance;
    }
}

template<class Point>
double CoverTree<Point>::getStoredDistance(long int id1, long int id2)
{
    // Normalize the pair the same way insertDistance does (id1 >= id2).
    if(id1 < id2)
    {
        long temp = id1;
        id1 = id2;
        id2 = temp;
    }

    if(distanceHash.find(id1) != distanceHash.end())
    {
        if(distanceHash[id1].find(id2) != distanceHash[id1].end())
            return distanceHash[id1][id2];
    }

    // No distance has been stored for this pair.
    return -1;
}
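For illustration, the retrieval can be written so that each level of the nested map is probed exactly once, by keeping the iterator returned by find instead of calling operator[] again. This is a sketch, not the code from the question: it uses std::unordered_map as a stand-in for dense_hash_map (which also provides find and end), and the free function getStoredDistanceOnce and the aliases InnerMap and OuterMap are hypothetical names.

```cpp
#include <unordered_map>
#include <utility>

using InnerMap = std::unordered_map<long, double>;
using OuterMap = std::unordered_map<long, InnerMap>;

// Looks up a stored distance with one probe per map level.
// Returns -1 when the pair has not been stored.
double getStoredDistanceOnce(const OuterMap& distanceHash, long id1, long id2) {
    if (id1 < id2) std::swap(id1, id2);  // same normalization: id1 >= id2

    auto outer = distanceHash.find(id1);          // single outer lookup
    if (outer == distanceHash.end()) return -1.0;

    auto inner = outer->second.find(id2);         // single inner lookup
    if (inner == outer->second.end()) return -1.0;

    return inner->second;
}
```

Because the map is taken by const reference and only find is used, a failed lookup cannot insert an empty entry as a side effect.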

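My code computes millions of distances. Last time I checked, there were about 600 million distance computations, of which 400 million were unique. That means 1/3 of the distances are repeated, and the time spent recomputing them could be saved.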

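However, when I use this hash map structure to store the distances, the program runs slower. This is what I found: if I only store the distances from the distance function, the whole program runs about 50 seconds slower (200 seconds with storing versus 150 without). But if I store the distances and then use the hash map to check whether a distance exists before computing it, the program becomes far slower (1/25th of the program takes 300 seconds).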

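I do not understand this behavior. I would expect retrieval to be faster once the distances are stored. Please let me know what is going wrong here and whether it can be made faster.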
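The check-before-compute pattern described above can be sketched as a small memoizing wrapper that also counts hits and misses, which makes it possible to see how often the cache really saves a computation. Everything here is an assumption for illustration: the class name MemoizedDistance, the use of std::unordered_map instead of dense_hash_map, and the requirement that IDs are non-negative and below 2^32.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical wrapper: returns a cached distance or computes and stores it.
class MemoizedDistance {
public:
    explicit MemoizedDistance(std::function<double(long, long)> compute)
        : compute_(std::move(compute)) {}

    double operator()(long id1, long id2) {
        if (id1 < id2) std::swap(id1, id2);  // normalize the pair
        // Assumption: IDs are non-negative and below 2^32.
        const std::uint64_t key =
            (static_cast<std::uint64_t>(id1) << 32) |
            static_cast<std::uint64_t>(id2);

        auto it = cache_.find(key);
        if (it != cache_.end()) {            // already computed: reuse it
            ++hits_;
            return it->second;
        }
        const double d = compute_(id1, id2); // not cached: compute and store
        cache_.emplace(key, d);
        ++misses_;
        return d;
    }

    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    std::function<double(long, long)> compute_;
    std::unordered_map<std::uint64_t, double> cache_;
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};
```

Caching only pays off when a lookup is cheaper than the distance computation it replaces, so comparing the hit count against the measured cost of one lookup and one computation is a reasonable first check.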

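P.S.: RAM is not the problem. I am running on a server with about 160 GB of RAM, and the peak memory consumption when using the hash map is only 1.8% of the total RAM (see the figure above). So paging and thrashing should not be an issue.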

Source

2012-09-02 The Flying Dutchman

Is getStoredDistance(long int id1, long int id2) slow?

Are you calling distanceHash.find(id1) N times? What is its complexity? N * N? Then you add another factor of N and it becomes O(N * N * N).

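Yes, getStoredDistance is slow. I see what is going on, and I have an idea for solving it: I will keep a hash table in each point. That hash table stores the distances from that particular node to all other nodes. This eliminates distanceHash.find(id1), because I already know which node I need the distances from.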
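The per-node idea from this comment could look roughly like the sketch below. The type ClusterNode and its members are hypothetical names, and std::unordered_map again stands in for dense_hash_map.

```cpp
#include <unordered_map>

// Hypothetical node type: each node caches its own distances to other nodes,
// so no outer lookup by id1 is needed.
struct ClusterNode {
    long id;
    std::unordered_map<long, double> distanceTo;  // other node ID -> distance

    explicit ClusterNode(long nodeId) : id(nodeId) {}

    void storeDistance(const ClusterNode& other, double distance) {
        distanceTo[other.id] = distance;
    }

    // Returns -1 if no distance to `other` has been stored on this node.
    double storedDistance(const ClusterNode& other) const {
        auto it = distanceTo.find(other.id);
        return it == distanceTo.end() ? -1.0 : it->second;
    }
};
```

In this sketch a distance is stored on one node only, so the caller needs a fixed convention for which of the two nodes holds it, for example always the node with the larger ID.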

But if I store the distances and then use the hashmap to check whether the distances exist before computing them, the program becomes way, way slower (1/25th of the program takes 300 seconds).

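I suspect you are looking up every element to validate the data.

Well, a hash map lookup is O(n) in the worst case (O(1) on average), but you perform it N times through getStoredDistance, which makes the total complexity O(N * N) in the worst case, with

distanceHash.find(id1)

called twice per lookup in the worst case.

400M * 400M = 160000000000000000, which is far too expensive.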

Source

2012-09-02 09:16:15
