对两个字典进行排序并查找排序列表中的项目索引位置的差异

-1

我面临一个挑战。让我来提出困扰我的挑战 -对两个字典进行排序并查找排序列表中的项目索引位置的差异

有两个词典 - D1和D2。这些字典大部分时间都有相同的密钥，但不能保证它总是一样的。这两个字典可以表示如下 -

D1 = {[“R1”，0.7]，[“R2”，0.73]，[“R3”，1.5]，[“R4”，2.5] “R5”，0.12]，[“R6”，1.9]，[“R7”，9.8]，[“R8”，6.5]，[“R9”，7.2] D2 = {[“R1”，0.7]，[“R2”，0.8]，[“R3”，1.5]，[“R4”，3.1]，[“R5”，0.10]，[“R6 “2.0”，[“R7”，8.0]，[“R8”，1.0]，[“R9”，0.0]，[“R10”，5.6]，[“R11”，6.23]

在这些字典中，键是字符串数据类型，值是浮点数据类型。物理上它们是两个不同时间的系统快照。 D1比D2早。我需要根据升序中的值独立排序这些字典。当完成时，这些字典改变为：这些字典改为：这些字典改为：[0124] “1.9”，[“R4”，2.5]，[“R10”，5.6]，[“R8”，6.5]，[“R9”，7.2]，[“R7”，9.8] D2 = {[“R9”，0.0]，[“R5”，0.10]，[“R1”，0.7]，[“R2”，0.8]，[“R8”，1.0] ，1.5“，[”R6“，2.0]，[”R4“，3.1]，[”R10“，5.6]，[”R11“，6.23]，[”R7“，8.0]

这里将字典D1中元素的排序作为参考点。 D1中的每个元素都与D1中紧接的下一个元素相连。期望在分类之后识别D2中已经破坏序列的元素，因为它在参考字典D1中出现。虽然确定D2的这些元素（即D1中存在但不存在于D1中但不存在于D2中的密钥）和D1中元素（即D1中存在密钥但不存在于D2中）的忽略被忽略。即他们不应该在结果中突出显示。

例如，在上面列出的示例继续，从而中断在D2的顺序，参照D1的元素（忽略加入和除去）是 -

断路器= {[“R9”，0.0]， [“R8”，1.0]}，因为R9已将序列从D1排序字典中的第8个索引跳转到D2排序字典中的第0个索引。同样，R8已将序列从D1排序字典中的第7个索引跳转到D2排序字典中的第4个索引（所有索引都从0开始）。

注 - [“R11”，6.23]预计不在断路器列表中，因为它是D2的附加。

请建议一个算法来达到这个最佳效果，因为这个操作需要对从数据库中获取的数据进行处理，包含3,256,190条记录。

编程语言不是一个担心，如果用逻辑指导，我可以承担以任何语言实现它的任务。

来源

2012-04-06 Siva Senthil

是它允许修剪D2为D1的大小（因为D2在这种情况下更大）在排序之前？ – 2012-04-06 07:04:38

不允许。你必须像现在一样使用D2。 – 2012-04-06 07:05:50

在排序之前，D1中第0个元素的键名“R1”是否总是与D2中第0个元素的键名匹配？ – 2012-04-06 07:08:54

我想出了这个算法在C＃中。它适用于您的示例数据。我还做了3000000个完全随机值的测试（因此检测到很多断路器），并在我的笔记本电脑（Intel Core i3 2.1GHz，64位）上以3.2秒完成测试。

我首先将您的数据放入临时词典中，以便在将它们放入列表之前，我可以复制粘贴您的值。当然你的应用程序会直接把它们放在列表中。

class Program 
{ 
    struct SingleValue 
    { 
     public string Key; 
     public float Value; 
     public SingleValue(string key, float value) 
     { 
      Key = key; 
      Value = value; 
     } 
     public override string ToString() 
     { 
      return string.Format("{0}={1}", Key, Value); 
     } 
    } 

    static void Main(string[] args) 
    { 
     List<SingleValue> D1 = new List<SingleValue>(); 
     HashSet<string> D1keys = new HashSet<string>(); 
     List<SingleValue> D2 = new List<SingleValue>(); 
#if !LARGETEST 
     Dictionary<string, double> D1input = new Dictionary<string, double>() { { "R1", 0.7 }, { "R2", 0.73 }, { "R3", 1.5 }, { "R4", 2.5 }, { "R5", 0.12 }, { "R6", 1.9 }, { "R7", 9.8 }, { "R8", 6.5 }, { "R9", 7.2 }, { "R10", 5.6 } }; 
     Dictionary<string, double> D2input = new Dictionary<string, double>() { { "R1", 0.7 }, { "R2", 0.8 }, { "R3", 1.5 }, { "R4", 3.1 }, { "R5", 0.10 }, { "R6", 2.0 }, { "R7", 8.0 }, { "R8", 1.0 }, { "R9", 0.0 }, { "R10", 5.6 }, { "R11", 6.23 } }; 

     // You should directly put you values into this list... I converted them from a Dictionary so I didn't have to type over your input values :) 
     foreach (KeyValuePair<string, double> kvp in D1input) 
     { 
      D1.Add(new SingleValue(kvp.Key, (float)kvp.Value)); 
      D1keys.Add(kvp.Key); 
     } 
     foreach (KeyValuePair<string, double> kvp in D2input) 
      D2.Add(new SingleValue(kvp.Key, (float)kvp.Value)); 
#else 
     Random ran = new Random(); 
     for (int i = 0; i < 3000000; i++) 
     { 
      D1.Add(new SingleValue(i.ToString(), (float)ran.NextDouble())); 
      D1keys.Add(i.ToString()); 
      D2.Add(new SingleValue(i.ToString(), (float)ran.NextDouble())); 
     } 
#endif 

     // Sort the lists 
     D1.Sort(delegate(SingleValue x, SingleValue y) 
     { 
      if (y.Value > x.Value) 
       return -1; 
      else if (y.Value < x.Value) 
       return 1; 
      return 0; 
     }); 
     D2.Sort(delegate(SingleValue x, SingleValue y) 
     { 
      if (y.Value > x.Value) 
       return -1; 
      else if (y.Value < x.Value) 
       return 1; 
      return 0; 
     }); 

     int start = Environment.TickCount; 

     Dictionary<string, float> breakers = new Dictionary<string, float>(); 
     List<SingleValue> additions = new List<SingleValue>(); 

     // Walk through D1 
     IEnumerator<SingleValue> i1 = D1.GetEnumerator(); 
     IEnumerator<SingleValue> i2 = D2.GetEnumerator(); 

     while (i1.MoveNext() && i2.MoveNext()) 
     { 
      while (breakers.ContainsKey(i1.Current.Key)) 
      { 
       if (!i1.MoveNext()) 
        break; 
      } 

      while (i1.Current.Key != i2.Current.Key) 
      { 
       if (D1keys.Contains(i2.Current.Key)) 
        breakers.Add(i2.Current.Key, i2.Current.Value); 
       else 
        additions.Add(i2.Current); 
       if (!i2.MoveNext()) 
        break; 
      } 
     } 

     int duration = Environment.TickCount - start; 
     Console.WriteLine("Lookup took {0}ms", duration); 
     Console.ReadKey(); 
    } 
}

enter image description here

来源

2012-04-06 09:31:39

在您的代码中，while循环将i1和i2作为迭代条件。这意味着D1和D2必须具有相同的尺寸，这是不确定的。它可能是不同的，你分享的是O（n^4）是我们可以得到的最优的？请建议.. – 2012-04-06 09:52:20

他们不必是相同的大小。如果最短列表已结束，while循环就会停止 – 2012-04-06 09:55:16

现在，如果没有其他方法，我会将其标记为答案..但我只想从其他人那里听取其他更好的方法，并使用最少的循环字典... – 2012-04-06 10:01:33

如果你可以在排序之前从D2中删除东西很容易，对吧？你说你不能删除数据。但是，您可以创建一个模拟此类删除的额外数据结构（例如，将“已删除”位添加到该项目，或者如果您无法更改其类型，则会创建一组“已删除”项目）。然后运行简单的算法，但确保忽略“已删除”的项目。

来源

2012-04-06 08:39:36

我不明白你在说什么，请问你能说得更详细些吗？ – 2012-04-06 09:41:36

我一直在思考这个问题。正如你所提到的Levenshtein距离，我假设你想通过将他们从D2中的位置移动到D2中的某个其他位置来获得这些元素，您将以最少的移动次数从D2中获得D1（忽略不存在于两个序列）。

我写了一个贪婪算法，它可能足以满足您的需求，但它可能不一定会在所有情况下都给出最佳结果。我真的不确定，可能会在稍后（最早的周末）回来查看正确性。然而，如果你真的需要在300万个元素的序列上做这件事，我相信没有任何一种算法在这方面做得很好，因为我看不到一个O（n）算法，好工作，即使在一些微不足道的输入上也不会失败。

该算法尝试将每个元素移动到其预期位置，并计算移动后误差的总和（每个元素距原始位置的距离）。导致错误总和最小的元素被宣布为断路器并移动。重复此操作直到序列返回到D1。我认为它具有O（n^3）的复杂性，尽管元素有时需要被移动多次，所以它可能是O（n^4）最坏的情况，我不确定，但是在100万个随机数有50个元素的例子中，外环运行的最大数量是51（n^4意味着它可以是2500，并且在我所有的百万测试中我都很幸运）。只有钥匙，没有价值。这是因为这个值在这一步是无关紧要的，所以没有必要存储它们。

编辑：我为此写了一个反例生成器，事实上它并不总是最优的。破坏者越多，获得最佳解决方案的可能性就越小。例如，在1000个元素与50次随机移动的人它通常会发现一组55-60断路器，当最佳的解决方案是至多50

using System; 
using System.Collections.Generic; 
using System.Linq; 
using System.Text; 

namespace Breakers 
{ 
    class Program 
    { 
     static void Main(string[] args) 
     { 
      //test case 1 
      //List<string> L1 = new List<string> { "R5", "R1", "R2", "R3", "R6", "R4", "R10", "R8", "R9", "R7" }; 
      //List<string> L2 = new List<string> { "R9", "R5", "R1", "R2", "R8", "R3", "R6", "R4", "R10", "R11", "R7" }; 
      //GetBreakers<string>(L1, L2); 

      //test case 2 
      //List<string> L1 = new List<string> { "R5", "R1", "R2", "R3", "R6", "R4", "R10", "R8", "R9", "R7" }; 
      //List<string> L2 = new List<string> { "R5", "R9", "R1", "R6", "R2", "R3", "R4", "R10", "R8", "R7" }; 
      //GetBreakers<string>(L1, L2); 

      //test case 3 
      List<int> L1 = new List<int>(); 
      List<int> L2 = new List<int>(); 
      Random r = new Random(); 
      int n = 100; 
      for (int i = 0; i < n; i++) 
      { 
       L1.Add(i); 
       L2.Add(i); 
      } 
      for (int i = 0; i < 5; i++) // number of random moves, this is the upper bound of the optimal solution 
      { 
       int a = r.Next() % n; 
       int b = r.Next() % n; 
       if (a == b) 
       { 
        i--; 
        continue; 
       } 
       int x = L2[a]; 
       Console.WriteLine(x); 
       L2.RemoveAt(a); 
       L2.Insert(b, x); 
      } 
      for (int i = 0; i < L2.Count; i++) Console.Write(L2[i]); 
      Console.WriteLine(); 
      GetBreakers<int>(L1, L2); 
     } 

     static void GetBreakers<T>(List<T> L1, List<T> L2) 
     { 
      Dictionary<T, int> Appearances = new Dictionary<T, int>(); 
      for (int i = 0; i < L1.Count; i++) Appearances[L1[i]] = 1; 
      for (int i = 0; i < L2.Count; i++) if (Appearances.ContainsKey(L2[i])) Appearances[L2[i]] = 2; 
      for (int i = L1.Count - 1; i >= 0; i--) if (!(Appearances.ContainsKey(L1[i]) && Appearances[L1[i]] == 2)) L1.RemoveAt(i); 
      for (int i = L2.Count - 1; i >= 0; i--) if (!(Appearances.ContainsKey(L2[i]) && Appearances[L2[i]] == 2)) L2.RemoveAt(i); 
      Dictionary<T, int> IndInL1 = new Dictionary<T, int>(); 
      for (int i = 0; i < L1.Count; i++) IndInL1[L1[i]] = i; 

      Dictionary<T, int> Breakers = new Dictionary<T, int>(); 

      int steps = 0; 
      int me = 0; 
      while (true) 
      { 
       steps++; 
       int minError = int.MaxValue; 
       int minErrorIndex = -1; 

       for (int from = 0; from < L2.Count; from++) 
       { 
        T x = L2[from]; 
        int to = IndInL1[x]; 
        if (from == to) continue; 

        L2.RemoveAt(from); 
        L2.Insert(to, x); 

        int error = 0; 
        for (int i = 0; i < L2.Count; i++) 
         error += Math.Abs((i - IndInL1[L2[i]])); 

        L2.RemoveAt(to); 
        L2.Insert(from, x); 

        if (error < minError) 
        { 
         minError = error; 
         minErrorIndex = from; 
        } 
       } 

       if (minErrorIndex == -1) break; 

       T breaker = L2[minErrorIndex]; 
       int breakerOriginalPosition = IndInL1[breaker]; 

       L2.RemoveAt(minErrorIndex); 
       L2.Insert(breakerOriginalPosition, breaker); 

       Breakers[breaker] = 1; 

       me = minError; 
      } 
      Console.WriteLine("Breakers: " + Breakers.Count + " Steps: " + steps); 
      foreach (KeyValuePair<T, int> p in Breakers) 
       Console.WriteLine(p.Key); 
      Console.ReadLine(); 
     } 
    } 
}

来源

2012-04-10 11:47:34 svinja

对两个字典进行排序并查找排序列表中的项目索引位置的差异

回答

相关问题