我一直在思考这个问题。正如你所提到的Levenshtein距离,我假设你想通过将他们从D2中的位置移动到D2中的某个其他位置来获得这些元素,您将以最少的移动次数从D2中获得D1(忽略不存在于两个序列)。
我写了一个贪婪算法,它可能足以满足您的需求,但它可能不一定会在所有情况下都给出最佳结果。我真的不确定,可能会在稍后(最早的周末)回来查看正确性。然而,如果你真的需要在300万个元素的序列上做这件事,我相信没有任何一种算法在这方面做得很好,因为我看不到一个O(n)算法,好工作,即使在一些微不足道的输入上也不会失败。
该算法尝试将每个元素移动到其预期位置,并计算移动后误差的总和(每个元素距原始位置的距离)。导致错误总和最小的元素被宣布为断路器并移动。重复此操作直到序列返回到D1。我认为它具有O(n^3)的复杂性,尽管元素有时需要被移动多次,所以它可能是O(n^4)最坏的情况,我不确定,但是在100万个随机数有50个元素的例子中,外环运行的最大数量是51(n^4意味着它可以是2500,并且在我所有的百万测试中我都很幸运)。只有钥匙,没有价值。这是因为这个值在这一步是无关紧要的,所以没有必要存储它们。
编辑:我为此写了一个反例生成器,事实上它并不总是最优的。破坏者越多,获得最佳解决方案的可能性就越小。例如,在1000个元素与50次随机移动的人它通常会发现一组55-60断路器,当最佳的解决方案是至多50
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Breakers
{
class Program
{
static void Main(string[] args)
{
//test case 1
//List<string> L1 = new List<string> { "R5", "R1", "R2", "R3", "R6", "R4", "R10", "R8", "R9", "R7" };
//List<string> L2 = new List<string> { "R9", "R5", "R1", "R2", "R8", "R3", "R6", "R4", "R10", "R11", "R7" };
//GetBreakers<string>(L1, L2);
//test case 2
//List<string> L1 = new List<string> { "R5", "R1", "R2", "R3", "R6", "R4", "R10", "R8", "R9", "R7" };
//List<string> L2 = new List<string> { "R5", "R9", "R1", "R6", "R2", "R3", "R4", "R10", "R8", "R7" };
//GetBreakers<string>(L1, L2);
//test case 3
List<int> L1 = new List<int>();
List<int> L2 = new List<int>();
Random r = new Random();
int n = 100;
for (int i = 0; i < n; i++)
{
L1.Add(i);
L2.Add(i);
}
for (int i = 0; i < 5; i++) // number of random moves, this is the upper bound of the optimal solution
{
int a = r.Next() % n;
int b = r.Next() % n;
if (a == b)
{
i--;
continue;
}
int x = L2[a];
Console.WriteLine(x);
L2.RemoveAt(a);
L2.Insert(b, x);
}
for (int i = 0; i < L2.Count; i++) Console.Write(L2[i]);
Console.WriteLine();
GetBreakers<int>(L1, L2);
}
static void GetBreakers<T>(List<T> L1, List<T> L2)
{
Dictionary<T, int> Appearances = new Dictionary<T, int>();
for (int i = 0; i < L1.Count; i++) Appearances[L1[i]] = 1;
for (int i = 0; i < L2.Count; i++) if (Appearances.ContainsKey(L2[i])) Appearances[L2[i]] = 2;
for (int i = L1.Count - 1; i >= 0; i--) if (!(Appearances.ContainsKey(L1[i]) && Appearances[L1[i]] == 2)) L1.RemoveAt(i);
for (int i = L2.Count - 1; i >= 0; i--) if (!(Appearances.ContainsKey(L2[i]) && Appearances[L2[i]] == 2)) L2.RemoveAt(i);
Dictionary<T, int> IndInL1 = new Dictionary<T, int>();
for (int i = 0; i < L1.Count; i++) IndInL1[L1[i]] = i;
Dictionary<T, int> Breakers = new Dictionary<T, int>();
int steps = 0;
int me = 0;
while (true)
{
steps++;
int minError = int.MaxValue;
int minErrorIndex = -1;
for (int from = 0; from < L2.Count; from++)
{
T x = L2[from];
int to = IndInL1[x];
if (from == to) continue;
L2.RemoveAt(from);
L2.Insert(to, x);
int error = 0;
for (int i = 0; i < L2.Count; i++)
error += Math.Abs((i - IndInL1[L2[i]]));
L2.RemoveAt(to);
L2.Insert(from, x);
if (error < minError)
{
minError = error;
minErrorIndex = from;
}
}
if (minErrorIndex == -1) break;
T breaker = L2[minErrorIndex];
int breakerOriginalPosition = IndInL1[breaker];
L2.RemoveAt(minErrorIndex);
L2.Insert(breakerOriginalPosition, breaker);
Breakers[breaker] = 1;
me = minError;
}
Console.WriteLine("Breakers: " + Breakers.Count + " Steps: " + steps);
foreach (KeyValuePair<T, int> p in Breakers)
Console.WriteLine(p.Key);
Console.ReadLine();
}
}
}
是它允许修剪D2为D1的大小(因为D2在这种情况下更大)在排序之前? – 2012-04-06 07:04:38
不允许。你必须像现在一样使用D2。 – 2012-04-06 07:05:50
在排序之前,D1中第0个元素的键名“R1”是否总是与D2中第0个元素的键名匹配? – 2012-04-06 07:08:54