2009-10-30 75 views
23

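Using LINQ to Objects to find items in one collection that do not match another

I want to find all of the items in one collection that do not match another collection. The collections are not of the same type, though; I want to write a lambda expression to specify equality.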

A LINQPad example of what I'm trying to do:

void Main() 
{ 
    var employees = new[] 
    { 
     new Employee { Id = 20, Name = "Bob" }, 
     new Employee { Id = 10, Name = "Bill" }, 
     new Employee { Id = 30, Name = "Frank" } 
    }; 

    var managers = new[] 
    { 
     new Manager { EmployeeId = 20 }, 
     new Manager { EmployeeId = 30 } 
    }; 

    var nonManagers = 
    from employee in employees 
    where !(managers.Any(x => x.EmployeeId == employee.Id)) 
    select employee; 

    nonManagers.Dump(); 

    // Based on cdonner's answer: 

    var nonManagers2 = 
    from employee in employees 
    join manager in managers 
     on employee.Id equals manager.EmployeeId 
    into tempManagers 
    from manager in tempManagers.DefaultIfEmpty() 
    where manager == null 
    select employee; 

    nonManagers2.Dump(); 

    // Based on Richard Hein's answer: 

    var nonManagers3 = 
    employees.Except(
     from employee in employees 
     join manager in managers 
      on employee.Id equals manager.EmployeeId 
     select employee); 

    nonManagers3.Dump(); 
} 

public class Employee 
{ 
    public int Id { get; set; } 
    public string Name { get; set; } 
} 

public class Manager 
{ 
    public int EmployeeId { get; set; } 
} 

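The above works and returns employee Bill (#10). However, it doesn't look elegant, and it may be inefficient with larger collections. In SQL I'd probably do a LEFT JOIN and find the items where the second ID is NULL. What's the best practice for doing this in LINQ?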
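For reference, the SQL LEFT JOIN idea can also be written in method syntax with GroupJoin, which is what a query-syntax `join ... into` clause translates to. This is a hedged sketch, not one of the posted answers; the LeftJoinSketch class and NonManagers method are made-up names. In the current .NET implementation GroupJoin builds a lookup over the managers once, so it should avoid the per-employee Any scan:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Manager
{
    public int EmployeeId { get; set; }
}

public static class LeftJoinSketch
{
    // Method-syntax counterpart of "LEFT JOIN ... WHERE right.Id IS NULL".
    // GroupJoin pairs each employee with the (possibly empty) group of managers
    // whose EmployeeId matches; an empty group means "no match on the right".
    public static List<Employee> NonManagers(
        IEnumerable<Employee> employees, IEnumerable<Manager> managers)
    {
        return employees
            .GroupJoin(managers,
                       e => e.Id,
                       m => m.EmployeeId,
                       (e, matches) => new { Employee = e, Matches = matches })
            .Where(x => !x.Matches.Any())
            .Select(x => x.Employee)
            .ToList();
    }
}
```

With the question's data this should return only Bill.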

EDIT: Updated to rule out solutions that depend on the Id being equal to the array index.

EDIT: Added cdonner's solution - does anyone have anything simpler?

EDIT: Added a variant of Richard Hein's answer, my current favorite. Thanks, everyone, for some excellent answers!

Answers

30

This is almost the same as some of the other examples, but with less code:

employees.Except(employees.Join(managers, e => e.Id, m => m.EmployeeId, (e, m) => e)); 

It isn't any simpler than employees.Where(e => !managers.Any(m => m.EmployeeId == e.Id)) or your original syntax.
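One thing worth knowing about the Except version: it relies on Except's default equality comparer. Employee doesn't override Equals, so the comparison is by reference, which works only because Join hands back the very same Employee instances. Except also has set semantics, so an employee listed twice comes back once, whereas the Where/!Any form keeps both entries. A small sketch assuming the question's Employee/Manager classes; ExceptSemanticsSketch and its method names are made up for illustration:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Manager
{
    public int EmployeeId { get; set; }
}

public static class ExceptSemanticsSketch
{
    public static List<Employee> NonManagersExcept(Employee[] employees, Manager[] managers)
    {
        // Join yields the same Employee objects found in 'employees', so Except's
        // default comparer (reference equality here) can remove them.
        var managerEmployees = employees.Join(managers, e => e.Id, m => m.EmployeeId, (e, m) => e);
        return employees.Except(managerEmployees).ToList();
    }

    public static List<Employee> NonManagersWhereNotAny(Employee[] employees, Manager[] managers)
    {
        // Filters element by element, so duplicates in 'employees' are kept.
        return employees.Where(e => !managers.Any(m => m.EmployeeId == e.Id)).ToList();
    }
}
```

If Employee overrode Equals/GetHashCode by Id, the Except version would still work, but then two different objects with the same Id would also collapse into one.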

+0

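Actually, I like this better than the other solutions - I find its meaning clearer. I rewrote the join in query syntax (see the revised sample code in my question) to suit my personal preference. Thanks! – TrueWill 2009-10-31 17:17:32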

+0

When a large collection is involved, Except is tooooo slow. The join answer is best. – 2016-03-16 15:19:24

5
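// Declare this in a non-generic static class so it works as an extension method.
/// <summary>
/// This method returns items in a set that are not in
/// another set of a different type
/// </summary>
/// <typeparam name="T">Type of the items to return.</typeparam>
/// <typeparam name="TOther">Type of the items to exclude.</typeparam>
/// <typeparam name="TKey">Type of the key used to match the two sets.</typeparam>
/// <param name="items">The set to filter.</param>
/// <param name="other">The set whose keys should be excluded.</param>
/// <param name="getItemKey">Extracts the key from an item.</param>
/// <param name="getOtherKey">Extracts the key from an element of other.</param>
/// <returns>The items whose key matches no element of other.</returns>
public static IEnumerable<T> Except<T, TOther, TKey>(
    this IEnumerable<T> items,
    IEnumerable<TOther> other,
    Func<T, TKey> getItemKey,
    Func<TOther, TKey> getOtherKey)
{
    // Left join: each item is paired with its group of matching "other" elements.
    // An empty group means no match. Testing the group directly replaces the
    // DefaultIfEmpty/default(TOther) check, which wrongly keeps an item when
    // TOther is a value type and the matching element equals default(TOther).
    return from item in items
           join otherItem in other on getItemKey(item)
               equals getOtherKey(otherItem) into matches
           where !matches.Any()
           select item;
}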

I don't remember where I found this method.

+0

+1 - Nice. I modified this slightly and incorporated it into my question. I'd still like to see what others come up with, though. Thanks! – TrueWill 2009-10-30 14:53:35

2

Take a look at the Except() LINQ function. It does exactly what you need.

+0

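The Except function only works on two sets of the same object type, so it won't apply directly to his employees-and-managers example. Hence the overloaded method in my answer. – cdonner 2009-10-31 00:43:33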
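On .NET 6 and later (well after this thread), Enumerable.ExceptBy covers exactly this different-types case: it takes the keys to exclude plus a key selector for the source elements. A minimal sketch, assuming the question's classes; ExceptBySketch and NonManagers are made-up names. Like Except, ExceptBy has set semantics, so only the first employee per Id is returned:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Manager
{
    public int EmployeeId { get; set; }
}

public static class ExceptBySketch
{
    // Requires .NET 6+, where Enumerable.ExceptBy was introduced.
    // Each employee's Id (from the key selector) is checked against the
    // sequence of manager EmployeeIds.
    public static List<Employee> NonManagers(
        IEnumerable<Employee> employees, IEnumerable<Manager> managers)
    {
        return employees
            .ExceptBy(managers.Select(m => m.EmployeeId), e => e.Id)
            .ToList();
    }
}
```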

3
var nonmanagers = employees.Select(e => e.Id) 
    .Except(managers.Select(m => m.EmployeeId)) 
    .Select(id => employees.Single(e => e.Id == id)); 
+1

There's no guarantee that EmployeeId will match the employee's index in the array... – 2009-10-30 13:04:25

+0

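Good idea - I hadn't thought of selecting the IDs so that Except's default equality comparer would compare integers. However, Mr. Levesque is right, and I've updated the example to reflect that. Could you provide an example that correctly returns the employees? – TrueWill 2009-10-30 14:38:43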

+0

Ah, you're right. Answer updated. – 2009-10-30 19:39:01
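The updated answer maps each remaining ID back with employees.Single(...), which scans the whole array once per ID. A hedged variation (not from the thread; IdExceptSketch is a made-up name) keeps the Except-on-IDs idea but looks employees up in a dictionary instead. It assumes employee Ids are unique, since ToDictionary throws on duplicate keys:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Manager
{
    public int EmployeeId { get; set; }
}

public static class IdExceptSketch
{
    public static List<Employee> NonManagers(
        IEnumerable<Employee> employees, IEnumerable<Manager> managers)
    {
        // Assumes unique Ids: ToDictionary throws on a duplicate key.
        var byId = employees.ToDictionary(e => e.Id);

        // Except compares plain ints; iterating 'employees' keeps their order.
        return employees.Select(e => e.Id)
            .Except(managers.Select(m => m.EmployeeId))
            .Select(id => byId[id])
            .ToList();
    }
}
```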

5

 
     var nonManagers = (from e1 in employees 
          select e1).Except(
            from m in managers 
            from e2 in employees 
            where m.EmployeeId == e2.Id 
            select e2); 

+0

+1. Elegant, and it works correctly. – TrueWill 2009-10-30 17:14:22

+1

Thanks. I originally found it here: http://rsanidad.wordpress.com/2007/10/16/linq-except-and-intersect/ – 2009-10-30 18:04:07

3

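A bit late (I know).

I was looking at the same problem and was considering a HashSet, because of various performance hints in that direction, incl. @Skeet's answer to Intersection of multiple lists with IEnumerable.Intersect(). When I asked around my office, the consensus was that a HashSet would be faster and more readable: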

HashSet<int> managerIds = new HashSet<int>(managers.Select(x => x.EmployeeId)); 
var nonManagers4 = employees.Where(x => !managerIds.Contains(x.Id)).ToList(); 

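I was then offered an even faster, bitmask-ish solution that uses a native array (though the syntax in the native-array query would put me off using it except for extreme performance reasons).

To give this answer a little credibility after a horribly long time, I extended your LINQPad program and data with timings, so you can compare what are now six options: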
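The native-array idea amounts to a lookup table indexed by Id. A standalone sketch, assuming the question's classes and non-negative integer Ids (BoolArraySketch is a made-up name); it adds a guard for an empty manager list, where Max would throw. Memory grows with the largest manager Id, so this only pays off when Ids are small:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Manager
{
    public int EmployeeId { get; set; }
}

public static class BoolArraySketch
{
    public static List<Employee> NonManagers(
        IEnumerable<Employee> employees, IEnumerable<Manager> managers)
    {
        // Max() throws on an empty sequence, so handle "no managers" first.
        if (!managers.Any())
            return employees.ToList();

        // Table size is the largest manager Id + 1; assumes Ids are non-negative.
        var isManager = new bool[managers.Max(m => m.EmployeeId) + 1];
        foreach (var m in managers)
            isManager[m.EmployeeId] = true;

        // Ids outside the table cannot belong to a manager.
        return employees
            .Where(e => e.Id < 0 || e.Id >= isManager.Length || !isManager[e.Id])
            .ToList();
    }
}
```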

void Main() 
{ 
    var employees = new[] 
    { 
     new Employee { Id = 20, Name = "Bob" }, 
     new Employee { Id = 10, Name = "Kirk NM" }, 
     new Employee { Id = 48, Name = "Rick NM" }, 
     new Employee { Id = 42, Name = "Dick" }, 
     new Employee { Id = 43, Name = "Harry" }, 
     new Employee { Id = 44, Name = "Joe" }, 
     new Employee { Id = 45, Name = "Steve NM" }, 
     new Employee { Id = 46, Name = "Jim NM" }, 
     new Employee { Id = 30, Name = "Frank"}, 
     new Employee { Id = 47, Name = "Dave NM" }, 
     new Employee { Id = 49, Name = "Alex NM" }, 
     new Employee { Id = 50, Name = "Phil NM" }, 
     new Employee { Id = 51, Name = "Ed NM" }, 
     new Employee { Id = 52, Name = "Ollie NM" }, 
     new Employee { Id = 41, Name = "Bill" }, 
     new Employee { Id = 53, Name = "John NM" }, 
     new Employee { Id = 54, Name = "Simon NM" } 
    }; 

    var managers = new[] 
    { 
     new Manager { EmployeeId = 20 }, 
     new Manager { EmployeeId = 30 }, 
     new Manager { EmployeeId = 41 }, 
     new Manager { EmployeeId = 42 }, 
     new Manager { EmployeeId = 43 }, 
     new Manager { EmployeeId = 44 } 
    }; 

    System.Diagnostics.Stopwatch watch1 = new System.Diagnostics.Stopwatch(); 

    int max = 1000000; 

    watch1.Start(); 
    List<Employee> nonManagers1 = new List<Employee>(); 
    foreach (var item in Enumerable.Range(1,max)) 
    { 
     nonManagers1 = (from employee in employees where !(managers.Any(x => x.EmployeeId == employee.Id)) select employee).ToList(); 

    } 
    nonManagers1.Dump(); 
    watch1.Stop(); 
    Console.WriteLine("Any: " + watch1.ElapsedMilliseconds); 

    watch1.Restart();  
    List<Employee> nonManagers2 = new List<Employee>(); 
    foreach (var item in Enumerable.Range(1,max)) 
    { 
     nonManagers2 = 
     (from employee in employees 
     join manager in managers 
      on employee.Id equals manager.EmployeeId 
     into tempManagers 
     from manager in tempManagers.DefaultIfEmpty() 
     where manager == null 
     select employee).ToList(); 
    } 
    nonManagers2.Dump(); 
    watch1.Stop(); 
    Console.WriteLine("temp table: " + watch1.ElapsedMilliseconds); 

    watch1.Restart();  
    List<Employee> nonManagers3 = new List<Employee>(); 
    foreach (var item in Enumerable.Range(1,max)) 
    { 
     nonManagers3 = employees.Except(employees.Join(managers, e => e.Id, m => m.EmployeeId, (e, m) => e)).ToList(); 
    } 
    nonManagers3.Dump(); 
    watch1.Stop(); 
    Console.WriteLine("Except: " + watch1.ElapsedMilliseconds); 

    watch1.Restart();  
    List<Employee> nonManagers4 = new List<Employee>(); 
    foreach (var item in Enumerable.Range(1,max)) 
    { 
     HashSet<int> managerIds = new HashSet<int>(managers.Select(x => x.EmployeeId)); 
     nonManagers4 = employees.Where(x => !managerIds.Contains(x.Id)).ToList(); 

    } 
    nonManagers4.Dump(); 
    watch1.Stop(); 
    Console.WriteLine("HashSet: " + watch1.ElapsedMilliseconds); 

     watch1.Restart(); 
     List<Employee> nonManagers5 = new List<Employee>(); 
     foreach (var item in Enumerable.Range(1, max)) 
     { 
        bool[] test = new bool[managers.Max(x => x.EmployeeId) + 1]; 
        foreach (var manager in managers) 
        { 
         test[manager.EmployeeId] = true; 
        } 

        nonManagers5 = employees.Where(x => x.Id > test.Length - 1 || !test[x.Id]).ToList(); 


     } 
     nonManagers5.Dump(); 
     watch1.Stop(); 
     Console.WriteLine("Native array call: " + watch1.ElapsedMilliseconds); 

     watch1.Restart(); 
     List<Employee> nonManagers6 = new List<Employee>(); 
     foreach (var item in Enumerable.Range(1, max)) 
     { 
        bool[] test = new bool[managers.Max(x => x.EmployeeId) + 1]; 
        foreach (var manager in managers) 
        { 
         test[manager.EmployeeId] = true; 
        } 

        nonManagers6 = employees.Where(x => x.Id > test.Length - 1 || !test[x.Id]).ToList(); 
     } 
     nonManagers6.Dump(); 
     watch1.Stop(); 
     Console.WriteLine("Native array call 2: " + watch1.ElapsedMilliseconds); 
} 

public class Employee 
{ 
    public int Id { get; set; } 
    public string Name { get; set; } 
} 

public class Manager 
{ 
    public int EmployeeId { get; set; } 
} 
+0

Nice data! Thanks! – TrueWill 2014-02-24 14:34:34

+0

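If your employees' and managers' IDs are very high, e.g. in the 100,000s, then your sparse-array solution will be wasteful. Nothing says the IDs can't be that high - they're integers - and I think it's better to write code that has no odd edge cases. – ErikE 2014-09-05 04:28:00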

+0

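@ErikE I'm not sure what you're driving at. The data the OP provided was part of the question, and I timed six ways of processing that data. If the data were different, a different option might be more optimal. Is there a single solution that performs consistently across every conceivable data set? If there is, I'd be very grateful if you'd post it so it can be used in future. – amelvin 2014-09-08 09:45:39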

1

It's better if you left join the items and filter on null:

var finalcertificates = (from globCert in resultCertificate 
             join toExcludeCert in certificatesToExclude 
              on globCert.CertificateId equals toExcludeCert.CertificateId into certs 
             from toExcludeCert in certs.DefaultIfEmpty() 
             where toExcludeCert == null 
             select globCert).Union(currentCertificate).Distinct().OrderBy(cert => cert.CertificateName); 
0

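Managers are employees too! So the Manager class should inherit from the Employee class (or, if you don't like that, they should both inherit from a common parent class, or you should create a NonManager class).

Then your problem becomes as simple as implementing the IEquatable interface on your Employee superclass (with GetHashCode simply returning the EmployeeID) and then using this code: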

var nonManagerEmployees = employeeList.Except(managerList); 
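A minimal sketch of that suggestion, assuming Manager derives from Employee and that equality is defined by the employee's Id (the answer says EmployeeID; here the question's Id property plays that role). InheritanceSketch is a made-up name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Equality by Id, so Except can match an Employee against a Manager.
public class Employee : IEquatable<Employee>
{
    public int Id { get; set; }
    public string Name { get; set; }

    public bool Equals(Employee other) => other != null && other.Id == Id;
    public override bool Equals(object obj) => Equals(obj as Employee);
    public override int GetHashCode() => Id.GetHashCode();
}

// "A manager is an employee."
public class Manager : Employee { }

public static class InheritanceSketch
{
    public static List<Employee> NonManagers(
        IEnumerable<Employee> employees, IEnumerable<Manager> managers)
    {
        // IEnumerable<T> is covariant, so IEnumerable<Manager> is accepted
        // where IEnumerable<Employee> is expected; Except then uses the
        // IEquatable<Employee> implementation via EqualityComparer<Employee>.Default.
        return employees.Except(managers).ToList();
    }
}
```

Because equality is by Id, Except also collapses any two employee records that share an Id.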
+0

Good point; this was just a sanitized example. Finding mismatches is the general problem I'm after, though. – TrueWill 2015-11-30 21:40:04

+0

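Still, this may be a good solution to many common problems! If two different kinds of objects can somehow be merged, they probably share a relationship that can be expressed through a superclass/subclass. In this case a manager has an "is-a" relationship with an employee, so using inheritance is very reasonable. "Has-a" relationships are less likely to be helped by the solution I suggested (though not necessarily, since lifetimes and roles can be very tricky to model correctly, and developers can sometimes miss a "has-a" relationship). – ErikE 2015-12-01 18:02:27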
