Efficient way to do a multithreaded set difference

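I have a set of consumer threads that each need a job. Once a thread has processed its job, it has a list of the sub-jobs that were listed in the consumed job. I need to add the sub-jobs from that list that I don't already have in the database. There are 3 million jobs in the database, so working out which ones are not already there is slow. I don't mind each thread blocking on that call, but because I have a race condition (see the code), I have to lock all of them around the slow call so that only one thread at a time can run that section, and my program crawls. What can I do so that the threads are not slowed down by that call? I tried a queue, but the threads push out lists of jobs faster than the computer can determine which ones should be added to the database, so I end up with a queue that keeps growing and never drains.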

My code:

IEnumerable<string> getUniqueJobNames(IEnumerable<job> subJobs, int setID) 
{ 
    return subJobs.Select(el => el.name) 
     .Except(db.jobs.Where(el => el.set_ID==setID).Select(el => el.name)); 
} 

//...consumer thread i 
lock(lockObj) 
{ 
    var uniqueJobNames = getUniqueJobNames(consumedJob.subJobs, consumerSetID); 
    //if there was a context switch here to some thread i+1 
    // and that thread found uniqueJobs that also were found in thread i 
    // then there will be multiple copies of the same job added in the database. 
    // So I put this section in a lock to prevent that. 
    saveJobsToDatabase(uniqueJobNames, consumerSetID); 
} 
//continue consumer thread i...

2012-03-12 brandon

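It's not clear to me what you are trying to do. Could you explain the goal again, without the details of how you currently do it, so that the actual task is clearer? – ntziolis 2012-03-12 18:54:29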

Couldn't you first fetch the list of existing jobs, then build the list of "new" sub-jobs in parallel, and finally save the new jobs? – 2012-03-12 19:18:11

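The problem is that I don't know which ones are new unless I compare them against the database using Except. I can compile a list of all the sub-jobs as they come in, but when I finally compare that list against the database, the comparison has not finished by the time the next list shows up. Whether I cache the lists for later or run the comparison immediately, they build up faster than I can run the Except method. In fact, if I run it immediately, the consumers run faster and the problem compounds. I'm guessing there is some data structure that could help, or just a different algorithm. – brandon 2012-03-12 19:21:21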

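Instead of going to the database to check whether a job name is unique, you could load the relevant information into an in-memory lookup structure, which lets you check for existence much faster: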

// One HashSet of job names per set_ID, loaded from the database. 
Dictionary<int, HashSet<string>> jobLookup = db.jobs.GroupBy(j => j.set_ID) 
    .ToDictionary(g => g.Key, g => new HashSet<string>(g.Select(j => j.name)));

You only need to do this once. After that, use the lookup every time you need to check for uniqueness:

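IEnumerable<string> getUniqueJobNames(IEnumerable<job> subJobs, int setID) 
{ 
    // Fall back to an empty set when nothing is stored for this setID yet. 
    var existingJobs = jobLookup.ContainsKey(setID) ? jobLookup[setID] : new HashSet<string>(); 

    // ToList() runs the query now, so later changes to the lookup cannot affect the result. 
    return subJobs.Select(el => el.name) 
     .Except(existingJobs) 
     .ToList(); 
}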
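
To make the lookup idea concrete, here is a minimal self-contained sketch. It assumes an in-memory list standing in for db.jobs, and the names Job, JobLookupDemo, BuildLookup and GetUniqueJobNames are hypothetical, chosen only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's job entity.
public class Job
{
    public string name;
    public int set_ID;
}

public static class JobLookupDemo
{
    // Build the per-set lookup once from the jobs that are already stored.
    public static Dictionary<int, HashSet<string>> BuildLookup(IEnumerable<Job> storedJobs)
    {
        return storedJobs
            .GroupBy(j => j.set_ID)
            .ToDictionary(g => g.Key, g => new HashSet<string>(g.Select(j => j.name)));
    }

    // Distinct names in subJobs that are not yet known for the given set.
    public static List<string> GetUniqueJobNames(
        Dictionary<int, HashSet<string>> lookup, IEnumerable<Job> subJobs, int setID)
    {
        HashSet<string> existing;
        if (!lookup.TryGetValue(setID, out existing))
        {
            existing = new HashSet<string>(); // nothing stored for this set yet
        }
        // Except also removes duplicates within subJobs itself.
        return subJobs.Select(j => j.name).Except(existing).ToList();
    }
}
```

Each membership check against a HashSet is a constant-time operation on average, so the cost of one call depends on the size of the sub-job list rather than on the 3 million stored jobs.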

When you add new sub-jobs, also add them to the lookup:

lock(lockObj) 
{ 
    var uniqueJobNames = getUniqueJobNames(consumedJob.subJobs, consumerSetID); 
    //if there was a context switch here to some thread i+1 
    // and that thread found uniqueJobs that also were found in thread i 
    // then there will be multiple copies of the same job added in the database. 
    // So I put this section in a lock to prevent that. 
    saveJobsToDatabase(uniqueJobNames, consumerSetID); 

    // Keep the in-memory lookup in sync with what was just saved. 
    if(!jobLookup.ContainsKey(consumerSetID)) 
    { 
     jobLookup.Add(consumerSetID, new HashSet<string>(uniqueJobNames)); 
    } 
    else 
    { 
     jobLookup[consumerSetID].UnionWith(uniqueJobNames); 
    } 
}
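
Once the check runs against memory, the lock can be narrowed further so that the database save happens outside it. The following is only a sketch under assumptions, not part of the original answer: JobNameRegistry, Seed and ClaimNew are hypothetical names, and it relies on HashSet<T>.Add returning false for a name that is already present, so each name is handed to exactly one thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical helper: tracks the known job names per set in memory.
public class JobNameRegistry
{
    private readonly ConcurrentDictionary<int, HashSet<string>> lookup =
        new ConcurrentDictionary<int, HashSet<string>>();

    // Load the names that are already in the database (done once at startup).
    public void Seed(int setID, IEnumerable<string> existingNames)
    {
        HashSet<string> names = lookup.GetOrAdd(setID, _ => new HashSet<string>());
        lock (names)
        {
            names.UnionWith(existingNames);
        }
    }

    // Returns only the names that this caller was the first to register.
    public List<string> ClaimNew(int setID, IEnumerable<string> candidateNames)
    {
        HashSet<string> names = lookup.GetOrAdd(setID, _ => new HashSet<string>());
        var claimed = new List<string>();
        lock (names) // per-set lock, held only for the in-memory work
        {
            foreach (string name in candidateNames)
            {
                // Add returns false when the name is already known.
                if (names.Add(name))
                {
                    claimed.Add(name);
                }
            }
        }
        return claimed;
    }
}
```

A consumer thread would then call ClaimNew with the names of its sub-jobs and pass the returned list to saveJobsToDatabase without holding any lock. One trade-off to keep in mind: a name is marked as known before it is saved, so if the save fails, the names have to be retried or removed from the registry again.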

2012-03-12 19:30:22 ntziolis

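Nice solution. I would rather use memory like this than pay for an NlogN lookup every time. I'm going to write a custom version of this data structure that syncs newly added data with the database. – brandon 2012-03-12 19:36:56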

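My advice would be not to overcomplicate the data structure. Handle the DB and the in-memory lookup separately; that makes problems simpler to debug. – ntziolis 2012-03-12 19:51:41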
