2016-02-12 62 views
0

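Parallel.For confusion (collection loses order)

I have a problem: I am trying to use Parallel.For() to load some files into a database, and the file ID passed to the database function is somehow incorrect. In other words, the database returns the wrong data. I tried to verify this by using a ConcurrentDictionary to add ID/name pairs with and without parallelism. As I see it, the two sets of pairs should be identical after the loops finish, but they are not. The code below simulates what I am doing in a very simplified way.

Does this make sense?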

class Program 
    { 
     static ConcurrentDictionary<int, string> _cd = new ConcurrentDictionary<int, string>(); 
     static void Main() 
     { 
      //simulate the situation 
      int[] idList = new int[] {1, 8, 12, 19, 25, 99}; 
      string[] fileList = new string[] {"file1", "file8", "file12", "file19", "file25", "file99"}; 

      //run in serial first 
      ProcessFiles(idList, fileList); 

      //write out pairs to text file 
      foreach (var item in _cd) 
      { 
       var key = item.Key; 
       var val = item.Value; 
       string line = string.Format("fileId is {0} and fileName is {1}", key, val); 

       File.AppendAllText(@"c:\serial.txt", line + Environment.NewLine); 
      } 
      //results of text file (all good): 
      //fileId is 1 and fileName is file1 
      //fileId is 8 and fileName is file8 
      //fileId is 12 and fileName is file12 
      //fileId is 19 and fileName is file19 
      //fileId is 25 and fileName is file25 
      //fileId is 99 and fileName is file99 

      _cd.Clear(); 

      //now run in parallel 
      ProcessFilesInParallel(idList, fileList); 

      //write out pairs to text file 
      foreach (var item in _cd) 
      { 
       var key = item.Key; 
       var val = item.Value; 
       string line = string.Format("fileId is {0} and fileName is {1}", key, val); 

       File.AppendAllText(@"c:\parallel.txt", line + Environment.NewLine); 
      } 

      //results of text file (1. some, not all, are mismatched and 2. not all elements got added): 
      //fileId is 8 and fileName is file8 
      //fileId is 12 and fileName is file19 
      //fileId is 19 and fileName is file12 
      //fileId is 25 and fileName is file25 
     } 

     private static void ProcessFiles(int[] Ids, string[] files) 
     { 
      int fileId = 0; 
      string fileName = string.Empty; 

      for (var i = 0; i < Ids.Length; i++) 
      { 
       fileId = Ids[i]; 
       fileName = GetControlFileMetaDataFromDB(fileId); 

       _cd.TryAdd(fileId, fileName); 
      } 
     } 

     private static void ProcessFilesInParallel(int[] Ids, string[] files) 
     { 
      int fileId = 0; 
      string fileName = string.Empty; 

      Parallel.For(0, Ids.Length, i => 
      { 
       fileId = Ids[i]; 

       //this is returning the wrong fileName 
       fileName = GetControlFileMetaDataFromDB(fileId); 

       _cd.TryAdd(fileId, fileName); 
      } 

      ); 
     } 

     private static string GetControlFileMetaDataFromDB(int fileId) 
     { 
      //removed for brevity: 
      //1. connect to oracle 
      //2. call function, passing file id 
      //3. iterate over data reader and look for the filename 

      while (reader.Read()) 
      { 
       //strip out the filename and return it to the caller 
       int endPos = reader[0].ToString().IndexOf("txt"); 
       if (endPos != -1) 
       { 
        endPos += 3; 
        int startPos = reader[0].ToString().IndexOf(":\\") - 1; 
        string path = reader[0].ToString().Substring(startPos, endPos - startPos); 
        return Path.GetFileName(path); 
       } 
      } 

      //assumption: fall back to an empty name when no row matches 
      return string.Empty; 
     } 
    } 
+0

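Please copy the code into an editor and try to compile it. There are a lot of errors. Please fix them and add the 'using' directives so that others can check the code. –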

+0

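I would be particularly interested in where the 'reader' variable comes from. Could it be that there is only one database connection and you are accessing it from multiple threads without synchronization? –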

+0

It is basic data access code, which I reduced to comments for brevity. The poster below nailed the problem. – inspectorGadget

Answers

7

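You have declared fileId and fileName outside of the Parallel.For, which means the same two variables are shared by every iteration.

Because the iterations may well run in parallel on different threads, one iteration can reassign the variables while another concurrent iteration is still using them.

What you need to do is move the variable declarations inside the loop, so that they are local to each iteration: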

Parallel.For(0, Ids.Length, i => 
{ 
    int fileId = Ids[i]; 

    //fileName is local to this iteration, so no other thread can overwrite it 
    string fileName = GetControlFileMetaDataFromDB(fileId); 

    _cd.TryAdd(fileId, fileName); 
}); 
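
To see the effect in isolation, here is a minimal sketch that runs both variants side by side. It rests on assumptions: FakeLookup is a hypothetical stand-in for the database call (the short sleep only widens the race window), and the class and method names are invented for illustration. The shared variant may or may not misbehave on a given run, since the outcome depends on thread timing.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class SharedVariableDemo
{
    // Hypothetical stand-in for the database lookup (not the original code).
    public static string FakeLookup(int fileId)
    {
        Thread.Sleep(1); // widen the window in which another thread can interfere
        return "file" + fileId;
    }

    // Variables declared outside the lambda: every iteration captures the same two variables.
    public static ConcurrentDictionary<int, string> RunShared(int[] ids)
    {
        var pairs = new ConcurrentDictionary<int, string>();
        int fileId = 0;
        string fileName = string.Empty;
        Parallel.For(0, ids.Length, i =>
        {
            fileId = ids[i];
            fileName = FakeLookup(fileId);   // another thread may overwrite fileId meanwhile
            pairs.TryAdd(fileId, fileName);  // key and value may come from different iterations
        });
        return pairs;
    }

    // Variables declared inside the lambda: each iteration gets its own copies.
    public static ConcurrentDictionary<int, string> RunLocal(int[] ids)
    {
        var pairs = new ConcurrentDictionary<int, string>();
        Parallel.For(0, ids.Length, i =>
        {
            int fileId = ids[i];
            string fileName = FakeLookup(fileId);
            pairs.TryAdd(fileId, fileName);
        });
        return pairs;
    }

    // Counts entries whose name does not belong to their ID.
    public static int CountMismatches(ConcurrentDictionary<int, string> pairs)
    {
        int mismatches = 0;
        foreach (var pair in pairs)
        {
            if (pair.Value != "file" + pair.Key) mismatches++;
        }
        return mismatches;
    }

    public static void Main()
    {
        int[] ids = { 1, 8, 12, 19, 25, 99 };
        var shared = RunShared(ids);
        var local = RunLocal(ids);
        Console.WriteLine("shared: {0} entries, {1} mismatched", shared.Count, CountMismatches(shared));
        Console.WriteLine("local:  {0} entries, {1} mismatched", local.Count, CountMismatches(local));
    }
}
```

The local variant always ends up with all six entries and no mismatches. The shared variant can lose entries (two iterations end up adding the same key) or pair an ID with another iteration's name, which matches the symptoms described in the question.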
+0

Thanks. I knew it was something stupid and simple. It is working perfectly now. – inspectorGadget

1

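The problem here is in the ProcessFilesInParallel(int[] Ids, string[] files) function. The iterations of the loop execute in parallel, and you declared fileId and fileName outside the scope of the loop, so these variables are shared by all iterations, which creates a race condition.

You can fix this by moving the fileId and fileName variables inside the loop: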

private static void ProcessFilesInParallel(int[] Ids, string[] files) 
{ 
    Parallel.For(0, Ids.Length, i => 
    { 
     var fileId = Ids[i]; 

     //this is returning the wrong fileName 
     var fileName = GetControlFileMetaDataFromDB(fileId); 

     _cd.TryAdd(fileId, fileName); 
    }); 
} 

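Also, in the title of the question, Parallel.For confusion (collection loses order), you say that the collection loses its order. As you can read here, no execution order is defined for the iterations of a parallel loop.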
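
If the output file needs a stable order, one option is to sort the pairs when writing them out rather than relying on the order in which they were added. The sketch below is an illustration under assumptions: the lookup is replaced by a placeholder string, and the class and method names are made up. It also relies on the fact that the enumeration order of a ConcurrentDictionary is not guaranteed, so sorting by key is what makes the result deterministic.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class OrderedOutputDemo
{
    public static List<string> BuildLines(int[] ids)
    {
        var pairs = new ConcurrentDictionary<int, string>();

        // Iterations may finish in any order; only the final contents matter here.
        Parallel.For(0, ids.Length, i =>
        {
            int fileId = ids[i];
            string fileName = "file" + fileId; // placeholder for the real lookup
            pairs.TryAdd(fileId, fileName);
        });

        // Sort by key after the loop has completed to get a deterministic order.
        return pairs
            .OrderBy(pair => pair.Key)
            .Select(pair => string.Format("fileId is {0} and fileName is {1}", pair.Key, pair.Value))
            .ToList();
    }

    public static void Main()
    {
        foreach (string line in BuildLines(new[] { 1, 8, 12, 19, 25, 99 }))
        {
            Console.WriteLine(line);
        }
    }
}
```

Sorting after the parallel phase keeps the loop body free of ordering concerns, at the cost of one extra pass over the results.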