2017-10-11

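I want to compare two CSV files and print the differences to a file. I currently use the code below to remove a row. Can I change this code so that it compares two CSV files, or is there a better way to compare CSV files in C#?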

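using System;
using System.Collections.Generic;
using System.IO;

// Removes every row whose column `selectedRow` (zero-based) equals `value`.
// Rows that do not contain the separator at all are dropped as well.
static void RemoveRows(string path, string csvseperator, int selectedRow, string value)
{
    List<string> lines = new List<string>();
    using (StreamReader reader = new StreamReader(File.OpenRead(path)))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Contains(csvseperator))
            {
                string[] split = line.Split(new[] { csvseperator }, StringSplitOptions.None);

                // Keep the row unless the selected column matches the value to remove
                if (selectedRow >= split.Length || split[selectedRow] != value)
                {
                    lines.Add(line);
                }
            }
        }
    }

    // Overwrite the original file with the remaining rows
    using (StreamWriter writer = new StreamWriter(path, false))
    {
        foreach (string line in lines)
            writer.WriteLine(line);
    }
}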

If you want to find *added*, *removed* and *changed* lines, have a look at *edit distance*: https://en.wikipedia.org/wiki/Edit_distance –


I can't use it. – Mylan


Why not? Why can't you use it? The simplest edit distance (the *Levenshtein* one) is easy to implement: https://en.wikipedia.org/wiki/Levenshtein_distance –
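A minimal sketch of the Levenshtein distance applied to CSV files, assuming each whole line is treated as one symbol (so a changed cell counts as one substitution of the entire line). The class name `LineEditDistance` is hypothetical. It only returns the number of line insertions, deletions and substitutions; listing *which* lines changed would need a backtrack through the full table, which this two-row version does not keep.

```csharp
using System;

public static class LineEditDistance
{
    // Levenshtein distance where every whole line is one symbol:
    // the minimum number of line insertions, deletions and substitutions
    // needed to turn `a` into `b`.
    public static int Compute(string[] a, string[] b)
    {
        // prev[j] = distance between the first i-1 lines of a and the first j lines of b
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i; // delete all i lines of a
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = string.Equals(a[i - 1], b[j - 1], StringComparison.Ordinal) ? 0 : 1;
                curr[j] = Math.Min(Math.Min(prev[j] + 1,   // deletion
                                            curr[j - 1] + 1), // insertion
                                   prev[j - 1] + cost);    // substitution or match
            }
            // The current row becomes the previous row for the next i
            var tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.Length];
    }
}
```

Usage would look like `LineEditDistance.Compute(File.ReadAllLines("sample1.csv"), File.ReadAllLines("sample2.csv"))`; keeping only two rows of the table makes memory proportional to the length of the second file instead of the product of both.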

Answers


If you only want to compare one column, you can use this code:

// Requires: using System; using System.IO;
// Compares one column of the two files, row by row.
// Note: the loop stops at the end of the shorter file.
using (StreamReader reader = new StreamReader(File.OpenRead(pad)))
using (StreamReader read = new StreamReader(File.OpenRead(pad2)))
{
    string line;
    string line2;

    // With these you can change the cells (zero-based column indexes) you want to compare
    int comp1 = 1;
    int comp2 = 1;

    while ((line = reader.ReadLine()) != null && (line2 = read.ReadLine()) != null)
    {
        if (line.Contains(seperator) && line2.Contains(seperator))
        {
            string[] split = line.Split(Convert.ToChar(seperator));
            string[] split2 = line2.Split(Convert.ToChar(seperator));

            // Skip rows that are too short to have the requested column
            if (comp1 >= split.Length || comp2 >= split2.Length)
                continue;

            if (split[comp1] != split2[comp2])
            {
                // It is not the same
            }
            else
            {
                // It is the same
            }
        }
    }
}

Thanks a lot, this works perfectly :) – Mylan


This only checks the second column of each line, and it ignores the extra lines if one CSV contains more lines than the other. –


How can I fix this? – Mylan
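One possible fix, sketched under a few assumptions: rows are still compared by position (row i against row i), every column is checked instead of a single one, and reading continues until *both* files are exhausted so the longer file's extra rows are reported. The class `CsvLineCompare` and the message format are hypothetical, and the plain `Split` does not handle quoted fields that contain the separator.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CsvLineCompare
{
    // Compares two CSVs line by line, checking every column, and keeps reading
    // until BOTH inputs are exhausted so extra rows are reported, not ignored.
    public static List<string> Compare(TextReader first, TextReader second, char separator)
    {
        var differences = new List<string>();
        int lineNumber = 0;
        while (true)
        {
            string line1 = first.ReadLine();
            string line2 = second.ReadLine();
            if (line1 == null && line2 == null) break; // both files finished
            lineNumber++;

            if (line1 == null) { differences.Add($"Line {lineNumber}: only in second file: {line2}"); continue; }
            if (line2 == null) { differences.Add($"Line {lineNumber}: only in first file: {line1}"); continue; }

            string[] cells1 = line1.Split(separator);
            string[] cells2 = line2.Split(separator);
            int columns = Math.Max(cells1.Length, cells2.Length);
            for (int c = 0; c < columns; c++)
            {
                // A missing cell (one row is shorter) is null, so it never matches an existing cell
                string cell1 = c < cells1.Length ? cells1[c] : null;
                string cell2 = c < cells2.Length ? cells2[c] : null;
                if (cell1 != cell2)
                    differences.Add($"Line {lineNumber}, column {c + 1}: '{cell1}' vs '{cell2}'");
            }
        }
        return differences;
    }
}
```

With files on disk you would pass two `StreamReader`s (each in a `using`) and write the result with `File.WriteAllLines`. Because rows are matched by position, an inserted row shifts every following comparison; the edit-distance idea from the comments avoids that.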


Here is another way to find the differences between CSV files, using Cinchoo ETL, an open-source library.

For the following sample CSV files:

sample1.csv

id,name 
1,Tom 
2,Mark 
3,Angie 

sample2.csv

id,name 
1,Tom 
2,Mark 
4,Lu 

Using Cinchoo ETL, the following code shows how to diff the rows by all columns:

var input1 = new ChoCSVReader("sample1.csv").WithFirstLineHeader(); 
var input2 = new ChoCSVReader("sample2.csv").WithFirstLineHeader(); 

using (var output = new ChoCSVWriter("sampleDiff.csv").WithFirstLineHeader()) 
{ 
    output.Write(input1.OfType<ChoDynamicObject>().Except(input2.OfType<ChoDynamicObject>(), ChoDynamicObjectEqualityComparer.Default)); 
    output.Write(input2.OfType<ChoDynamicObject>().Except(input1.OfType<ChoDynamicObject>(), ChoDynamicObjectEqualityComparer.Default)); 
} 

The differences found between the rows, in sampleDiff.csv:

id,name 
3,Angie 
4,Lu 
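For comparison, a rough equivalent of the all-columns diff without Cinchoo ETL, using only LINQ. This is a sketch, not the library's behavior: it assumes both files have one header line and compares whole lines as strings, so formatting differences (extra spaces, quoting) count as changes. The class `CsvSetDiff` is hypothetical; like LINQ's `Except`, it treats rows as sets, so duplicate rows are collapsed.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CsvSetDiff
{
    // Rows (excluding the header) that appear in exactly one of the two files,
    // comparing whole lines. Rows only in the first file come first.
    public static List<string> RowsInOnlyOne(IEnumerable<string> file1Lines, IEnumerable<string> file2Lines)
    {
        var rows1 = file1Lines.Skip(1).ToList(); // skip header
        var rows2 = file2Lines.Skip(1).ToList();
        return rows1.Except(rows2).Concat(rows2.Except(rows1)).ToList();
    }
}
```

To produce a file like sampleDiff.csv, you could write the header followed by these rows, e.g. `File.WriteAllLines("sampleDiff.csv", new[] { "id,name" }.Concat(diff))`.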

If you want to do the diff by the 'id' column only:

var input1 = new ChoCSVReader("sample1.csv").WithFirstLineHeader(); 
var input2 = new ChoCSVReader("sample2.csv").WithFirstLineHeader(); 

using (var output = new ChoCSVWriter("sampleDiff.csv").WithFirstLineHeader()) 
{ 
    output.Write(input1.OfType<ChoDynamicObject>().Except(input2.OfType<ChoDynamicObject>(), new ChoDynamicObjectEqualityComparer(new string[] { "id" }))); 
    output.Write(input2.OfType<ChoDynamicObject>().Except(input1.OfType<ChoDynamicObject>(), new ChoDynamicObjectEqualityComparer(new string[] { "id" }))); 
} 
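A LINQ-only sketch of the key-based diff, again without Cinchoo ETL and under the same assumptions (one header line, no quoted fields). The class `CsvKeyDiff` and its `keyIndex` parameter are hypothetical; `keyIndex = 0` corresponds to the "id" column in the sample files. Only the key is compared, so a row whose id exists in both files but whose other cells differ is *not* reported. Unlike LINQ's `Except`, this keeps duplicate rows.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CsvKeyDiff
{
    // Rows (excluding the header) whose key column value does not occur in the other file.
    public static List<string> RowsWithUnmatchedKey(IList<string> file1Lines, IList<string> file2Lines,
                                                    int keyIndex, char separator = ',')
    {
        // Extract the key cell; rows too short to have it get an empty key
        string Key(string row)
        {
            var cells = row.Split(separator);
            return keyIndex < cells.Length ? cells[keyIndex] : "";
        }

        var rows1 = file1Lines.Skip(1).ToList(); // skip header
        var rows2 = file2Lines.Skip(1).ToList();
        var keys1 = new HashSet<string>(rows1.Select(Key));
        var keys2 = new HashSet<string>(rows2.Select(Key));

        return rows1.Where(r => !keys2.Contains(Key(r)))
                    .Concat(rows2.Where(r => !keys1.Contains(Key(r))))
                    .ToList();
    }
}
```

The `HashSet` lookups keep this roughly linear in the number of rows, rather than comparing every row of one file against every row of the other.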

Hope this helps.