2009-09-24 111 views
8

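I am comparing two XML documents and need to print the differences. How can I do this with LINQ? I know I could use Microsoft's XML Diff and Patch tool, but I would prefer LINQ. Any other ideas for comparing two XML documents and printing the differences with LINQ are welcome.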

// First XML

<Books> 
<book> 
    <id="20504" image="C01" name="C# in Depth"> 
</book> 
<book> 
    <id="20505" image="C02" name="ASP.NET"> 
</book> 
<book> 
    <id="20506" image="C03" name="LINQ in Action "> 
</book> 
<book> 
    <id="20507" image="C04" name="Architecting Applications"> 
</book> 
</Books> 

// Second XML

<Books> 
    <book> 
    <id="20504" image="C011" name="C# in Depth"> 
    </book> 
    <book> 
    <id="20505" image="C02" name="ASP.NET 2.0"> 
    </book> 
    <book> 
    <id="20506" image="C03" name="LINQ in Action "> 
    </book> 
    <book> 
    <id="20508" image="C04" name="Architecting Applications"> 
    </book> 
</Books> 

I want to compare two XML documents like these and print a result like this:

Issued  Issue Type          IssueInFirst  IssueInSecond
1       image is different  C01           C011
2       name is different   ASP.NET       ASP.NET 2.0
3       id is different     20507         20508
+4

How complex is the XML? If it is *just* root/record/@attrib, this may be feasible. – 2009-09-24 06:24:26

+1

(That XML is invalid, by the way) – 2009-09-24 06:36:36

+0

Hi Mark, this is a very simple example; the actual XML is a little more complex. – NETQuestion 2009-09-24 06:43:25

Answers

1

Here is a solution:

//sanitised xmls: 
string s1 = @"<Books> 
       <book id='20504' image='C01' name='C# in Depth'/> 
       <book id='20505' image='C02' name='ASP.NET'/> 
       <book id='20506' image='C03' name='LINQ in Action '/> 
       <book id='20507' image='C04' name='Architecting Applications'/> 
       </Books>"; 
string s2 = @"<Books> 
        <book id='20504' image='C011' name='C# in Depth'/> 
        <book id='20505' image='C02' name='ASP.NET 2.0'/> 
        <book id='20506' image='C03' name='LINQ in Action '/> 
        <book id='20508' image='C04' name='Architecting Applications'/> 
       </Books>"; 

XDocument xml1 = XDocument.Parse(s1); 
XDocument xml2 = XDocument.Parse(s2); 

//get the cartesian product: pair every book in xml1 with every book in xml2
var result1 = from xmlBooks1 in xml1.Descendants("book") 
       from xmlBooks2 in xml2.Descendants("book") 
       select new { 
          book1 = new { 
             id=xmlBooks1.Attribute("id").Value, 
             image=xmlBooks1.Attribute("image").Value, 
             name=xmlBooks1.Attribute("name").Value 
             }, 
          book2 = new { 
             id=xmlBooks2.Attribute("id").Value, 
             image=xmlBooks2.Attribute("image").Value, 
             name=xmlBooks2.Attribute("name").Value 
             } 
          }; 

//get every record that has at least one attribute the same, but not all 
var result2 = from i in result1 
       where (i.book1.id == i.book2.id 
         || i.book1.image == i.book2.image 
         || i.book1.name == i.book2.name) && 
         !(i.book1.id == i.book2.id 
         && i.book1.image == i.book2.image 
         && i.book1.name == i.book2.name) 
       select i; 



foreach (var aa in result2) 
{ 
    //you do the output :D 
} 

The two LINQ statements could probably be merged, but I leave that as an exercise.
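For readers who want to see the exercise worked through, here is one possible merged form that also prints the table the question asks for. It is only a sketch, not the answerer's own code: it assumes every book carries exactly the three attributes id, image and name, it uses C# top-level statements (a much newer language feature than this thread), and the names attributeNames, pairs and issues are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XDocument xml1 = XDocument.Parse(@"<Books>
    <book id='20504' image='C01' name='C# in Depth'/>
    <book id='20505' image='C02' name='ASP.NET'/>
    <book id='20506' image='C03' name='LINQ in Action '/>
    <book id='20507' image='C04' name='Architecting Applications'/>
</Books>");
XDocument xml2 = XDocument.Parse(@"<Books>
    <book id='20504' image='C011' name='C# in Depth'/>
    <book id='20505' image='C02' name='ASP.NET 2.0'/>
    <book id='20506' image='C03' name='LINQ in Action '/>
    <book id='20508' image='C04' name='Architecting Applications'/>
</Books>");

// Assumption: these are the only attributes worth comparing.
string[] attributeNames = { "id", "image", "name" };

var issues =
    (from b1 in xml1.Descendants("book")          // cross join, as in the answer
     from b2 in xml2.Descendants("book")
     let pairs = attributeNames
         .Select(attr => new
         {
             Name = attr,
             First = b1.Attribute(attr)?.Value,
             Second = b2.Attribute(attr)?.Value
         })
         .ToList()
     // keep book pairs that share at least one attribute value, but not all
     where pairs.Any(x => x.First == x.Second)
        && pairs.Any(x => x.First != x.Second)
     from issue in pairs
     where issue.First != issue.Second
     select issue)
    .ToList();

// Print one numbered row per differing attribute.
int row = 1;
Console.WriteLine("{0,-7} {1,-20} {2,-14} {3}",
    "Issue", "Issue Type", "IssueInFirst", "IssueInSecond");
foreach (var item in issues)
    Console.WriteLine("{0,-7} {1,-20} {2,-14} {3}",
        row++, item.Name + " is different", item.First, item.Second);
```

On the sample data this yields three rows (image, name, id), in the same order as the table in the question.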

+0

I would be surprised if this actually works as requested. Do you really want a cross join (Cartesian product)? – dahlbyk 2009-09-25 13:39:55

+0

Yes, it works. Next time you could check for yourself before commenting. Now let's take a look at your solution. – 2009-09-26 11:15:11

+0

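It produces the same result for this sample set, yes. But it does not solve the general problem as I understand it. For example, suppose the xml2 book with id = 20508 is a typo, and the next entry holds the "real" 20508 data in each source. Your solution would return two rows; mine would return one. Both answers give correct results for the question as posed. – dahlbyk 2009-09-26 14:36:35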

1

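The operation you want here is a Zip, which pairs up the corresponding elements of your two book sequences. That operator is being added in .NET 4.0, but we can fake it by using Select to grab each book's index and joining on that: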

var res = from b1 in xml1.Descendants("book") 
         .Select((b, i) => new { b, i }) 
      join b2 in xml2.Descendants("book") 
         .Select((b, i) => new { b, i }) 
      on b1.i equals b2.i 

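We then use a second join to compare attribute values by name. Note that this is an inner join; if you did want to include attributes that are missing from one element or the other, you would have to do considerably more work.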

  select new 
      { 
       Row = b1.i, 
       Diff = from a1 in b1.b.Attributes() 
        join a2 in b2.b.Attributes() 
         on a1.Name equals a2.Name 
        where a1.Value != a2.Value 
        select new 
        { 
         Name = a1.Name, 
         Value1 = a1.Value, 
         Value2 = a2.Value 
        } 
      }; 
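The inner join on attribute names silently drops any attribute that exists on only one of the two elements. One way to surface those as well is to walk the union of both attribute-name sets and treat an absent attribute as null. The following is a sketch of that idea on a single hypothetical pair of elements (the price attribute and the "(missing)" marker are invented for illustration, and top-level statements are used for brevity):

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical pair: the second element lacks 'image' and adds 'price'.
XElement e1 = XElement.Parse("<book id='20504' image='C01' name='C# in Depth'/>");
XElement e2 = XElement.Parse("<book id='20504' name='C# in Depth' price='30'/>");

var attributeDiffs =
    (from attrName in e1.Attributes().Select(a => a.Name)
                        .Union(e2.Attributes().Select(a => a.Name))
     let v1 = e1.Attribute(attrName)?.Value   // null when e1 has no such attribute
     let v2 = e2.Attribute(attrName)?.Value   // null when e2 has no such attribute
     where v1 != v2
     select new
     {
         Name = attrName.LocalName,
         Value1 = v1 ?? "(missing)",
         Value2 = v2 ?? "(missing)"
     })
    .ToList();

foreach (var d in attributeDiffs)
    Console.WriteLine("{0}: {1} -> {2}", d.Name, d.Value1, d.Value2);
```

Here image is reported as present only in the first element and price as present only in the second, while id and name match and are skipped.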

The result (res) is a nested collection:

foreach (var b in res) 
{ 
    Console.WriteLine("Row {0}: ", b.Row); 
    foreach (var d in b.Diff) 
     Console.WriteLine(d); 
} 

Or, to get multiple rows per book:

var report = from r in res 
      from d in r.Diff 
      select new { r.Row, Diff = d }; 

foreach (var d in report) 
    Console.WriteLine(d); 

Which reports the following:

{ Row = 0, Diff = { Name = image, Value1 = C01, Value2 = C011 } } 
{ Row = 1, Diff = { Name = name, Value1 = ASP.NET, Value2 = ASP.NET 2.0 } } 
{ Row = 3, Diff = { Name = id, Value1 = 20507, Value2 = 20508 } } 
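On .NET 4.0 or later, the index-and-join workaround is no longer necessary, because Enumerable.Zip pairs the two sequences directly. The following is a sketch of the same positional comparison written that way; the indexed SelectMany overload supplies the row number, and the flattened shape of the result (Row, Name, Value1, Value2) is a choice made here for illustration, not part of the original answer.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XDocument xml1 = XDocument.Parse(@"<Books>
    <book id='20504' image='C01' name='C# in Depth'/>
    <book id='20505' image='C02' name='ASP.NET'/>
    <book id='20506' image='C03' name='LINQ in Action '/>
    <book id='20507' image='C04' name='Architecting Applications'/>
</Books>");
XDocument xml2 = XDocument.Parse(@"<Books>
    <book id='20504' image='C011' name='C# in Depth'/>
    <book id='20505' image='C02' name='ASP.NET 2.0'/>
    <book id='20506' image='C03' name='LINQ in Action '/>
    <book id='20508' image='C04' name='Architecting Applications'/>
</Books>");

var zipped =
    xml1.Descendants("book")
        // Pair the n-th book of xml1 with the n-th book of xml2.
        .Zip(xml2.Descendants("book"), (b1, b2) => new { b1, b2 })
        // The indexed overload passes the pair's position as 'row'.
        .SelectMany((pair, row) =>
            from a1 in pair.b1.Attributes()
            join a2 in pair.b2.Attributes() on a1.Name equals a2.Name
            where a1.Value != a2.Value
            select new
            {
                Row = row,
                Name = a1.Name.LocalName,
                Value1 = a1.Value,
                Value2 = a2.Value
            })
        .ToList();

foreach (var d in zipped)
    Console.WriteLine(d);
```

Zip stops at the end of the shorter sequence, so trailing books that exist in only one document are not reported by this sketch.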
+0

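The thing with Zip is that it joins the first record of xml1 to the first record of xml2. So if we shuffle xml1 a little (say we swap the first and second nodes), we get different results. That is why you need a cross join. There is no reason to assume, from his question and comments, that only corresponding nodes should be compared. – 2009-09-26 11:26:00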

+0

The question is described as a diff. In a diff, order matters. – dahlbyk 2009-09-26 14:27:00

1

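For fun, here is a general solution for the cross-join reading of the problem. To illustrate my objection to that approach, I have introduced a "correct" entry for "PowerShell in Action".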

string s1 = @"<Books> 
    <book id='20504' image='C01' name='C# in Depth'/> 
    <book id='20505' image='C02' name='ASP.NET'/> 
    <book id='20506' image='C03' name='LINQ in Action '/> 
    <book id='20507' image='C04' name='Architecting Applications'/> 
    <book id='20508' image='C05' name='PowerShell in Action'/> 
    </Books>"; 
string s2 = @"<Books> 
    <book id='20504' image='C011' name='C# in Depth'/> 
    <book id='20505' image='C02' name='ASP.NET 2.0'/> 
    <book id='20506' image='C03' name='LINQ in Action '/> 
    <book id='20508' image='C04' name='Architecting Applications'/> 
    <book id='20508' image='C05' name='PowerShell in Action'/> 
    </Books>"; 

XDocument xml1 = XDocument.Parse(s1); 
XDocument xml2 = XDocument.Parse(s2); 

var res = from b1 in xml1.Descendants("book") 
      from b2 in xml2.Descendants("book") 
      let issues = from a1 in b1.Attributes() 
         join a2 in b2.Attributes() 
         on a1.Name equals a2.Name 
         select new 
         { 
          Name = a1.Name, 
          Value1 = a1.Value, 
          Value2 = a2.Value 
         } 
      where issues.Any(i => i.Value1 == i.Value2) 
      from issue in issues 
      where issue.Value1 != issue.Value2 
      select issue; 

Which reports the following:

{ Name = image, Value1 = C01, Value2 = C011 } 
{ Name = name, Value1 = ASP.NET, Value2 = ASP.NET 2.0 } 
{ Name = id, Value1 = 20507, Value2 = 20508 } 
{ Name = image, Value1 = C05, Value2 = C04 } 
{ Name = name, Value1 = PowerShell in Action, Value2 = Architecting Applications } 

Note that the last two entries are "conflicts" between the 20508 typo and the otherwise correct 20508 entry.