2017-04-17

C#: HtmlAgilityPack Descendants

Good day. I have a task in which I need to convert a Word document to HTML.

This can be done with Interop by saving the document as HTML, but I then need to clean up the HTML that Interop produces.

However, I have a problem with HtmlAgilityPack. I assumed it works similarly to XmlDocument in C#.

Here is my C# code:

using System;
using System.Linq;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load(htmlLocation);
foreach (var item in doc.DocumentNode.Descendants("p"))
{
    if (item.HasChildNodes)
    {
        foreach (var itm in item.Descendants("span").ToList())
        {
            Console.WriteLine(itm.InnerText);
        }
    }
}

Here is the HTML:

<html> 

<head> 
<meta http-equiv=Content-Type content="text/html; charset=windows-1252"> 
<meta name=Generator content="Microsoft Word 12 (filtered)"> 

</head> 

<body lang=EN-US link="#0066CC" vlink=purple style='text-justify-trim:punctuation'> 

<div class=WordSection1> 

<p class=Heading61 style='margin-bottom:0in;margin-bottom:.0001pt;text-indent: 
.5in;line-height:normal;page-break-after:avoid;background:transparent'><span 
class=Heading6><span style='font-size:12.0pt;color:black;background:yellow'>Epilogue</span></span></p> 

<p class=MsoBodyText style='line-height:normal;background:transparent'><span 
class=BodytextItalic2><span style='font-size:12.0pt;color:black;font-style: 
normal'>&nbsp;</span></span></p> 

<p class=MsoBodyText style='line-height:normal;background:transparent'><span 
class=BodytextItalic2><span style='font-size:12.0pt;color:black;font-style: 
normal'>Rebecca sat outside her lodge cradling her infant son in her arms. How 
handsome he was, her little warrior, with his dusky skin and thick black hair. 
For the first few days after his birth, she had been afraid to let him out of 
her sight, out of her arms, for fear she would lose him, but he was a strong 
healthy child.</span></span></p> 

<p class=MsoBodyText style='text-indent:.5in;line-height:normal;background: 
transparent'><span class=BodytextItalic2><span style='font-size:12.0pt; 
color:black;font-style:normal'>Looking at him made her heart swell with love 
for him and for his father. She had married Wolf Dreamer the day after they 
returned to his people. Summer Moon Rising had left the village the following 
day.</span></span></p> 

</div> 

</body> 

</html> 

This is the output of the code above:

Epilogue 
Epilogue 
&nbsp; 
&nbsp; 
Rebecca sat outside her lodge cradling her infant son in her arms. How 
handsome he was, her little warrior, with his dusky skin and thick black hair. 
For the first few days after his birth, she had been afraid to let him out of 
her sight, out of her arms, for fear she would lose him, but he was a strong 
healthy child. 
Rebecca sat outside her lodge cradling her infant son in her arms. How 
handsome he was, her little warrior, with his dusky skin and thick black hair. 
For the first few days after his birth, she had been afraid to let him out of 
her sight, out of her arms, for fear she would lose him, but he was a strong 
healthy child. 
Looking at him made her heart swell with love 
for him and for his father. She had married Wolf Dreamer the day after they 
returned to his people. Summer Moon Rising had left the village the following 
day. 
Looking at him made her heart swell with love 
for him and for his father. She had married Wolf Dreamer the day after they 
returned to his people. Summer Moon Rising had left the village the following 
day. 

I expected each paragraph's text to appear only once per `p` element. Why is the text repeated?

Answer


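You have four `p` tags, and each one contains two spans, one nested inside the other. `Descendants` returns all descendant nodes, not just the direct children, so both spans match the name "span" and your inner foreach prints each text twice: once for the outer span (whose `InnerText` already includes the inner span's text) and once for the inner span.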

Your inner foreach can instead be:

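foreach (var itm in item.ChildNodes)
{
    // ChildNodes holds only the direct children of the <p>, here the outer span,
    // whose InnerText already contains the inner span's text, so it prints once.
    Console.WriteLine(itm.InnerText);
}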
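The question's guess that HtmlAgilityPack behaves like the .NET XML APIs is a useful way to see the difference. The sketch below swaps in LINQ to XML (`System.Xml.Linq`, not HtmlAgilityPack) so it runs without a NuGet package, and it assumes a simplified, well-formed copy of one paragraph, since Word's real output is not valid XML. It shows that `Descendants("span")` yields both nested spans, while `Elements()` yields only the outer one. The `InnermostSpans` filter is an extra, hypothetical alternative, and the class and method names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class NestedSpanDemo
{
    // Assumed, simplified version of one Word-generated paragraph:
    // an outer span wrapping an inner span that holds the text.
    const string Paragraph =
        "<p><span class=\"BodytextItalic2\"><span style=\"font-size:12.0pt\">Epilogue</span></span></p>";

    // Descendants("span") walks the whole subtree, so it yields BOTH spans.
    // The outer span's Value includes the inner span's text, hence the duplicate.
    public static List<string> AllDescendantSpans()
    {
        XElement p = XElement.Parse(Paragraph);
        return p.Descendants("span").Select(s => s.Value).ToList();
    }

    // Elements() returns only the direct children of <p>: just the outer span.
    public static List<string> DirectChildren()
    {
        XElement p = XElement.Parse(Paragraph);
        return p.Elements().Select(e => e.Value).ToList();
    }

    // Hypothetical alternative: keep only innermost spans (no nested span inside).
    public static List<string> InnermostSpans()
    {
        XElement p = XElement.Parse(Paragraph);
        return p.Descendants("span")
                .Where(s => !s.Descendants("span").Any())
                .Select(s => s.Value)
                .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" | ", AllDescendantSpans())); // Epilogue | Epilogue
        Console.WriteLine(string.Join(" | ", DirectChildren()));     // Epilogue
        Console.WriteLine(string.Join(" | ", InnermostSpans()));     // Epilogue
    }
}
```

HtmlAgilityPack's `HtmlNode` also exposes `Descendants(name)` and `ChildNodes`, so the same reasoning should carry over, for example `item.Descendants("span").Where(s => !s.Descendants("span").Any())`, though this has not been checked against the exact Interop output above.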