2012-03-30 74 views
-6

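I have a page from which I need to extract the inner HTML of a div. The only thing I have to identify the div is its class. How do I extract the div's InnerHtml?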

<div class="os-box unround"> 
: 
: 
: 
</div> 

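I need to extract the innerHTML of the div with the class "os-box unround". Assume the page comes from the URL http://abc.com/xyz.html and that the extraction happens in C#, in the page load event.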

**Input:** 

<div class="os-box unround"> 

    <div class="os-list" id="os-list-6.1 x64"> 



    <div class="item-box"> 

     <p class="item-title"><a href="http://devid.info/en/p127116/Atheros+AR5B95+Wireless+Network+Adapter"><span class="mark">Atheros</span> AR5B95 Wireless <span class="mark">Network</span> <span class="mark">Adapter</span></a></p> 

     <p class="item-vendor"><span>Vendor: </span>Atheros Communications Inc.</p> 

     <p class="item-os"><span>Operating system: </span>Vista64 W7x64</p> 

    <p class="item-date"><span>Driver Date: </span>2010-09-26</p> <p class="item-version"><span>Version: </span>8.0.0.372</p>  <p class="download"><a href="http://devid.info/p127116/Atheros+AR5B95+Wireless+Network+Adapter">Download</a></p> 

    </div> 



    <div class="adv-box"> 



    </div> 



    <div class="item-box"> 

     <p class="item-title"><a href="http://devid.info/en/p145532/Atheros+AR5005G+Wireless+Network+Adapter"><span class="mark">Atheros</span> AR5005G Wireless <span class="mark">Network</span> <span class="mark">Adapter</span></a></p> 

     <p class="item-vendor"><span>Vendor: </span>Atheros Communications Inc.</p> 

     <p class="item-os"><span>Operating system: </span>Vista64 W7x64</p> 

    <p class="item-date"><span>Driver Date: </span>2010-07-08</p> <p class="item-version"><span>Version: </span>9.0.0.222</p>  <p class="download"><a href="http://devid.info/p145532/Atheros+AR5005G+Wireless+Network+Adapter">Download</a></p> 

    </div> 





    <div class="item-box"> 

     <p class="item-title"><a href="http://devid.info/en/p134802/Atheros+AR5008X+Wireless+Network+Adapter"><span class="mark">Atheros</span> AR5008X Wireless <span class="mark">Network</span> <span class="mark">Adapter</span></a></p> 

     <p class="item-vendor"><span>Vendor: </span>Atheros Communications Inc.</p> 

     <p class="item-os"><span>Operating system: </span>Vista64 W7x64</p> 

    <p class="item-date"><span>Driver Date: </span>2010-06-24</p> <p class="item-version"><span>Version: </span>9.0.0.208</p>  <p class="download"><a href="http://devid.info/p134802/Atheros+AR5008X+Wireless+Network+Adapter">Download</a></p> 

    </div> 

</div> 
</div>

Some URL, say http://abc.com/xyz.html, serves HTML that contains the div shown above. I want to read that div and display it on my own page during my page's load event.

**Output:**

A string containing the inner HTML of the "os-box unround" div.

Ehhh, is this using JavaScript? – Neal 2012-03-30 18:46:53

Try the [jQuery class selector](http://api.jquery.com/class-selector/) – 2012-03-30 18:47:33

He didn't mention jQuery or tag the question with it, though – Rodolfo 2012-03-30 18:48:11

Answers

1

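Have you tried HtmlAgilityPack? It lets you parse and query (using XPath) much of the malformed HTML you find in the wild.

If I understand your question correctly, you could use: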

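// Requires the HtmlAgilityPack package; HtmlWeb downloads and parses the page.
HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb(); 
HtmlAgilityPack.HtmlDocument doc = web.Load("http://abc.com/xyz.html"); 

// This XPath matches only a div that is a direct child of <body> and whose
// class attribute is exactly "os-box unround".
HtmlAgilityPack.HtmlNode div = doc.DocumentNode 
    .SelectSingleNode("/html/body/div[@class=\"os-box unround\"]"); 
// SelectSingleNode returns null when nothing matches, so guard before reading InnerHtml.
string contentYouWantedToDisplayOnYourOwnPage = div != null ? div.InnerHtml : string.Empty;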
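
If the div is nested deeper than directly under the body, an absolute path like the one above will not find it. A minimal sketch of a more tolerant variant follows, assuming the HTML has already been downloaded into a string. The class name `OsBoxExtractor` and the method name `ExtractInnerHtml` are hypothetical names chosen for illustration; only `HtmlDocument.LoadHtml`, `SelectSingleNode` and `InnerHtml` come from HtmlAgilityPack.

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package, not part of .NET itself

// Hypothetical helper class, named for illustration only.
public static class OsBoxExtractor
{
    // Returns the inner HTML of the first div whose class attribute is exactly
    // "os-box unround", or null when the markup contains no such div.
    public static string ExtractInnerHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//" searches the whole document instead of only direct children of <body>.
        HtmlNode div = doc.DocumentNode
            .SelectSingleNode("//div[@class='os-box unround']");

        // SelectSingleNode returns null when nothing matches.
        return div == null ? null : div.InnerHtml;
    }
}
```

Note that `@class='os-box unround'` compares the whole attribute value, so a div written as `class="unround os-box"` or with an extra class would not match. In a page load handler you would call `OsBoxExtractor.ExtractInnerHtml(...)` with the downloaded markup and write the result to whatever control you use for output, after checking it for null.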