Looping through nodes created by HtmlAgilityPack

I need to parse this HTML using HtmlAgilityPack and C#. I can get the div class="patent_bibdata" node, but I don't know how to loop through its child nodes.
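In this sample there are 6 hrefs, but I need to split them into two groups: Inventors and Classification. I'm not interested in the last two. There can be any number of hrefs in this div.
As you can see, each of the two groups is preceded by a piece of text that says what the hrefs are.
Code snippet: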
// Requires: using HtmlAgilityPack;
HtmlWeb hw = new HtmlWeb();
HtmlDocument doc = hw.Load("http://www.google.com/patents/US3748943");
string xpath = "/html/body/table[@id='viewport_table']/tr/td[@id='viewport_td']/div[@class='vertical_module_list_row'][1]/div[@id='overview']/div[@id='overview_v']/table[@id='summarytable']/tr/td/div[@class='patent_bibdata']";
HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath); // null if the path does not match
So, how would you do this?
<div class="patent_bibdata">
<b>Inventors</b>:
<a href="http://www.google.com/search?tbo=p&tbm=pts&hl=en&q=ininventor:%22Ronald+T.+Lashley%22">
Ronald T. Lashley
</a>,
<a href="http://www.google.com/search?tbo=p&tbm=pts&hl=en&q=ininventor:%22Ronald+T.+Lashley%22">
Ronald T. Lashley
</a><br>
<b>Current U.S. Classification</b>:
<a href="http://www.google.com/url?id=3eF8AAAAEBAJ&q=http://www.uspto.gov/web/patents/classification/uspc084/defs084.htm&usg=AFQjCNEZRFtAyKTfNudgc-XVt2-VboD77Q#C084S31200P">84/312.00P</a>;
<a href="http://www.google.com/url?id=3eF8AAAAEBAJ&q=http://www.uspto.gov/web/patents/classification/uspc084/defs084.htm&usg=AFQjCNEZRFtAyKTfNudgc-XVt2-VboD77Q#C084S31200R">84/312.00R</a><br>
<br>
<a href="http://www.google.com/url?id=3eF8AAAAEBAJ&q=http://patft.uspto.gov/netacgi/nph-Parser%3FSect2%3DPTO1%26Sect2%3DHITOFF%26p%3D1%26u%3D/netahtml/PTO/search-bool.html%26r%3D1%26f%3DG%26l%3D50%26d%3DPALL%26RefSrch%3Dyes%26Query%3DPN/3748943&usg=AFQjCNGKUic_9BaMHWdCZtCghtG5SYog-A">
View patent at USPTO</a><br>
<a href="http://www.google.com/url?id=3eF8AAAAEBAJ&q=http://assignments.uspto.gov/assignments/q%3Fdb%3Dpat%26pat%3D3748943&usg=AFQjCNGbD7fvsJjOib3GgdU1gCXKiVjQsw">
Search USPTO Assignment Database
</a><br>
</div>
Wanted result: InventorGroup =
<a href="http://www.google.com/search?tbo=p&tbm=pts&hl=en&q=ininventor:%22Ronald+T.+Lashley%22">
Ronald T. Lashley
</a>
<a href="http://www.google.com/search?tbo=p&tbm=pts&hl=en&q=ininventor:%22Ronald+T.+Lashley%22">
Thomas R. Lashley
</a>
ClassificationGroup =
<a href="http://www.google.com/url?id=3eF8AAAAEBAJ&q=http://www.uspto.gov/web/patents/classification/uspc084/defs084.htm&usg=AFQjCNEZRFtAyKTfNudgc-XVt2-VboD77Q#C084S31200P">84/312.00P</a>;
<a href="http://www.google.com/url?id=3eF8AAAAEBAJ&q=http://www.uspto.gov/web/patents/classification/uspc084/defs084.htm&usg=AFQjCNEZRFtAyKTfNudgc-XVt2-VboD77Q#C084S31200R">84/312.00R</a>
I'm trying to scrape the page: http://www.google.com/patents/US3748943
// Anders
PS! I know that on this page the inventors' names are the same, but on most pages they are different!
Nice! But how do I get the hrefs in the classification group? – Andis59 2012-08-08 17:01:15
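
One possible way to get both groups, the classification hrefs included, is to walk the div's ChildNodes in document order, start a new group at every <b> label, and close the group at the next <br>. This is only a sketch under assumptions: the class BibDataGrouper, the method GroupLinksByLabel and the shortened "#..." hrefs are made up for illustration, and it assumes every group is introduced by a <b> element and ends with a <br>, as in the sample above.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class BibDataGrouper
{
    // Trimmed copy of the sample div; the href values are placeholders, not the real URLs.
    public const string SampleHtml =
        "<div class='patent_bibdata'>" +
        "<b>Inventors</b>: <a href='#inv1'>Ronald T. Lashley</a>, " +
        "<a href='#inv2'>Ronald T. Lashley</a><br>" +
        "<b>Current U.S. Classification</b>: <a href='#c1'>84/312.00P</a>; " +
        "<a href='#c2'>84/312.00R</a><br><br>" +
        "<a href='#uspto'>View patent at USPTO</a><br>" +
        "<a href='#assign'>Search USPTO Assignment Database</a><br>" +
        "</div>";

    // Hypothetical helper: maps the text of each <b> label to the <a> nodes that follow it.
    public static Dictionary<string, List<HtmlNode>> GroupLinksByLabel(HtmlNode bibData)
    {
        var groups = new Dictionary<string, List<HtmlNode>>();
        List<HtmlNode> current = null; // no open group until the first <b> is seen

        foreach (HtmlNode child in bibData.ChildNodes)
        {
            // Skip text nodes such as ":" and "," between the elements.
            if (child.NodeType != HtmlNodeType.Element) continue;

            switch (child.Name)
            {
                case "b": // a label opens a new group
                    current = new List<HtmlNode>();
                    groups[child.InnerText.Trim()] = current;
                    break;
                case "a": // a link belongs to the open group, if there is one
                    if (current != null) current.Add(child);
                    break;
                case "br": // a line break closes the group, so unlabeled links are ignored
                    current = null;
                    break;
            }
        }
        return groups;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);

        HtmlNode bibData = doc.DocumentNode.SelectSingleNode("//div[@class='patent_bibdata']");
        if (bibData == null) return; // SelectSingleNode returns null when nothing matches

        foreach (var group in GroupLinksByLabel(bibData))
        {
            Console.WriteLine(group.Key);
            foreach (HtmlNode a in group.Value)
                Console.WriteLine("  " + a.InnerText.Trim() + " -> " + a.GetAttributeValue("href", ""));
        }
    }
}
```

With the live page, the node returned by SelectSingleNode in the snippet above could be passed to GroupLinksByLabel instead of the sample string. Keying on the label text means the code does not need to know how many hrefs each group holds, and the last two links are dropped because no <b> precedes them.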