2013-07-31 29 views

How to select the following sibling tag using XPath

I have an HTML file like this:

<div id="note"> 
<a name="overview"></a> 
<h3>Overview</h3> 
<p>some text1...</p> 
<a name="description"></a> 
<h3>Description</h3> 
<p>some text2 ...</p> 
</div> 

I want to retrieve the paragraph that belongs to each heading, for example overview: some text1, description: some text2, and so on. I want to write this in Python using XPath. Thanks.

Answer


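Find all the h3 tags, iterate over them, and at each step of the loop find the first p tag that is a following sibling of the current heading: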

from urllib.request import urlopen

from lxml import etree

URL = "http://www.kb.cert.org/vuls/id/628463"
response = urlopen(URL)

# Use the HTML parser: real-world pages are rarely well-formed XML
parser = etree.HTMLParser()
tree = etree.parse(response, parser)

for header in tree.iter('h3'):
    # First <p> sibling that follows this heading in document order
    paragraph = header.xpath('following-sibling::p[1]')
    if paragraph:
        print("%s: %s" % (header.text, paragraph[0].text))

This prints:

Overview: The Ruby on Rails 3.0 and 2.3 JSON parser contain a vulnerability that may result in arbitrary code execution. 
Description: Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability. 
Impact: Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability. 
Solution: Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability. 
Vendor Information : Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability. 
CVSS Metrics : Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability. 
References: Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability. 
Credit: Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability. 
Feedback: If you have feedback, comments, or additional information about this vulnerability, please send us 
Subscribe to Updates: Receive security alerts, tips, and other updates. 
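
The repeated paragraph in the output above suggests that several headings on that page are not directly followed by a p tag, so `following-sibling::p` skips ahead to a paragraph from a later section. Below is a minimal sketch on the question's own snippet, assuming lxml under Python 3. The helper name `paragraphs_by_heading` is hypothetical and not part of lxml. It accepts a paragraph only when the nearest h3 before it is the current heading:

```python
from lxml import etree

# The HTML fragment from the question.
HTML = """
<div id="note">
<a name="overview"></a>
<h3>Overview</h3>
<p>some text1...</p>
<a name="description"></a>
<h3>Description</h3>
<p>some text2 ...</p>
</div>
"""


def paragraphs_by_heading(html):
    """Map each <h3> text to the text of the first <p> in its own section.

    Hypothetical helper for illustration; not part of lxml.
    """
    root = etree.fromstring(html, etree.HTMLParser())
    result = {}
    for header in root.iter("h3"):
        # First <p> sibling after this heading, in document order.
        candidates = header.xpath("following-sibling::p[1]")
        if not candidates:
            continue
        paragraph = candidates[0]
        # Nearest <h3> sibling before that <p>; it must be this heading,
        # otherwise the <p> belongs to a later section.
        owner = paragraph.xpath("preceding-sibling::h3[1]")
        if owner and owner[0] is header:
            title = (header.text or "").strip()
            result[title] = (paragraph.text or "").strip()
    return result


print(paragraphs_by_heading(HTML))
```

Headings whose section contains no paragraph are simply left out of the result instead of borrowing text from a later section. Whether that is the desired behaviour depends on the page being scraped.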

Thanks for your reply. I get this error: lxml.etree.XMLSyntaxError: Opening and ending tag mismatch: link line 1 and head, line 1, column 485. This is my code:

from lxml import etree
import urllib
from StringIO import StringIO

url = 'http://www.kb.cert.org/vuls/id/628463'
text = urllib.urlopen(url).read()
f = StringIO(text)
tree = etree.parse(f)
headers = tree.xpath('//h3')
for header in header:
    paragraph = header.xpath('(.//following-sibling::p)[1]')[0]
    print "%s: %s" % (header.text, paragraph.text)

P.S. I am new to Python and XPath. – Gomeisa
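
The `XMLSyntaxError` reported in this comment is consistent with handing HTML to `etree.parse` without an `HTMLParser`: lxml then falls back to its XML parser, which rejects tags that HTML allows to stay unclosed, such as `<link>`. Here is a small sketch of the difference, assuming Python 3 and a made-up HTML string `SLOPPY` standing in for the real page:

```python
from io import StringIO

from lxml import etree

# Made-up sample, not the real page: <link> is never closed, which is
# legal in HTML but not well-formed XML.
SLOPPY = (
    '<html><head><link rel="stylesheet" href="a.css"></head>'
    "<body><h3>Overview</h3><p>some text1...</p></body></html>"
)

# Without an explicit parser, etree.parse uses the XML parser and
# rejects the unclosed <link>.
try:
    etree.parse(StringIO(SLOPPY))
    xml_error = None
except etree.XMLSyntaxError as exc:
    xml_error = exc

# Passing an HTMLParser lets lxml accept the same markup.
tree = etree.parse(StringIO(SLOPPY), etree.HTMLParser())
headings = [h.text for h in tree.iter("h3")]

print(type(xml_error).__name__)
print(headings)
```

Passing `etree.HTMLParser()` as the second argument, as the answer's code does, is likely the only change the commenter's script needs to get past this error.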


@Golbarghajian I updated the code, please check. – alecxe


It works, thank you sooo much. – Gomeisa