Get attribute and text as a list from an XPath query

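I want to query an HTML string and extract the href attribute and the text node of each hyperlink into a list (or any other collection, such as a dictionary).

Consider the following code: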

from lxml import html

# Named html_string rather than str to avoid shadowing the built-in.
html_string = (
    '<a href="href1"> Text1 </a>'
    '<a href="href2"> Text2 </a>'
    '<a href="href3"> Text3 </a>'
)
tree = html.fromstring(html_string)
items = tree.xpath('//a')

values = list()
for item in items:
    text = item.text
    href = item.get('href')
    values.append((text, href))

for text, href in values:
    print(text, href)

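This works!

I'd like to know whether I can drop the for item in items: loop and get the values list through an XPath query alone.

tree.xpath('//a/text()') and tree.xpath('//a/@href') each give me one or the other, but I want both values together in one list.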

Source

2014-09-13 madflow

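You can use | to build a compound XPath. Both the texts and the hrefs are then returned in a single list, items. You can use the grouper recipe, zip(*[iter(items)]*2), to pair up every two items. (Note, however, that this relies on the hrefs and the text strings alternating, that is, on every a element contributing exactly one href and one text node):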

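from lxml import html

html_string = (
    '<a href="href1"> Text1 </a>'
    '<a href="href2"> Text2 </a>'
    '<a href="href3"> Text3 </a>'
)
tree = html.fromstring(html_string)
# The union returns nodes in document order: each href, then its text.
items = tree.xpath('//a/text() | //a/@href')

# zip over the same iterator twice pairs up consecutive items.
for href, text in zip(*[iter(items)]*2):
    print(text.strip(), href)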

This produces:

Text1 href1 
Text2 href2 
Text3 href3
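
To see why the grouper idiom pairs consecutive items, here is a minimal sketch that uses a plain list as a stand-in for the XPath result, so it runs without lxml. The list contents are hypothetical sample values and the names it, pairs and values are chosen only for illustration.

```python
# Stand-in for the flat list returned by the compound XPath query
# (hypothetical values, alternating href and text).
items = ['href1', ' Text1 ', 'href2', ' Text2 ', 'href3', ' Text3 ']

it = iter(items)
# zip(*[it] * 2) is the same as zip(it, it): both arguments are the
# *same* iterator, so zip consumes two consecutive items per tuple.
pairs = list(zip(it, it))

# Reorder each (href, text) pair and trim the surrounding whitespace.
values = [(text.strip(), href) for href, text in pairs]
print(values)
```

If the list has an odd number of items, zip silently drops the last one, which is one more reason the approach only suits input where every link has both an href and a single text node.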

Source

2014-09-13 18:42:48 unutbu

I like the Python :) – madflow 2014-09-13 18:57:21

You can use zip:

a = [1, 2, 3]
b = ['a', 'b', 'c']
list(zip(a, b))  # [(1, 'a'), (2, 'b'), (3, 'c')]

So, applied to your XPath expressions:

texts = tree.xpath('//a/text()')
hrefs = tree.xpath('//a/@href')
# list() is needed on Python 3, where zip returns an iterator.
values = list(zip(texts, hrefs))
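
One caveat with two independent queries is that the results can drift out of step when a link lacks an href. The sketch below is an illustration, not part of the original answer: the markup is hypothetical (its second link has no href), and the names markup, misaligned and pairs are made up for the example. It contrasts zip with a per-element comprehension built on lxml's get() and text_content().

```python
from lxml import html

# Hypothetical input: the second link has no href attribute.
markup = (
    '<a href="href1"> Text1 </a>'
    '<a> Text2 </a>'
    '<a href="href3"> Text3 </a>'
)
tree = html.fromstring(markup)

# Two independent queries lose track of which text belongs to which href:
# three texts are zipped against only two hrefs.
texts = tree.xpath('//a/text()')
hrefs = tree.xpath('//a/@href')
misaligned = list(zip(texts, hrefs))

# Reading both values from each element keeps every pair together;
# a missing href simply shows up as None.
pairs = [(a.text_content().strip(), a.get('href')) for a in tree.xpath('//a')]

print(misaligned)
print(pairs)
```

The comprehension still iterates over the elements, so it does not remove the loop the question asks about, but it stays correct when the markup is irregular.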

Source

2014-09-13 18:37:31
