Extracting a list from HTML with XPath when the text contains line breaks
How can I extract ['First one', 'Second two', 'Third'] with XPath?
s = """
<table cellpadding="0" cellspacing="0" border="0">
<tbody><tr>
<th class="searchResults" style="width:75px">First<br>one</th>
<th class="searchResults" style="width:150px">Second<br>two</th>
<th class="searchResults" style="width:95px">Third<br></th>
</tr>
</tbody></table>
"""
import lxml.html as LH
e = LH.fromstring(s)
# '//th' matches the header cells at any depth; a bare '/th' matches nothing here
e.xpath('//th[@class="searchResults"]/text()')
This also splits on <br>, which I don't want. I tried string() and normalize-space(), but couldn't get it quite right.
Would it be a problem to do `s = s.replace('<br>','')` beforehand? – PaulMcG
Does it have to be an XPath-only solution? `[''.join(node.itertext()) for node in e.xpath('//th[@class="searchResults"]')]` would do it. –
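A minimal sketch of the itertext() idea from the comment above, assuming the pieces on either side of each <br> should be joined with a single space (joining with '' would run the words together). The helper name cell_texts is made up for illustration; the markup is the one from the question.

```python
import lxml.html as LH

s = """
<table cellpadding="0" cellspacing="0" border="0">
<tbody><tr>
<th class="searchResults" style="width:75px">First<br>one</th>
<th class="searchResults" style="width:150px">Second<br>two</th>
<th class="searchResults" style="width:95px">Third<br></th>
</tr>
</tbody></table>
"""

e = LH.fromstring(s)


def cell_texts(root):
    """Return one string per matching <th>, joining its text pieces with spaces.

    Hypothetical helper for illustration; not part of lxml.
    """
    result = []
    for node in root.xpath('.//th[@class="searchResults"]'):
        # itertext() yields each text fragment inside the cell, including
        # the text that follows a <br>
        parts = (t.strip() for t in node.itertext())
        # drop empty fragments so a trailing <br> leaves no stray space
        result.append(' '.join(p for p in parts if p))
    return result


headers = cell_texts(e)
print(headers)
```

The join happens in Python rather than in XPath, so this is not an XPath-only answer, but it keeps one list entry per cell.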
Try `e.xpath('normalize-space()').split()` – Andersson
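If something closer to XPath is preferred, one possible sketch is to evaluate normalize-space() once per th instead of once on the whole table, after padding the tail of each <br> with a space. The XPath string value of a cell ignores the <br> element itself, so without the padding the words on either side would be glued together. The padding step mutates the parsed tree and is an assumption of this sketch, not something the thread prescribes.

```python
import lxml.html as LH

s = """
<table cellpadding="0" cellspacing="0" border="0">
<tbody><tr>
<th class="searchResults" style="width:75px">First<br>one</th>
<th class="searchResults" style="width:150px">Second<br>two</th>
<th class="searchResults" style="width:95px">Third<br></th>
</tr>
</tbody></table>
"""

e = LH.fromstring(s)

# Prefix the text after every <br> with a space so the words stay separated
# once the element itself is ignored by string()/normalize-space()
for br in e.iter('br'):
    br.tail = ' ' + (br.tail or '')

# Evaluate normalize-space() with each <th> as the context node; it collapses
# runs of whitespace and trims both ends of that cell's text
headers = [
    th.xpath('normalize-space()')
    for th in e.xpath('//th[@class="searchResults"]')
]
print(headers)
```

Running normalize-space() per cell keeps the cell boundaries, whereas calling it on the whole table and splitting on whitespace yields one entry per word.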