I can get the phone number this way:
>>> HTML
'<span><a class="click-to-call-link text-gray-light trackMe" href="javascript:;" objid="1236535" compid="clickToCall_profile_directory_sponsored" phone="(617) 981-6551" "="">Click to Call</a></span>'
>>> from lxml import etree
>>> parser = etree.HTMLParser()
>>> tree = etree.fromstring(HTML, parser=parser)
>>> link = tree.xpath('.//a')
>>> link
[<Element a at 0x5a15e08>]
>>> link[0].attrib['phone']
'(617) 981-6551'
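The same steps as a standalone script might look like the minimal sketch below. It assumes the snippet without the malformed `"=""` attribute (dropped for brevity), and it uses `.get('phone')` rather than `attrib['phone']` so that a link without a `phone` attribute yields `None` instead of raising `KeyError`.

```python
from lxml import etree

# Same snippet as in the session above, minus the stray `"=""` attribute (assumption for brevity).
HTML = ('<span><a class="click-to-call-link text-gray-light trackMe" '
        'href="javascript:;" objid="1236535" phone="(617) 981-6551">Click to Call</a></span>')

# HTMLParser tolerates fragments and wraps them in <html><body>.
tree = etree.fromstring(HTML, parser=etree.HTMLParser())

# xpath() always returns a list, even when only one element matches.
links = tree.xpath('.//a')

# .get() returns None instead of raising KeyError when the attribute is absent.
phone = links[0].get('phone') if links else None
print(phone)
```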
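You can use this code to get the phone number from the whole page. The only tricky part is the xpath, and remember that `xpath()` always returns a list.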
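Because the result is a list, a page with several click-to-call links can be handled by iterating over it. The sketch below is an illustration only: the page fragment and the second phone number are hypothetical placeholders, and matching on the single class token `click-to-call-link` (instead of the exact full `class` string) is a suggested variation, not what the original answer does. It keeps working if the other classes change or are reordered.

```python
from lxml import etree

# Hypothetical page fragment; the second link and its number are made-up placeholders.
PAGE = """
<div>
  <span><a class="click-to-call-link text-gray-light trackMe" phone="(617) 981-6551">Click to Call</a></span>
  <span><a class="trackMe click-to-call-link" phone="(555) 010-0000">Click to Call</a></span>
  <a class="other-link" href="/about">About</a>
</div>
"""

tree = etree.fromstring(PAGE, parser=etree.HTMLParser())

# Match one class token regardless of the other classes or their order (XPath 1.0 idiom).
query = '//a[contains(concat(" ", normalize-space(@class), " "), " click-to-call-link ")]'

# Iterate over the returned list instead of assuming exactly one match.
phones = [a.get('phone') for a in tree.xpath(query)]
print(phones)
```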
>>> import requests
>>> from lxml import etree
>>> page = requests.get('https://www.houzz.com/pro/charlesrose/charles-rose-architects-inc').text
>>> parser = etree.HTMLParser()
>>> tree = etree.fromstring(page, parser=parser)
>>> links = tree.xpath('.//a[@class="click-to-call-link text-gray-light trackMe"]')
>>> links[0].attrib['phone']
'(617) 981-6551'
So you have an xpath that fails... post the xpath and the error message! With this example, I can run the xpath '//span/a' successfully. – tdelaney
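A common reason such an xpath "fails" is the difference between a single leading `/` and `//`. The minimal sketch below, using a trimmed copy of the snippet, shows that lxml's HTML parser places the fragment inside `<html><body>`, so an absolute path starting at `/span` matches nothing while `//span/a` finds the link.

```python
from lxml import etree

HTML = '<span><a phone="(617) 981-6551">Click to Call</a></span>'
tree = etree.fromstring(HTML, parser=etree.HTMLParser())

# HTMLParser wraps the fragment, so the root element is <html>, not <span>.
print(tree.tag)  # html

# A single leading "/" starts at the document root; there is no top-level <span>.
absolute = tree.xpath('/span/a')
print(absolute)  # []

# "//" searches all descendants, so the nested <span><a> is found.
descendant = tree.xpath('//span/a')
print([a.get('phone') for a in descendant])
```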