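If you want to solve this with requests, you need to mimic what the browser does when you click the "Load more" button: it sends an XHR request to the http://www.goudengids.be/q/ajax/business/results.json
endpoint. Simulate that request in your code while maintaining the web-scraping session. The XHR response is in JSON format, so in this case there is no need for BeautifulSoup
at all: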
import requests

main_url = "http://www.goudengids.be/qn/business/advanced/where/Provincie%20Antwerpen/what/restaurant/"
xhr_url = "http://www.goudengids.be/q/ajax/business/results.json"

with requests.Session() as session:
    # update (rather than replace) the session's default headers with a browser-like User-Agent
    session.headers.update({'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'})

    # visit main URL first, so any cookies it sets are kept in the session
    session.get(main_url)

    # load more listings - follow the pagination
    page = 1
    listings = []
    while True:
        params = {
            "input": "restaurant Provincie Antwerpen",
            "what": "restaurant",
            "where": "Provincie Antwerpen",
            "type": "DOUBLE",
            "resultlisttype": "A_AND_B",
            "page": str(page),
            "offset": "2",
            "excludelistingids": "nl_BE_YP_FREE_11336647_0000_1746702_6165_20130000, nl_BE_YP_PAID_11336647_0000_1746702_7575_20139729427, nl_BE_YP_PAID_720348_0000_187688_7575_20139392980",
            "context": "SRP * A_LIST"
        }
        # send the XHR request through the session (not requests.get) so cookies and User-Agent go with it
        response = session.get(xhr_url, params=params, headers={
            "X-Requested-With": "XMLHttpRequest",
            "Referer": main_url
        })
        data = response.json()

        # collect listing names in a list (for example purposes)
        listings.extend([item["bn"] for item in data["overallResult"]["searchResults"]])

        page += 1

        # TODO: figure out exit condition for the while True loop

print(listings)
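Passing `params=` makes requests build the query string of the XHR URL for you, and the only value that changes between iterations is `page`. To see roughly what a single "Load more" request looks like, here is a small stdlib sketch using `urllib.parse.urlencode` (which requests also relies on to encode a params dict, so spaces become `+`). The helper name `page_url` is hypothetical, and only a subset of the parameters above is included to keep the output short.

```python
from urllib.parse import urlencode

XHR_URL = "http://www.goudengids.be/q/ajax/business/results.json"


def page_url(page: int) -> str:
    """Hypothetical helper: build the XHR URL for one page of results.

    Only a subset of the answer's params is used here; "page" is the one
    that changes on every "Load more" click.
    """
    params = {
        "what": "restaurant",
        "where": "Provincie Antwerpen",
        "page": str(page),
    }
    return XHR_URL + "?" + urlencode(params)


if __name__ == "__main__":
    for page in (1, 2):
        print(page_url(page))
```

Comparing this output with the request shown in the browser's developer tools (Network tab, XHR filter) is a quick way to check that the scraper sends the same parameters the page does.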
I've left the important TODO for you: figure out the exit condition, i.e. when to stop collecting items.
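One possible way to fill in that TODO, as a sketch only: move the request into a hypothetical `fetch_page(page)` callable that returns the parsed JSON, and stop as soon as a page comes back with an empty (or missing) `searchResults` list. That stop condition is an assumption about how the endpoint behaves after the last page and has not been verified here, so a `max_pages` cap is added to make sure the loop cannot run forever.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature: takes a page number, returns the parsed JSON of one XHR response
FetchPage = Callable[[int], Dict[str, Any]]


def collect_listings(fetch_page: FetchPage, max_pages: int = 100) -> List[str]:
    """Follow the pagination until a page has no results or max_pages is reached.

    Assumes the response shape used in the answer:
    data["overallResult"]["searchResults"] is a list of dicts with a "bn" key.
    """
    listings: List[str] = []
    for page in range(1, max_pages + 1):
        data = fetch_page(page)
        results = data.get("overallResult", {}).get("searchResults") or []
        if not results:
            # assumed exit condition: no results means there are no more pages
            break
        listings.extend(item["bn"] for item in results)
    return listings


if __name__ == "__main__":
    # Offline demo with fake responses instead of real HTTP calls
    fake_pages = {1: [{"bn": "A"}, {"bn": "B"}], 2: [{"bn": "C"}]}

    def fake_fetch(page: int) -> Dict[str, Any]:
        return {"overallResult": {"searchResults": fake_pages.get(page, [])}}

    print(collect_listings(fake_fetch))
```

With the real site, `fetch_page` would wrap the `session.get(xhr_url, params=params, headers=...)` call from the loop above (with `"page": str(page)`) and return `response.json()`.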
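When I run the script it gives me an error message: Traceback (most recent call last): File "C:\Users\User\Desktop\python\scripts\3url.py", line 3, in <module> with requests.Session() as session: NameError: name 'requests' is not defined. How do I fix it? –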
vishnu
@vishnu do you have this 'import requests' at the top? It's important. You also have to have the 'requests' module installed. – alecxe
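The two problems show up differently: a `NameError: name 'requests' is not defined` means the `import requests` line is missing from the script, while a package that is not installed fails earlier, at the import itself, with `ModuleNotFoundError`. A small sketch for checking which case applies, using the stdlib `importlib.util.find_spec` (the helper name `module_available` is hypothetical):

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("requests"):
        # installed: the script still needs its own "import requests" at the top
        import requests
        print("requests is installed, version", requests.__version__)
    else:
        print("requests is not installed; run: python -m pip install requests")
```

Running the check with the same interpreter that runs the scraper matters, since each Python installation has its own set of installed packages.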
You're right @alecxe, I really forgot it. Thanks for your great help, I'll need you again in the future. – vishnu