Scrape a page after it has finished loading

I am scraping this page:

http://www.wotif.com/hotel/View?hotel=W3830&page=1&adults=2&startDay=2014-11-08&region=1&descriptionSearch=true#property-reviews

using the following code:

import requests
from bs4 import BeautifulSoup

hotel_page = requests.get(hotel_url).text
hotel_page_soup = BeautifulSoup(hotel_page, "html.parser")

However, this does not include the Guest Reviews section, because that section is loaded by an AJAX call after the page itself has loaded.

Question: how can I scrape the page only after all of its AJAX calls have completed?


2014-11-08 Umair

Were you able to solve this? I have the same problem. – user1050619 2015-12-01 17:08:30

You need to call the URL below directly and make sure the X-Requested-With header is set to XMLHttpRequest:

import requests
from bs4 import BeautifulSoup

URL = "http://www.wotif.com/review/fragment?propertyId=W3830&limit=5"

headers = {
    "X-Requested-With": "XMLHttpRequest",
    "User-Agent": "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36",
}

r = requests.get(URL, headers=headers)

# The response here is in JSON format
# The page source (an HTML fragment) is stored under the key 'html'
response = r.json()['html']
soup = BeautifulSoup(response, "html.parser")
reviews = soup.find(class_="review-score review-score-large").text
reviews

Out[]: '\n\n4.4\nOut of 5\n\n\n'

reviews.strip()

Out[]: '4.4\nOut of 5'


2014-11-09 12:53:01
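The answer above takes the JSON body, pulls the HTML out of its 'html' key, and reads the text of the element with class "review-score review-score-large". Below is a minimal offline sketch of that same two-step parse. To stay dependency-free it swaps BeautifulSoup for the standard library's html.parser; the names ClassTextExtractor and review_score are hypothetical helpers, and the exact markup inside the fragment is an assumption.

```python
import json
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text inside the first element whose class attribute
    exactly equals target_class (a stdlib stand-in for soup.find(class_=...))."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # nesting depth inside the matched element (0 = outside)
        self.done = False   # True once the matched element has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1             # a child element of the match
        elif dict(attrs).get("class") == self.target_class:
            self.depth = 1              # entering the matched element

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True        # matched element closed; stop collecting

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def review_score(json_text):
    """Extract the HTML fragment from the JSON body and return the stripped score text."""
    fragment = json.loads(json_text)["html"]
    parser = ClassTextExtractor("review-score review-score-large")
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts).strip()
```

In real use, json_text would be r.text from the requests call in the answer. The BeautifulSoup version shown there is shorter and copes better with malformed HTML; this sketch only assumes reasonably well-formed markup.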

But this only loads 5 guest reviews at a time... and I would have to put this in a loop to get all the reviews. – Umair 2014-11-09 13:03:25

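@Umair So what is the problem with that? That is how it is done. I answered your question; if you need more, please edit your question accordingly and I will answer it. – 2014-11-09 21:42:04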

You are scraping the wrong URL... see my question... I want to scrape the page after it has finished its AJAX calls. – Umair 2014-11-10 04:52:30
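The loop discussed in the comments above, which keeps requesting review batches until there are none left, can be sketched as follows. This is a hedged illustration, not the site's documented paging scheme. fetch_all_reviews and fetch_page are hypothetical names, and the offset argument stands in for whatever paging parameter the endpoint really uses, which is not stated anywhere in this thread. Injecting fetch_page keeps the loop testable without network access.

```python
def fetch_all_reviews(fetch_page, page_size=5):
    """Request successive batches until a short or empty batch signals the end.

    fetch_page(offset, limit) is a hypothetical callable that performs one
    AJAX-style request and returns a list of reviews for that slice. The
    'offset' paging parameter is an assumption, not a known site parameter.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    reviews = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        reviews.extend(page)
        if len(page) < page_size:   # fewer results than requested: no more pages
            break
        offset += page_size
    return reviews
```

Stopping on a short batch means a final empty request is made only when the total is an exact multiple of page_size.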


It's quite simple. If you request the URL http://www.wotif.com/review/fragment.json?propertyId=W3830&limit=100&bestThing=True, you will get all the reviews in JSON format.

The URL http://www.wotif.com/review/fragment?propertyId=W3830&limit=100& gives you the reviews as HTML embedded in JSON. You will have to check for yourself which one best fits your needs.


2014-11-08 19:14:26 Daniel
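The fragment.json URL in the answer above is just a base path plus three query parameters (propertyId, limit, bestThing). A small sketch of building it from parts, so the property ID or the limit can be varied, might look like this. reviews_url is a hypothetical helper, and the parameter names are copied from the answer's URL rather than from any documented API.

```python
from urllib.parse import urlencode

JSON_BASE = "http://www.wotif.com/review/fragment.json"


def reviews_url(property_id, limit=100, best_thing=True):
    """Build the review URL from the answer, e.g. for propertyId=W3830."""
    # urlencode keeps dict insertion order and renders True as "True".
    query = urlencode({
        "propertyId": property_id,
        "limit": limit,
        "bestThing": best_thing,
    })
    return JSON_BASE + "?" + query
```

With requests, passing the same dictionary as params= to requests.get(JSON_BASE, params=..., headers=headers) should produce an equivalent query string.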

If you visit 'http://www.wotif.com/review/fragment.json?propertyId=W3830&limit=100&bestThing=True' you will see the review dates and review scores... but in the JSON response they are not available... – Umair 2014-11-09 06:23:33
