Printing all occurrences of certain document elements of a web page

So, I am scraping this particular web page, https://www.zomato.com/srijata, for all the "restaurant reviews" posted by the user "Sri" (not the self-comments on her own reviews).

import urllib2
from bs4 import BeautifulSoup

zomato_ind = urllib2.urlopen('https://www.zomato.com/srijata')
zomato_info = zomato_ind.read()
open('zomato_info.html', 'w').write(zomato_info)
soup = BeautifulSoup(open('zomato_info.html'))
soup.find('div','mtop0 rev-text').text

This prints her first restaurant review, i.e. the one titled "Sri reviewed Big Straw - Chew on This", as:

u'Rated&nbsp;&nbsp;This is situated right in the heart of the city. The items on the menu are alright and I really had to compromise for bubble tea. The tapioca was not fresh. But the latte and the soda pop my friends tried was good. Another issue which I faced was mosquitos... They almost had me.. Lol..'

I also tried another selector:

My question is:

How do I print the next restaurant review? I tried findNextSiblings and so on, but none of them seem to work.

2014-10-01 shalini

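Why save the HTML in a file and then read that file into the soup object? – 2014-10-01 12:22:02

That is a measure I took to avoid hitting the site repeatedly, and so to stay within its anti-scraping protections! – shalini 2014-10-02 05:41:56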
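The caching idea in the comment above could be sketched as follows. This is only an illustration under assumptions: the helper name `load_cached` and its `fetch` parameter are hypothetical, and the code is written for Python 3 (where `urllib2` became `urllib.request`), unlike the Python 2 snippets in this thread.

```python
import os
import urllib.request


def load_cached(url, path, fetch=None):
    """Return the page body, downloading it only if `path` does not exist yet."""
    if os.path.exists(path):
        # Cache hit: read the saved copy and skip the network entirely.
        with open(path, 'rb') as f:
            return f.read()
    if fetch is None:
        # Default downloader; any callable taking a URL and returning bytes works.
        fetch = lambda u: urllib.request.urlopen(u).read()
    body = fetch(url)
    # Save the download so that later runs reuse it.
    with open(path, 'wb') as f:
        f.write(body)
    return body
```

Calling `load_cached('https://www.zomato.com/srijata', 'zomato_info.html')` would then contact the site on the first run only. Unlike the original snippet, the file is not rewritten on every run.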

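First, you don't need to write the output to a file; pass the result of the urlopen() call straight to the BeautifulSoup constructor.

To get the reviews, you need to iterate over all div tags with the rev-text class and take the .next_sibling of the div element nested inside each one: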

import urllib2 
from bs4 import BeautifulSoup 

soup = BeautifulSoup(urllib2.urlopen('https://www.zomato.com/srijata')) 
for div in soup.find_all('div', class_='rev-text'): 
    print div.div.next_sibling

Prints:

This is situated right in the heart of the city. The items on the menu are alright and I really had to compromise for bubble tea. The tapioca was not fresh. But the latte and the soda pop my friends tried was good. Another issue which I faced was mosquitos... They almost had me.. Lol.. 

The ambience is good. The food quality is good. I Didn't find anything to complain. I wanted to visit the place fir a very long time and had dinner today. The meals are very good and if u want the better quality compared to other Andhra restaurants then this is the place. It's far better than nandhana. The staffs are very polite too. 

...
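To see why `.div.next_sibling` drops the "Rated" prefix that `.text` includes, here is a small self-contained sketch. The HTML is a hypothetical reconstruction based on the output shown in this thread (the inner `left` class is made up), so the real page may be structured differently. It is written for Python 3 with bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, modelled on the output shown in this thread;
# the real Zomato page may be structured differently.
html = (
    '<div class="mtop0 rev-text">'
    '<div class="left">Rated&nbsp;&nbsp;</div>'
    'The tapioca was not fresh.'
    '</div>'
)

soup = BeautifulSoup(html, 'html.parser')
review = soup.find('div', class_='rev-text')

full_text = review.text              # label and review text joined together
body_only = review.div.next_sibling  # text node that follows the nested <div>

print(repr(full_text))
print(repr(body_only))
```

Here `review.div` is the first nested div (the rating label), and its next sibling is the bare text node holding the review itself.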

2014-10-01 13:29:59 alecxe

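Thanks alecxe, this works, but I am still trying to figure out how. For instance, why did you use only "rev-text" rather than "mtop0 rev-text"? – shalini 2014-10-01 14:44:19

@shalini I used the browser developer tools, inspected several reviews, and found that they all follow the 'rev-text' class pattern. There are certainly many ways to locate the reviews on the page. You are free to choose whichever one works for you and whichever you consider reliable. Thanks. – alecxe 2014-10-01 14:46:58

Alecxe, the problem is that the developer tools show class="mtop0 rev-text". So if I replace "rev-text" with "mtop0 rev-text" in your code, it prints nothing at all. According to the developer tools, "mtop0 rev-text" should work too, shouldn't it? – shalini 2014-10-01 14:59:26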
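One possible explanation, sketched here on made-up markup rather than the real page: when `class_` is given a string containing several class names, BeautifulSoup compares it against the exact value of the class attribute, so an extra class or a different order prevents a match. A single class name, or a CSS selector, is more forgiving. The HTML and the `clearfix` class are assumptions for illustration only. The code is written for Python 3 with bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second review carries an extra class and a
# different class order, which may or may not resemble the real page.
html = (
    '<div class="mtop0 rev-text">first</div>'
    '<div class="rev-text mtop0 clearfix">second</div>'
)
soup = BeautifulSoup(html, 'html.parser')

# A single class name matches every tag that has it among its classes.
by_one_class = soup.find_all('div', class_='rev-text')

# A string with several classes is compared against the exact class attribute.
by_exact_string = soup.find_all('div', class_='mtop0 rev-text')

# A CSS selector requires both classes but ignores order and extras.
by_selector = soup.select('div.mtop0.rev-text')

print(len(by_one_class), len(by_exact_string), len(by_selector))
```

With this markup the exact-string search finds only the first div, while the other two searches find both.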

You should use a for loop with find_all instead of find:

zomato_ind = urllib2.urlopen('https://www.zomato.com/srijata') 
zomato_info = zomato_ind.read() 
open('zomato_info.html', 'w').write(zomato_info) 
soup = BeautifulSoup(open('zomato_info.html')) 
for div in soup.find_all('div','rev-text'): 
    print div.text
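The difference between the two calls can be shown on a small example. This is a sketch on hypothetical two-review markup, written for Python 3 with bs4: `find` returns a single tag, and looping over a single tag walks its children, whereas `find_all` returns every matching tag.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical two-review page, used only to show the difference.
html = (
    '<div class="rev-text"><div>Rated</div>first review</div>'
    '<div class="rev-text"><div>Rated</div>second review</div>'
)
soup = BeautifulSoup(html, 'html.parser')

first = soup.find('div', 'rev-text')       # a single Tag (or None)
every = soup.find_all('div', 'rev-text')   # a list-like collection of Tags

# Looping over one Tag visits its children: here a nested <div> and a text node.
children = list(first)

# Looping over find_all visits each matching review in turn.
texts = [div.get_text() for div in every]
print(texts)
```

So a loop written over the result of `find` never reaches the second review; it only steps through the pieces of the first one, some of which are plain text nodes rather than tags.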

One more question: why save the HTML in a file and then read that file into the soup object?

2014-10-01 12:10:36

Does not work; print div.text ==> AttributeError: 'NavigableString' object has no attribute 'text' – shalini 2014-10-01 12:19:49

Sorry, try this. I forgot to change find to find_all – 2014-10-01 12:21:39

It stops after printing only the first review. – shalini 2014-10-01 12:23:22
