2015-10-14 80 views
1

Beautiful Soup does not return any results

Here is my code. It does not raise any errors, but it does not return any results either.

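import requests
from bs4 import BeautifulSoup

# Fetch the raw HTML exactly as the server sends it (no JavaScript is executed)
googtrends = requests.get("https://www.google.com/trends/")
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(googtrends.content, "html.parser")
links = soup.find_all("a", {"class": "trending-story ng-isolate-scope"})

print(links)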

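I have not solved this yet; I got started on something else. I will first try Selenium, with either PhantomJS or Zombie.js, and if that still does not work I will use pytrends. I just checked it out and you need a Gmail account, which I have, but I would rather try to get this working without the API.

I will come back here once I get it working.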

+2

Why not use ['pytrends'](https://github.com/dreyco676/pytrends), the Google Trends Python client? – alecxe

+5

The trending stories are probably generated dynamically by JavaScript. BeautifulSoup does not run JavaScript. – RobertB

+2

@RobertB That is why – heinst

Answers

2

The page is rendered dynamically by JS. Let's try changing the request headers as well (this also fails, which confirms that JS is the cause).

Test code -

import requests 
from bs4 import BeautifulSoup 


my_headers={"Host": "www.google.com", 
"User-Agent": "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0", 
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", 
"Accept-Language": "en-US,am;q=0.7,zh-HK;q=0.3", 
"Accept-Encoding": "gzip, deflate", 
"Cookie": "PREF=ID=1111111111111111:FF=0:LD=en:TM=1439993585:LM=1444815129:V=1:S=Zjbb3gK_m_n69Hqv; NID=72=F6UyD0Fr18smDLJe1NzTReJn_5pwZz-PtXM4orYW43oRk2D3vjb0Sy6Bs_Do4J_EjeOulugs_x2P1BZneufegpNxzv7rkY9BPHcfdx9vGOHtJqv2r46UuFI2f5nIZ1Cu4RcT9yS5fZ1SUhel5fHTLbyZWhX-yiPXvZCiQoW4FjZd-3Bwxq8yrpdgmPmf4ufvFNlmTd3y; OGP=-5061451:; OGPC=5061713-3:", 
"Connection": "keep-alive"} 


googtrends = requests.get("https://www.google.com/trends/", headers=my_headers)
# Keep the decoded text so the substring check below compares text with text
my_content = googtrends.text
soup = BeautifulSoup(my_content, "html.parser")
links = soup.find_all("a", {"class": "trending-story ng-isolate-scope"}, href=True)

# Check whether we are getting the expected content from the site.
# The rendered page contains "Apple Inc., App Store", so look for it in the response.

print("Apple Inc., App Store" in my_content)

# It prints False, so the page is rendered by JS; changing the headers makes no difference.
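
The same effect can be reproduced offline. The sketch below is only an illustration, not the real Trends markup: `RAW_HTML`, `AnchorCollector`, and the `/trends/story/1` link are made-up names, and it assumes Python 3. It uses the standard library's `html.parser` (the parser passed to BeautifulSoup above), so it needs no network access. The links only exist once a browser runs the script, so a static parser finds none.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for a JS-rendered page: the server sends an empty
# container, and the script would add the link later, inside a browser.
RAW_HTML = """
<html><body>
  <div id="stories"></div>
  <script>
    var a = document.createElement("a");
    a.className = "trending-story";
    a.href = "/trends/story/1";
    document.getElementById("stories").appendChild(a);
  </script>
</body></html>
"""


class AnchorCollector(HTMLParser):
    """Collects the href of every <a> start tag present in the markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.append(dict(attrs).get("href"))


collector = AnchorCollector()
collector.feed(RAW_HTML)  # parses the text only; the script is never executed
static_links = collector.hrefs
print(static_links)  # [] because no <a> tag exists in the raw markup
```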

So try a webdriver such as Firefox, Chrome, or PhantomJS with Selenium to execute the JS dynamically. Better yet, try the API.
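
A minimal sketch of the Selenium route, under several assumptions: the `selenium` package and a Firefox driver are installed, the page still marks its stories with the `trending-story` class from the question, and the URL is still valid. The function name `fetch_trending_links` and its parameters are hypothetical, chosen only for illustration.

```python
def fetch_trending_links(url="https://www.google.com/trends/",
                         css_selector="a.trending-story",
                         timeout=10):
    """Load the page in a real browser and return the hrefs of matching links."""
    # Imported inside the function so this file can be loaded even where
    # Selenium is not installed; the imports run only when the function is called.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Wait until the JavaScript has inserted at least one matching element;
        # raises TimeoutException if nothing appears within `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, css_selector))
        )
        anchors = driver.find_elements(By.CSS_SELECTOR, css_selector)
        return [a.get_attribute("href") for a in anchors]
    finally:
        # Always close the browser, even if the wait times out
        driver.quit()
```

Calling `fetch_trending_links()` opens a browser window, so it is slower than `requests`. If the class name or URL has changed, the wait times out instead of returning an empty list.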