2017-06-02 99 views
0

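Unable to scrape the company name from Google Finance

I want to scrape the company name, website URL, and description listed on Google Finance. So far I can get the description and the URL, but not the name. In the page source of myUrl, the name is 024 Pharma Inc. When I inspect the div, its class is 'appbar-snippet-primary', yet the code still does not find it. I am new to web scraping, so I may be missing something. Please point me in the right direction.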

from bs4 import BeautifulSoup 
import urllib 
import csv 

myUrl = 'https://www.google.com/finance?q=OTCMKTS%3AEEIG' 

r = urllib.urlopen(myUrl).read() 
soup = BeautifulSoup(r, 'html.parser') 

name_box = soup.find('div', class_='appbar-snippet-primary') # !! This div is not found 
#name = name_box.text 
#print name 

description = soup.find('div', class_='companySummary') 
desc = description.text.strip() 
#print desc 

website = soup.find('div', class_='item') 
site = website.text 
#print site 
+0

See https://stackoverflow.com/questions/5913280/beautifulsoup-and-ajax-table-problem and also https://pypi.python.org/pypi/googlefinance

+0

Because this div is generated dynamically by JavaScript, you cannot find the div 'appbar-snippet-primary' in the downloaded HTML. You need 'selenium' or 'splash' to scrape this kind of page.
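One rough way to check whether a class exists in the HTML the server actually sends (as opposed to markup added later by JavaScript) is to list the class names in the raw response. The sketch below uses only the standard library's html.parser and a made-up HTML string standing in for the response; `ClassCollector`, `has_class`, and `STATIC_HTML` are hypothetical names for illustration, not part of any library.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Records every CSS class name that appears in the parsed markup."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        for attr_name, attr_value in attrs:
            if attr_name == "class" and attr_value:
                self.classes.update(attr_value.split())


def has_class(html, class_name):
    """Return True if class_name occurs on any tag in the given HTML text."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    return class_name in collector.classes


# Made-up stand-in for a server response, NOT the real Google Finance page
STATIC_HTML = (
    '<div class="companySummary">Summary text</div>'
    '<div class="item">example.com</div>'
)

print(has_class(STATIC_HTML, "companySummary"))
print(has_class(STATIC_HTML, "appbar-snippet-primary"))
```

If a class shows up in the browser's inspector but not in the downloaded text, that suggests it is added client-side, which is the situation this comment describes.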

Answers

0
from bs4 import BeautifulSoup
import requests

myUrl = 'https://www.google.com/finance?q=OTCMKTS%3AEEIG'

# Download the page and parse it with the built-in HTML parser
r = requests.get(myUrl).content
soup = BeautifulSoup(r, 'html.parser')

# Take the text of <title> before the first ':' as the company name
name = soup.find('title').text.split(':')[0]
# print(name)

description = soup.find('div', class_='companySummary')
desc = description.text.strip()
# print(desc)

# The first div with class 'item' is used for the website, as in the question
website = soup.find('div', class_='item')
site = website.text
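The title-splitting idea above can be tried offline. This is a minimal sketch that swaps in the standard library's html.parser for BeautifulSoup so that it runs without any third-party install. The sample markup is invented for illustration and is not the real Google Finance page, and `TitleParser` and `name_from_html` are hypothetical helper names.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def name_from_html(html):
    """Return the part of the page title before the first ':'."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    # str.split always returns at least one element, so [0] is safe
    # even when the title is empty or contains no colon.
    return parser.title.split(":")[0].strip()


# Hypothetical sample markup, assumed only for this example
SAMPLE_HTML = (
    "<html><head><title>024 Pharma Inc: OTCMKTS:EEIG quotes</title></head>"
    "<body><div class='companySummary'>Example summary.</div></body></html>"
)

company_name = name_from_html(SAMPLE_HTML)
print(company_name)
```

This approach depends on the title following a "name: ticker" pattern, which is an assumption about the page. If the title format changes, the split returns the whole title instead of just the name.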
-1

Use soup.find_all() instead of soup.find().