2017-05-18 44 views

Since Yahoo discontinued their API, pandas' DataReader now fails — Python's pandas DataReader no longer works for Yahoo Finance after they changed the URL:

import pandas_datareader.data as web 
import datetime 
start = datetime.datetime(2016, 1, 1) 
end = datetime.datetime(2017, 5, 17) 
web.DataReader('GOOGL', 'yahoo', start, end) 

HTTPError: HTTP Error 401: Unauthorized 

Is there any unofficial library that lets us work around this temporarily? Anything on Quandl, perhaps?


Do a search on Stack Overflow for pandas and Yahoo Finance. I'm pretty sure this question has been asked and answered several times in the past few days. – pshep123


The unsupported Yahoo Finance API has been shut down: https://forums.yahoo.net/t5/Yahoo-Finance-help/Is-Yahoo-Finance-API-broken/td-p/250503 –


pshep123, great suggestion, I never thought of searching Stack Overflow!!! But as many others know, Yahoo has discontinued their API, and I don't have any temporary workaround. – Scilear

Answers


So they have changed their URL and now use cookie protection (and possibly JavaScript), so I fixed my own problem using dryscrape, which simulates a browser. This is just for reference, as it surely breaks their terms and conditions... so use at your own risk? I'm looking at Quandl as an alternative EOD price source.

I couldn't get the cookie handling to work anywhere with CookieJar, so I ended up using dryscrape to "fake" a user download.

import dryscrape
from bs4 import BeautifulSoup
import time
import datetime
import re

# we visit the main page to initialise the session and cookies
session = dryscrape.Session()
session.set_attribute('auto_load_images', False)
session.set_header('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36')

# call this once as it is slow(er); then you can do multiple downloads,
# though there seems to be a limit after which you have to reinitialise...
session.visit("https://finance.yahoo.com/quote/AAPL/history?p=AAPL")
response = session.body()

# get the download link
soup = BeautifulSoup(response, 'lxml')
for taga in soup.findAll('a'):
    if taga.has_attr('download'):
        url_download = taga['href']
print(url_download)

# now replace the default start and end dates that Yahoo provides
s = "2017-02-18"
period1 = '%.0f' % time.mktime(datetime.datetime.strptime(s, "%Y-%m-%d").timetuple())
e = "2017-05-18"
period2 = '%.0f' % time.mktime(datetime.datetime.strptime(e, "%Y-%m-%d").timetuple())

# now we replace the period parameters in the download URL with our dates
# (please feel free to improve, I suck at regex)
m = re.search('period1=(.+?)&', url_download)
if m:
    to_replace = m.group(m.lastindex)
    url_download = url_download.replace(to_replace, period1)
m = re.search('period2=(.+?)&', url_download)
if m:
    to_replace = m.group(m.lastindex)
    url_download = url_download.replace(to_replace, period2)

# now visit the link, get the body, and you have your csv
session.visit(url_download)
csv_data = session.body()

# and finally, if you want a DataFrame from it
import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO

import pandas as pd
df = pd.read_csv(StringIO(csv_data), index_col=0, parse_dates=True)
df
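The regex-based period substitution above can also be done with the standard library's URL tools, which avoids the "I suck at regex" caveat. A minimal sketch — the example URL is a made-up stand-in in the same shape as Yahoo's download links, and `set_periods` is a hypothetical helper name:

```python
from urllib.parse import urlparse, urlunparse, parse_qs, urlencode

def set_periods(url, period1, period2):
    """Rewrite the period1/period2 query parameters of a download URL."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query['period1'] = [str(period1)]
    query['period2'] = [str(period2)]
    # rebuild the URL with the updated query string
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))

url = ('https://query1.finance.yahoo.com/v7/finance/download/AAPL'
      '?period1=1&period2=2&interval=1d&events=history&crumb=abc')
print(set_periods(url, 1487376000, 1495065600))
```

Unlike the regex replace, this leaves every other parameter (including the crumb) untouched even if the same digits happen to appear elsewhere in the URL.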

I found the workaround via "fix-yahoo-finance" at https://pypi.python.org/pypi/fix-yahoo-finance useful, for example:

from pandas_datareader import data as pdr 
import fix_yahoo_finance 

data = pdr.get_data_yahoo('AAPL', start='2017-04-23', end='2017-05-24') 

Note that the order of the last two data columns is now 'Adj Close' then 'Volume', i.e. not the previous format. To reindex back:

cols = ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close'] 
data = data.reindex(columns=cols) 
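The reordering can be checked on a toy frame without hitting the network; a small sketch with dummy values (not real quotes):

```python
import pandas as pd

# fix_yahoo_finance-style column order, with made-up numbers
data = pd.DataFrame([[1.0, 2.0, 0.5, 1.5, 1.4, 1000]],
                    columns=['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume'])

# restore the old pandas-datareader order ('Volume' before 'Adj Close')
cols = ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
data = data.reindex(columns=cols)
print(list(data.columns))  # ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
```

Note that `reindex` returns a new frame rather than modifying in place, so the result has to be assigned back.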

I get an error about the Volume column when calling get_yahoo_data, but thanks, I'll look into it – Scilear


Yes @Scilear, so did I at first - try reinstalling pandas_datareader at the latest version and it should be fine. – artDeco


I changed from Yahoo to Google Finance and it works for me. So change

data.DataReader(ticker, 'yahoo', start_date, end_date) 

to

data.DataReader(ticker, 'google', start_date, end_date) 

and adapt my "old" Yahoo symbols from:

tickers = ['AAPL','MSFT','GE','IBM','AA','DAL','UAL', 'PEP', 'KO'] 

to:

tickers = ['NASDAQ:AAPL','NASDAQ:MSFT','NYSE:GE','NYSE:IBM','NYSE:AA','NYSE:DAL','NYSE:UAL', 'NYSE:PEP', 'NYSE:KO'] 
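The symbol translation can be automated with a small exchange map instead of retyping the list; a sketch where the exchange assignments simply follow the list above (the `exchange` dict is illustrative and would need an entry per ticker you use):

```python
# map each plain Yahoo-style ticker to its exchange for Google Finance
exchange = {'AAPL': 'NASDAQ', 'MSFT': 'NASDAQ', 'GE': 'NYSE', 'IBM': 'NYSE',
            'AA': 'NYSE', 'DAL': 'NYSE', 'UAL': 'NYSE', 'PEP': 'NYSE', 'KO': 'NYSE'}

tickers = ['AAPL', 'MSFT', 'GE', 'IBM', 'AA', 'DAL', 'UAL', 'PEP', 'KO']
google_tickers = ['%s:%s' % (exchange[t], t) for t in tickers]
print(google_tickers)  # ['NASDAQ:AAPL', 'NASDAQ:MSFT', 'NYSE:GE', ...]
```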

Make the thread sleep after each data read. It works most of the time, so try it 5-6 times and save the data in a csv file; next time you can read the data back from the file.

### code is here ### 
import pandas_datareader as web 
import time 
import datetime as dt 
import pandas as pd 

startDate = dt.datetime(2016, 1, 1) 
endDate = dt.datetime(2017, 5, 17) 

symbols = ['AAPL', 'MSFT', 'AABA', 'DB', 'GLD'] 
webData = pd.DataFrame() 
for stockSymbol in symbols: 
    webData[stockSymbol] = web.DataReader(stockSymbol, data_source='yahoo', 
                                          start=startDate, end=endDate, 
                                          retry_count=10)['Adj Close'] 
    time.sleep(22)  # thread sleep for 22 seconds 
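The sleep-and-retry idea can be factored into a small helper so the pause and retry count live in one place. A sketch with a dummy fetch function standing in for the `web.DataReader` call — `fetch_with_retries` and `flaky` are illustrative names, not part of any library:

```python
import time

def fetch_with_retries(fetch, attempts=6, pause=0):
    """Call fetch() until it succeeds or attempts run out, sleeping between tries."""
    for i in range(attempts):
        try:
            return fetch()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: re-raise the last error
            time.sleep(pause)

calls = {'n': 0}
def flaky():
    calls['n'] += 1
    if calls['n'] < 3:  # fail the first two times, like a throttled API
        raise IOError('throttled')
    return 'data'

print(fetch_with_retries(flaky))  # succeeds on the third try
```

In real use you would pass something like `lambda: web.DataReader(sym, 'yahoo', startDate, endDate)` as `fetch` and a pause of 20+ seconds, and dump the result to csv so reruns read from disk instead.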

Try this:

import fix_yahoo_finance as yf 
data = yf.download('SPY', start = '2012-01-01', end='2017-01-01')