2017-02-22 69 views

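Python code error (Linux, web scraping)

I want to fetch some data from New York Times (NYT) articles. When I run the code below, it gives me an error I'm not familiar with. I searched Google and went through previous Stack Overflow answers, but I still don't understand my problem. Can anyone please tell me how to fix this error? Thanks in advance!

The code: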

from nytimesarticle import articleAPI 
api = articleAPI('a0de895aa110431eb2344303c7105a9f') 

articles = api.search(q = 'Obama', 
    fq = {'headline':'Obama', 'source':['Reuters','AP', 'The New York Times']}, 
    begin_date = 20111231) 

def parse_articles(articles): 
    ''' 
    This function takes in a response to the NYT api and parses 
    the articles into a list of dictionaries 
    ''' 
    news = [] 
    for i in articles['response']['docs']:
        dic = {}
        dic['id'] = i['_id']
        if i['abstract'] is not None:
            dic['abstract'] = i['abstract'].encode("utf8")
        dic['headline'] = i['headline']['main'].encode("utf8")
        dic['desk'] = i['news_desk']
        dic['date'] = i['pub_date'][0:10] # cutting time of day.
        dic['section'] = i['section_name']
        if i['snippet'] is not None:
            dic['snippet'] = i['snippet'].encode("utf8")
        dic['source'] = i['source']
        dic['type'] = i['type_of_material']
        dic['url'] = i['web_url']
        dic['word_count'] = i['word_count']
        # locations
        locations = []
        for x in range(0,len(i['keywords'])):
            if 'glocations' in i['keywords'][x]['name']:
                locations.append(i['keywords'][x]['value'])
        dic['locations'] = locations
        # subject
        subjects = []
        for x in range(0,len(i['keywords'])):
            if 'subject' in i['keywords'][x]['name']:
                subjects.append(i['keywords'][x]['value'])
        dic['subjects'] = subjects
        news.append(dic)
    return(news) 

def get_articles(date,query): 
    ''' 
    This function accepts a year in string format (e.g.'1980') 
    and a query (e.g.'Amnesty International') and it will 
    return a list of parsed articles (in dictionaries) 
    for that year. 
    ''' 
    all_articles = [] 
    for i in range(0,100): #NYT limits pager to first 100 pages. But rarely will you find over 100 pages of results anyway.
        articles = api.search(q = query,
                fq = {'source':['Reuters','AP', 'The New York Times']},
                begin_date = date + '0101',
                end_date = date + '1231',
                sort='oldest',
                page = str(i))
        articles = parse_articles(articles)
        all_articles = all_articles + articles
    return(all_articles)

Amnesty_all = [] 
for i in range(1980,2014): 
    print 'Processing' + str(i) + '...' 
    Amnesty_year = get_articles(str(i),'Amnesty International') 
    Amnesty_all = Amnesty_all + Amnesty_year 

import csv 
keys = Amnesty_all[0].keys() 
with open('amnesty-mentions.csv', 'wb') as output_file: 
    dict_writer = csv.DictWriter(output_file, keys) 
    dict_writer.writeheader() 
    dict_writer.writerows(Amnesty_all) 

This is the error produced when I run it in the terminal:

[email protected]:~$ cd Desktop 
[email protected]:~/Desktop$ python nyt.py 
Processing1980... 
Traceback (most recent call last): 
    File "nyt.py", line 66, in <module> 
    Amnesty_year = get_articles(str(i),'Amnesty International') 
    File "nyt.py", line 59, in get_articles 
    articles = parse_articles(articles) 
    File "nyt.py", line 14, in parse_articles 
    for i in articles['response']['docs']: 
KeyError: 'response' 

Answer

api.search is not returning the result you expect. Its code:

r = requests.get(url)
return r.json()

So your code can only run correctly when the API at "http://api.nytimes.com/svc/search/v2/articlesearch" returns a successful response and that response has the expected JSON body.
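As a rough illustration of that condition, here is a minimal sketch of a check you could run on the HTTP response before trusting its body. The helper name decode_response is hypothetical and not part of nytimesarticle; it only assumes an object that behaves like requests.Response (a status_code attribute and a json() method).

```python
def decode_response(r):
    """Return the decoded JSON object from r, or raise RuntimeError.

    Hypothetical helper: r is assumed to behave like requests.Response,
    i.e. it has a status_code attribute and a json() method.
    """
    if r.status_code != 200:
        raise RuntimeError('HTTP status %s from the API' % r.status_code)
    try:
        body = r.json()
    except ValueError:
        # json() raises a ValueError (or a subclass) when the body is not JSON
        raise RuntimeError('API response body is not valid JSON')
    if not isinstance(body, dict):
        raise RuntimeError('expected a JSON object, got %s' % type(body).__name__)
    return body
```

With the requests library you would call it as decode_response(requests.get(url)). Since api.search does this request internally, using the helper would mean making the request yourself or patching the library.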

The exception is a KeyError, so the returned object is dict-like. You may want to check:

In [8]: print articles.keys() 
Out[8]: [u'status', u'response', u'copyright'] 

and:

In [9]: print articles['status'] 
Out[9]: u'OK' 

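If that is not what you see, my guess is that the NYT API does not fill in 'response' when articles['status'] != 'OK'. You may need to handle this unexpected status and retry.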
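One way to handle the unexpected status and retry is sketched below. The function search_with_retry and its parameters max_attempts and delay are hypothetical names chosen for illustration. It makes no assumption about what a failed payload looks like, only that a usable one is a dict with status 'OK' and a 'response' key, as in the output above.

```python
import time

def search_with_retry(search, max_attempts=3, delay=1.0, **params):
    """Call search(**params) until it returns a usable payload.

    Hypothetical helper: search is any callable that returns the decoded
    JSON, for example api.search. The full payload is returned so it can be
    passed straight to parse_articles. RuntimeError is raised when every
    attempt gives an unusable payload.
    """
    last = None
    for attempt in range(max_attempts):
        last = search(**params)
        usable = (isinstance(last, dict)
                  and last.get('status') == 'OK'
                  and 'response' in last)
        if usable:
            return last
        if attempt < max_attempts - 1:
            time.sleep(delay)  # wait before asking again
    raise RuntimeError('no usable response after %d attempts: %r'
                       % (max_attempts, last))
```

Inside get_articles this would replace the direct call, for example articles = search_with_retry(api.search, q=query, page=str(i)), followed by parse_articles(articles) as before. If the failures come from rate limiting (an assumption, the traceback alone does not show the cause), a longer delay may work better than more attempts.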

Thanks! I'll fix my error :)