2017-02-16 72 views

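Joining NYT keywords into a list

Each article comes back with a list of keywords. For every article, we want to collect the 'value' of each keyword dictionary into a single list, as shown below. Before appending, I want to get rid of the 'u' prefix in the list. Then we want to count how many keywords two such lists have in common and return the result.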

Example lists returned from dic['keywords']:

First result:

[ 
    { 
    u'value': u'Dunford, Joseph F Jr', 
    u'name': u'persons', 
    u'rank': u'1' 
    }, 
    { 
    u'value': u'Afghanistan', 
    u'name': u'glocations', 
    u'rank': u'1' 
    }, 
    { 
    u'value': u'Afghan National Police', 
    u'name': u'organizations', 
    u'rank': u'1' 
    }, 
    { 
    u'value': u'Afghanistan War (2001-)', 
    u'name': u'subject', 
    u'rank': u'1' 
    }, 
    { 
    u'value': u'Defense and Military Forces', 
    u'name': u'subject', 
    u'rank': u'2' 
    } 
] 

Second result:

[ 
    { 
    u'value': u'Gall, Carlotta', 
    u'name': u'persons', 
    u'rank': u'1' 
    }, 
    { 
    u'value': u'Gannon, Kathy', 
    u'name': u'persons', 
    u'rank': u'2' 
    }, 
    { 
    u'value': u'Niedringhaus, Anja (1965-2014)', 
    u'name': u'persons', 
    u'rank': u'3' 
    }, 
    { 
    u'value': u'Kabul (Afghanistan)', 
    u'name': u'glocations', 
    u'rank': u'2' 
    }, 
    { 
    u'value': u'Afghanistan', 
    u'name': u'glocations', 
    u'rank': u'1' 
    }, 
    { 
    u'value': u'Afghan National Police', 
    u'name': u'organizations', 
    u'rank': u'1' 
    }, 
    { 
    u'value': u'Afghanistan War (2001-)', 
    u'name': u'subject', 
    u'rank': u'1' 
    } 
] 

Desired output:

List1 = ['Dunford, Joseph F Jr', 'Afghanistan', 'Afghan National Police', 'Afghanistan War (2001-)', 'Defense and Military Forces']
List2 = ['Gall, Carlotta', 'Gannon, Kathy', 'Niedringhaus, Anja (1965-2014)', 'Kabul (Afghanistan)', 'Afghanistan', 'Afghan National Police', 'Afghanistan War (2001-)']

Keywords in common: 3

My code is below:

from flask import Flask, render_template, request, session, g, redirect, url_for
from nytimesarticle import articleAPI

api = articleAPI('X')

articles = api.search(q='Afghan War',
                      fq={'headline': '', 'source': ['Reuters', 'AP', 'The New York Times']},
                      begin_date=20111231)

def parse_articles(articles):
    '''
    This function takes in a response to the NYT api and parses
    the articles into a list of dictionaries
    '''
    news = []
    for i in articles['response']['docs']:
        dic = {}
        dic['id'] = i['_id']
        if i['abstract'] is not None:
            dic['abstract'] = i['abstract'].encode("utf8")
        dic['headline'] = i['headline']['main'].encode("utf8")
        dic['desk'] = i['news_desk']
        dic['date'] = i['pub_date'][0:10]  # cutting time of day.
        dic['section'] = i['section_name']
        dic['keywords'] = i['keywords']
        print(dic['keywords'])
        if i['snippet'] is not None:
            dic['snippet'] = i['snippet'].encode("utf8")
        dic['source'] = i['source']
        dic['type'] = i['type_of_material']
        dic['url'] = i['web_url']
        dic['word_count'] = i['word_count']
        # locations
        locations = []
        for kw in i['keywords']:
            if 'glocations' in kw['name']:
                locations.append(kw['value'])
        dic['locations'] = locations
        # subject
        subjects = []
        for kw in i['keywords']:
            if 'subject' in kw['name']:
                subjects.append(kw['value'])
        dic['subjects'] = subjects
        news.append(dic)
    return news

print(parse_articles(articles))

Answer

You can use a list comprehension to build the list from the given dictionaries.

d = [{u'value': u'Dunford, Joseph F Jr', u'name': u'persons', u'rank': u'1'},
     {u'value': u'Afghanistan', u'name': u'glocations', u'rank': u'1'},
     {u'value': u'Afghan National Police', u'name': u'organizations', u'rank': u'1'},
     {u'value': u'Afghanistan War (2001-)', u'name': u'subject', u'rank': u'1'},
     {u'value': u'Defense and Military Forces', u'name': u'subject', u'rank': u'2'}]
print([v['value'] for v in d])
# Python 2 prints: [u'Dunford, Joseph F Jr', u'Afghanistan', u'Afghan National Police', u'Afghanistan War (2001-)', u'Defense and Military Forces']
# Python 3 prints the same list without the u prefixes.
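
On the 'u': in Python 2 it only marks a unicode string when a list is displayed, and it is not part of the string itself, so there is nothing to strip before appending. For the comparison step, one possible approach is to turn both value lists into sets and intersect them. The sketch below assumes keywords count as shared only when their 'value' strings match exactly; the helper names keyword_values and common_keywords are made up for this example and are not part of nytimesarticle. For the two complete example results in the question, it finds three shared values.

```python
def keyword_values(keywords):
    """Return the 'value' field of each keyword dict, in the original order."""
    return [kw['value'] for kw in keywords]


def common_keywords(first, second):
    """Return the sorted values that appear in both keyword lists (exact match)."""
    return sorted(set(keyword_values(first)) & set(keyword_values(second)))


# The two example results from the question, copied as-is.
first = [
    {u'value': u'Dunford, Joseph F Jr', u'name': u'persons', u'rank': u'1'},
    {u'value': u'Afghanistan', u'name': u'glocations', u'rank': u'1'},
    {u'value': u'Afghan National Police', u'name': u'organizations', u'rank': u'1'},
    {u'value': u'Afghanistan War (2001-)', u'name': u'subject', u'rank': u'1'},
    {u'value': u'Defense and Military Forces', u'name': u'subject', u'rank': u'2'},
]
second = [
    {u'value': u'Gall, Carlotta', u'name': u'persons', u'rank': u'1'},
    {u'value': u'Gannon, Kathy', u'name': u'persons', u'rank': u'2'},
    {u'value': u'Niedringhaus, Anja (1965-2014)', u'name': u'persons', u'rank': u'3'},
    {u'value': u'Kabul (Afghanistan)', u'name': u'glocations', u'rank': u'2'},
    {u'value': u'Afghanistan', u'name': u'glocations', u'rank': u'1'},
    {u'value': u'Afghan National Police', u'name': u'organizations', u'rank': u'1'},
    {u'value': u'Afghanistan War (2001-)', u'name': u'subject', u'rank': u'1'},
]

shared = common_keywords(first, second)
print(len(shared))  # 3
for value in shared:
    print(value)    # printing a single string never shows the u prefix
```

Using sets drops duplicate values within a single article and ignores order, which seems fine for counting overlap. If near-matches such as 'Kabul (Afghanistan)' and 'Afghanistan' should also count, the values would need to be normalized first.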