2017-10-13 110 views
-1

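How do I search a nested list of dictionaries in Python?

In Python 3.6, I have a list like the one below and can't figure out how to search its values properly. Given the search string below, I need to search the values of title and tags and return the id of whichever image has the most matches. If several different images (ids) have the same number of matches, the one whose title comes first alphabetically should be returned. The search should also be case-insensitive. In the code, search holds the terms to look for; it should return the first id value, but it returns different values instead.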

image_info = [ 
{ 
    "id" : "34694102243_3370955cf9_z", 
    "title" : "Eastern", 
    "flickr_user" : "Sean Davis", 
    "tags" : ["Los Angeles", "California", "building"] 
}, 
{ 
    "id" : "37198655640_b64940bd52_z", 
    "title" : "Spreetunnel", 
    "flickr_user" : "Jens-Olaf Walter", 
    "tags" : ["Berlin", "Germany", "tunnel", "ceiling"] 
}, 
{ 
    "id" : "34944112220_de5c2684e7_z", 
    "title" : "View from our rental", 
    "flickr_user" : "Doug Finney", 
    "tags" : ["Mexico", "ocean", "beach", "palm"] 
}, 
{ 
    "id" : "36140096743_df8ef41874_z", 
    "title" : "Someday", 
    "flickr_user" : "Thomas Hawk", 
    "tags" : ["Los Angeles", "Hollywood", "California", "Volkswagen", "Beatle", "car"] 
} 

]

my_counter = 0 
search = "CAT IN BUILding" 
search = search.lower().split() 
matches = {} 

for image in image_info:
    for word in search:
        word = word.lower()
        if word in image["title"].lower().split(" "):
            my_counter += 1
            print(my_counter)
        if word in image["tags"]:
            my_counter += 1
            print(my_counter)
    if my_counter > 0:
        matches[image["id"]] = my_counter
        my_counter = 0
+0

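What do you mean when you say "return"? You aren't returning anything. What is your expected output, and how does it differ from what you get? Can you be more specific? –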

+0

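I ran your code and it gives me the first id in the matches dictionary. However, there is a bug with the tags. You lowercase the words in the search string but not the words in the tags, and the tags contain some capitalized words. For example, you won't be able to match Los Angeles. – bouma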
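The tag problem described in this comment can be reproduced in isolation. The following is a minimal sketch, assuming tags are compared as whole strings after lowercasing; the variable names are illustrative and not taken from the original post.

```python
# Tags of the first image in the question.
tags = ["Los Angeles", "California", "building"]
word = "california"  # a search word that has already been lowercased

# Comparing against the tags as written misses on case differences.
print(word in tags)  # False

# Lowercasing the tags first makes the comparison case-insensitive.
lowered_tags = [tag.lower() for tag in tags]
print(word in lowered_tags)  # True

# A multi-word tag such as "Los Angeles" is still a single string here, so the
# single word "los" does not match it unless the tag is also split into words.
print("los" in lowered_tags)  # False
```

Whether a multi-word tag should match on its individual words is a design choice the question does not settle.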

+0

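@juanpa.arrivillaga So, I use the search term "CAT IN BUILDING" to search the values of title and tags in the list of dictionaries, and I want the function to return the match it finds. For "CAT IN BUILDING" it should return 1 match, found at id 34694102243_3370955cf9_z. If the search term is "building in Mexico beach", it should return 34944112220_de5c2684e7_z, because that image has 2 matches in its tags. – Gray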
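The behaviour asked for in the question (count matches in title and tags, ignore case, break ties by title) could be sketched as a single function. This is only one possible reading of the requirements: the function name best_match and its return shape are hypothetical, and tags are matched as whole lowercased strings.

```python
image_info = [
    {"id": "34694102243_3370955cf9_z", "title": "Eastern",
     "tags": ["Los Angeles", "California", "building"]},
    {"id": "37198655640_b64940bd52_z", "title": "Spreetunnel",
     "tags": ["Berlin", "Germany", "tunnel", "ceiling"]},
    {"id": "34944112220_de5c2684e7_z", "title": "View from our rental",
     "tags": ["Mexico", "ocean", "beach", "palm"]},
    {"id": "36140096743_df8ef41874_z", "title": "Someday",
     "tags": ["Los Angeles", "Hollywood", "California", "Volkswagen", "Beatle", "car"]},
]


def best_match(images, query):
    """Return (image_id, match_count) for the best match, or None if nothing matches."""
    words = query.lower().split()
    candidates = []
    for image in images:
        title_words = image["title"].lower().split()
        tags = [tag.lower() for tag in image["tags"]]
        # A word scores once for a title hit and once for a tag hit.
        count = sum((w in title_words) + (w in tags) for w in words)
        if count > 0:
            # Negated count sorts the highest count first; the lowercased
            # title then breaks ties alphabetically.
            candidates.append((-count, image["title"].lower(), image["id"]))
    if not candidates:
        return None
    neg_count, _title, image_id = min(candidates)
    return image_id, -neg_count


print(best_match(image_info, "CAT IN BUILding"))
print(best_match(image_info, "building in Mexico beach"))
print(best_match(image_info, "car tunnel"))  # tie on count, decided by title
```

Sorting on a tuple keeps the tie-break rule in one place, so changing it (for example, to prefer the lowest id) only means changing the tuple.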

Answers

0

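This is a variant of your code in which I index the data up front, before searching. It is a very basic implementation of how CloudSearch or ElasticSearch would index and search.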

import itertools 
from collections import Counter 
image_info = [ 
{ 
    "id" : "34694102243_3370955cf9_z", 
    "title" : "Eastern", 
    "flickr_user" : "Sean Davis", 
    "tags" : ["Los Angeles", "California", "building"] 
}, 
{ 
    "id" : "37198655640_b64940bd52_z", 
    "title" : "Spreetunnel", 
    "flickr_user" : "Jens-Olaf Walter", 
    "tags" : ["Berlin", "Germany", "tunnel", "ceiling"] 
}, 
{ 
    "id" : "34944112220_de5c2684e7_z", 
    "title" : "View from our rental", 
    "flickr_user" : "Doug Finney", 
    "tags" : ["Mexico", "ocean", "beach", "palm"] 
}, 
{ 
    "id" : "36140096743_df8ef41874_z", 
    "title" : "Someday", 
    "flickr_user" : "Thomas Hawk", 
    "tags" : ["Los Angeles", "Hollywood", "California", "Volkswagen", "Beatle", "car"] 
} 
] 

search = "CAT IN BUILding california"
search = set(search.lower().split())

index = {}

# Build a rudimentary search index: word -> list of image ids containing it
for info in image_info:
    bag = info["title"].lower().split()
    # we want to be able to hit "los angeles" as well as "los" and "angeles"
    tags = [t.lower().split() for t in info["tags"]]
    tags = list(itertools.chain.from_iterable(tags))
    for k in (bag + tags):
        index.setdefault(k, []).append(info["id"])

#print(index)

hits = []

for s in search:
    if s in index:
        hits += index[s]

# Print only the id with the most hits; guard against a search with no hits
if hits:
    print(Counter(hits).most_common(1)[0][0])
+0

If I try to run the code you provided, I get the error: TypeError: append() takes exactly one argument (3 given). – Gray

+0

Thanks @Mahi. I have changed the code to fix the problem. – djinn

+0

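Thanks, this works. But I have one question. Right now it outputs every image id together with its hit count; how can I print only the image id with the most hits instead of all of them? – Gray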
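For printing only the top result, collections.Counter offers most_common(1), which yields a single (id, count) pair. It does not apply the alphabetical tie-break from the question, though, so a sketch with an explicit key may be closer to what was asked. The ids and titles below are made up for illustration.

```python
from collections import Counter

# Hypothetical hit list and titles, for illustration only.
hits = ["img_b", "img_a", "img_b", "img_a", "img_c"]
titles = {"img_a": "Eastern", "img_b": "Someday", "img_c": "Spreetunnel"}

counts = Counter(hits)

# most_common(1) gives one (id, count) pair instead of every id.
top_id, top_count = counts.most_common(1)[0]
print(top_id, top_count)

# With ties, choose explicitly: highest count first, then alphabetical title.
best_id = min(counts, key=lambda image_id: (-counts[image_id], titles[image_id].lower()))
print(best_id, counts[best_id])  # img_a 2
```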

0

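You are creating a new entry in the matches dictionary with matches[image["id"]] = my_counter. If you want to keep only one entry in that dictionary, holding both the image id and the count, I have modified your dictionary and the condition accordingly. Hope it helps.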

my_counter = 0 
search_term = "CAT IN BUILding" 
search = search_term.lower().split() 
matches = {} 
matches[search_term] = {} 

for image in image_info:
    for word in search:
        word = word.lower()
        if word in image["title"].lower().split(" "):
            my_counter += 1
            print(my_counter)
        if word in image["tags"]:
            my_counter += 1
            print(my_counter)
    if my_counter > 0:
        if not matches[search_term].values() or my_counter > matches[search_term].values()[0]:
            matches[search_term][image["id"]] = my_counter

        my_counter = 0
+0

I tried running your modified code and now get the error: TypeError: 'dict_values' object does not support indexing – Gray

+0

In Python 3.4, dict.values() returns a dict_values object rather than a list. Just wrap list() around matches[search_term].values(); it should look like list(matches[search_term].values())[0] –
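The difference can be checked with a small dictionary. This sketch uses a one-entry dict with an id from the question; next(iter(...)) is shown as an alternative that avoids building a list, which is an addition here and not part of the comment.

```python
counts = {"34694102243_3370955cf9_z": 1}

values = counts.values()  # a dict_values view in Python 3, not a list

try:
    values[0]  # views cannot be indexed
except TypeError as exc:
    print("TypeError:", exc)

# Two ways to read the first value.
first_via_list = list(values)[0]
first_via_iter = next(iter(values))
print(first_via_list, first_via_iter)  # 1 1
```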

+0

You can also lowercase the list of tags, as one of the users above pointed out. –
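Putting the two comments together, the loop from this answer might look like the sketch below. It assumes the intent is to keep exactly one id per search term, so the entry is replaced and not added to. The data is shortened to two images and the search term is changed to produce hits on both; ties keep the first image seen and are not broken alphabetically.

```python
image_info = [
    {"id": "34694102243_3370955cf9_z", "title": "Eastern",
     "tags": ["Los Angeles", "California", "building"]},
    {"id": "36140096743_df8ef41874_z", "title": "Someday",
     "tags": ["Los Angeles", "Hollywood", "California", "Volkswagen", "Beatle", "car"]},
]

search_term = "hollywood california car"
search = search_term.lower().split()
matches = {search_term: {}}

for image in image_info:
    title_words = image["title"].lower().split()
    tags = [tag.lower() for tag in image["tags"]]  # lowercase tags before comparing
    my_counter = 0
    for word in search:
        if word in title_words:
            my_counter += 1
        if word in tags:
            my_counter += 1
    # list() makes the dict_values view indexable in Python 3.
    best_so_far = list(matches[search_term].values())
    if my_counter > 0 and (not best_so_far or my_counter > best_so_far[0]):
        # Replace the whole entry so that only the current best id is kept.
        matches[search_term] = {image["id"]: my_counter}

print(matches)
```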