2017-10-17 106 views
3

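How to remove duplicates from multiple JSON files?

I have multiple JSON files containing capitals and countries. How do I remove duplicate key-value pairs from all of the files?

This is one of the JSON files: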

{ 
    "data": [ 
    { 
     "Capital": "Berlin", 
     "Country": "Germany" 
    }, 
    { 
     "Capital": "New Delhi", 
     "Country": "India" 
    }, 
    { 
     "Capital": "Canberra", 
     "Country": "Australia" 
    }, 
    { 
     "Capital": "Beijing.", 
     "Country": "China" 
    }, 
    { 
     "Capital": "Tokyo", 
     "Country": "Japan" 
    }, 
    { 
     "Capital": "Tokyo", 
     "Country": "Japan" 
    }, 
    { 
     "Capital": "Berlin", 
     "Country": "Germany" 
    }, 
    { 
     "Capital": "Moscow", 
     "Country": "Russia" 
    }, 
    { 
     "Capital": "New Delhi", 
     "Country": "India" 
    }, 
    { 
     "Capital": "Ottawa", 
     "Country": "Canada" 
    } 
    ] 

} 

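There are many JSON files like this one that contain duplicate items. How do I remove the repeated items and keep only the first occurrence? I have tried this, but it does not work: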

dupes = [] 
for f in json_files: 
    with open(f) as json_data: 
     nations = json.load(json_data)['data'] 
     #takes care of duplicates and stores it in dupes 
     dupes.append(x for x in nations if x['Capital'] in seen or seen.add(x['Capital'])) 
     nations = [x for x in nations if x not in dupes] #want to keep the first occurance of the item present in dupes 

    with open(f, 'w') as json_data: 
     json.dump({'data': nations}, json_data) 

Answers

1

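List comprehensions are great! But... when the process involves an if statement, they can make the code harder to follow.

This is by no means a rule of thumb. On the contrary, I encourage you to use list comprehensions often. In this particular case, though, a longer, more explicit solution is more readable.

My suggestion is this: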

import json

seen = set()   # capitals already encountered
result = []    # first occurrence of each item, in original order

with open('data.json') as json_data:
    nations = json.load(json_data)['data']

# Keep an item only the first time its capital appears
for item in nations:
    if item['Capital'] not in seen:
        seen.add(item['Capital'])
        result.append(item)

with open('data.no_dup.json', 'w') as json_data:
    json.dump({'data': result}, json_data)

Tested and working on Python 3.5.2.

Note that, for convenience, I have removed your outer loop.
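If the outer loop over several files is needed again, the same idea can be wrapped in a function. This is only a sketch under some assumptions: the function name `dedupe_files` and its `key` parameter are made up for illustration, each file is assumed to have the `{"data": [...]}` layout from the question, and `seen` is deliberately shared across files so that an item already kept from an earlier file is dropped from later ones.

```python
import json


def dedupe_files(paths, key='Capital'):
    """Rewrite each file in place, keeping the first item seen per key value.

    Hypothetical helper: the name and the `key` parameter are illustrative.
    """
    seen = set()  # shared across files, so repeats in later files are dropped too
    for path in paths:
        with open(path) as fp:
            items = json.load(fp)['data']

        kept = []
        for item in items:
            if item[key] not in seen:
                seen.add(item[key])
                kept.append(item)

        # Overwrite the file with the de-duplicated list
        with open(path, 'w') as fp:
            json.dump({'data': kept}, fp, indent=4)
```

If duplicates should only be removed within each file rather than across all of them, moving `seen = set()` inside the `for path in paths:` loop should be enough.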

+0

Your code does exactly what I was hoping to achieve. Thank you!

0

Here is example code showing how you can do this for the given JSON:

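import json

files = ['countries.json']

for f in files:
    with open(f, 'r') as fp:
        nations = json.load(fp)
    # Turn each dict into a hashable tuple of (key, value) pairs so that a set
    # can discard exact duplicates; the set does not preserve the original order.
    result = [dict(tupleized) for tupleized in
              set(tuple(item.items()) for item in nations['data'])]
print(result)
print(len(result))

Output: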

[{u'Country': u'Russia', u'Capital': u'Moscow'}, {u'Country': u'Japan', u'Capital': u'Tokyo'}, {u'Country': u'Canada', u'Capital': u'Ottawa'}, {u'Country': u'India', u'Capital': u'New Delhi'}, {u'Country': u'Germany', u'Capital': u'Berlin'}, {u'Country': u'Australia', u'Capital': u'Canberra'}, {u'Country': u'China', u'Capital': u'Beijing.'}] 
7 
+0

Note that this only filters out duplicate pairs, so '{'Country': 'Russia', 'Capital': 'Moscow'}' and '{'Country': 'Zaire', 'Capital': 'Moscow'}' will both be in 'result' – jpyams
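To make the point in the comment above concrete, here is a small sketch comparing de-duplication on the whole key-value pair with de-duplication on the capital alone. The helper `dedupe` and its `key_func` parameter are hypothetical names chosen for this illustration, and the sample data reuses the Russia/Zaire example from the comment. Unlike a plain `set`, this version also keeps the first occurrence in its original position.

```python
def dedupe(items, key_func):
    """Return items in original order, keeping the first item per key_func value."""
    seen = set()
    kept = []
    for item in items:
        marker = key_func(item)
        if marker not in seen:
            seen.add(marker)
            kept.append(item)
    return kept


nations = [
    {'Capital': 'Moscow', 'Country': 'Russia'},
    {'Capital': 'Moscow', 'Country': 'Zaire'},
    {'Capital': 'Moscow', 'Country': 'Russia'},
]

# Whole-pair comparison: only exact duplicates are dropped
by_pair = dedupe(nations, lambda d: tuple(sorted(d.items())))
# Capital-only comparison: every later item with the same capital is dropped
by_capital = dedupe(nations, lambda d: d['Capital'])

print(len(by_pair))     # 2
print(len(by_capital))  # 1
```

Which key to use depends on what counts as a duplicate in your data; the question's own attempt compared on 'Capital' only.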

2

You may not be able to use a cool list comprehension, but a regular loop should work:

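used_nations = set()   # capitals already seen ({} would create a dict, which has no add())
unique_nations = []
for nation in nations:
    # Build a new list instead of removing from the list being iterated over
    if nation['Capital'] not in used_nations:
        used_nations.add(nation['Capital'])
        unique_nations.append(nation)
nations = unique_nations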
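One caution for loops of this kind: calling `remove()` on a list while iterating over that same list makes the iterator skip the element right after each removal, so some duplicates can survive. The sketch below uses made-up sample data (a plain list of capital names) to show the effect next to a safer variant that builds a new list.

```python
capitals = ['Tokyo', 'Tokyo', 'Tokyo', 'Berlin']

# Removing from the list being iterated: the element after each removal is skipped
seen = set()
buggy = list(capitals)
for c in buggy:
    if c in seen:
        buggy.remove(c)
    else:
        seen.add(c)
print(buggy)  # ['Tokyo', 'Tokyo', 'Berlin'] (one duplicate survives)

# Building a new list leaves the iterated list untouched
seen = set()
safe = []
for c in capitals:
    if c not in seen:
        seen.add(c)
        safe.append(c)
print(safe)   # ['Tokyo', 'Berlin']
```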
+0

This isn't JS; 'nation.country' doesn't work. – nutmeg64

+0

@nutmeg64 I'm sure someone will create a 'python.js' before long ;) – jpyams