Delete data from a JSON file based on its values

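I wrote a script to parse some BLAST files from different samples. Because I want to know which genes are present in all the samples, I created a list and a dictionary to count them. I also generated a JSON file from the dictionary. Now I want to delete the genes whose count is less than 100 (the number of samples), either from the dictionary or from the JSON file, but I don't know how to do it. Here is part of the code:

import json

### to produce a dictionary with the genes and their repetitions
# (matches is a list of gene names and matches_counts a dict, both defined earlier)
for extracted_gene in matches:
    if extracted_gene in matches_counts:
        matches_counts[extracted_gene] += 1
    else:
        matches_counts[extracted_gene] = 1
print(matches_counts)  # check point
#if matches_counts[extracted_gene]==100:
    #print extracted_gene
# to convert a dictionary into a txt file and format it with json

with open('my_gene_extraction_trial.txt', 'w') as out_file:
    json.dump(matches_counts, out_file, sort_keys=True, indent=2, separators=(',', ':'))

print('Parsing has finished')

I have tried different ways to do it:
a) ignoring the else statement, but then it gives me an empty dictionary;
b) trying to print only the genes whose value is 100, but it doesn't print anything;
c) reading the json documentation, but I can only see how to delete elements by key, not by value.

Can anyone help me with this? It's driving me crazy!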
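About the commented-out `if matches_counts[extracted_gene]==100:` check in the question's code: it sits after the for loop, at the top level, so one likely reason it never prints is that `extracted_gene` there only holds the last gene the loop saw, and the check runs once. A minimal sketch of printing every gene counted exactly 100 times, using hypothetical counts in place of the real `matches_counts`, loops over the dictionary's items once counting is finished:

```python
# Hypothetical counts standing in for the real matches_counts
matches_counts = {"geneA": 100, "geneB": 42, "geneC": 100, "geneD": 7}

# Check every (gene, count) pair after counting is finished,
# instead of testing only the last value of the loop variable
genes_in_all_samples = sorted(
    gene for gene, count in matches_counts.items() if count == 100
)

for gene in genes_in_all_samples:
    print(gene)
```

If a gene can be hit more than once per sample, `count >= 100` may be the more useful test than `== 100`.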


2017-08-11 Ana

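Not sure I understand your question, but... if 'x' is a dictionary of genes and 'y' is a dictionary of match counts: 'for gene in x: if y[gene] < 100: del x[gene]'. This removes the 'gene' entry from x. You can make a copy of x so that, if needed, the entries are not deleted from the original dictionary. You will be left with x as a dictionary of the genes with 100 or more matches. – illiteratecoder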
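A note on the loop in the comment above: in both Python 2 and 3, deleting keys from a dict while iterating over that same dict raises `RuntimeError: dictionary changed size during iteration`. A minimal sketch of the in-place variant for the single-dictionary case in the question (the counts are hypothetical) iterates over a snapshot of the keys instead:

```python
# Hypothetical example data: gene -> number of matches
matches_counts = {"geneA": 100, "geneB": 42, "geneC": 250, "geneD": 7}

# list(...) takes a snapshot of the keys, so deleting from the dict
# inside the loop does not disturb the iteration
for gene in list(matches_counts):
    if matches_counts[gene] < 100:
        del matches_counts[gene]

print(matches_counts)
```

Unlike the copy-based approach, this modifies `matches_counts` itself, so the original counts are lost.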

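No, I have a list, 'matches', that stores the genes, and a dictionary, 'matches_counts', that stores the genes and their counts. I want to delete the "extra genes" from the dictionary. – Ana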

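Make a copy of the dictionary 'matches_counts'; let's call it 'copy'. Then 'for gene in matches_counts: if matches_counts[gene] < 100: del copy[gene]'. Now copy is a dictionary of gene: matches, where matches >= 100. You can iterate over the gene names with 'copy.keys()'. – illiteratecoder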

Here is what it should look like:

# matches (list) and matches_counts (dict) already defined
for extracted_gene in matches:
    if extracted_gene in matches_counts:
        matches_counts[extracted_gene] += 1
    else:
        matches_counts[extracted_gene] = 1

print(matches_counts)  # check point

# Create a copy of the dict of matches to remove items from
counts_100 = matches_counts.copy()

# Iterate over the original dict but delete from the copy,
# so the dict being iterated over never changes size
for extracted_gene in matches_counts:
    if matches_counts[extracted_gene] < 100:
        del counts_100[extracted_gene]

print(counts_100)
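For comparison, a more compact sketch of the same filtering swaps in `collections.Counter` for the manual counting loop and a dict comprehension for the copy-and-delete step (both available in Python 2.7 and 3). The `matches` list here is hypothetical sample data:

```python
from collections import Counter

# Hypothetical stand-in for the list of genes extracted from the BLAST files
matches = ["geneA"] * 100 + ["geneB"] * 42 + ["geneC"] * 120

# Counter does the same job as the manual if/else counting loop
matches_counts = Counter(matches)

# Build a new dict keeping only genes seen at least 100 times;
# matches_counts itself is left untouched
counts_100 = {gene: count for gene, count in matches_counts.items() if count >= 100}

print(counts_100)
```

Building a new dict means nothing is deleted during iteration, which is the same reason the answer above iterates over `matches_counts` but deletes from the copy.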

Let me know if you still get errors.
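Since the title asks about the JSON file itself: the `json` module only converts between JSON text and Python objects, and a JSON object loaded with `json.load` is a plain dict, so filtering by value is ordinary dict filtering. A minimal sketch, assuming the file holds a single flat gene-to-count object as written by the question's `json.dump` call, loads the file, filters it, and writes it back. The function name `filter_json_counts`, its `min_count` parameter, and the demo file name are hypothetical:

```python
import json


def filter_json_counts(path, min_count=100):
    """Keep only entries whose value is >= min_count, rewriting the file in place."""
    with open(path) as in_file:
        counts = json.load(in_file)  # the JSON object becomes a plain dict

    kept = {gene: count for gene, count in counts.items() if count >= min_count}

    # Write back with the same formatting options the question uses
    with open(path, "w") as out_file:
        json.dump(kept, out_file, sort_keys=True, indent=2, separators=(",", ":"))
    return kept


if __name__ == "__main__":
    # Hypothetical demo file, so the real my_gene_extraction_trial.txt is not overwritten
    demo_path = "demo_gene_counts.txt"
    with open(demo_path, "w") as f:
        json.dump({"geneA": 100, "geneB": 42, "geneC": 120}, f)
    print(filter_json_counts(demo_path))
```

Filtering the dictionary before the original `json.dump` call avoids the extra read/write round trip entirely; the function above is mainly useful when only the JSON file is available.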


2017-08-11 11:36:50 illiteratecoder
