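Iterating through a dictionary while creating a tuple

I am learning Python and trying to work with map and tuples. I created a dictionary from one parsed file, and I am parsing a second file. I want to go through the dictionary and replace the first element of each line of the second file with the ID obtained from the dictionary.

My dictionary: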

for line in blast_lines: 
    (transcript,swissProt,identity) = parse_blast(blast_line=line) 
    transcript_to_protein[transcript] = swissProt

Parsing the file, and creating a tuple with the replaced ID if the entry exists:

def parse_matrix(matrix_line): 
    matrixFields = matrix_line.rstrip("\n").split("\t") 
    protein = matrixFields[0] 
    if matrixFields[0] in transcript_to_protein: 
      protein = transcript_to_protein.get(transcript) 
      matrixFields[0] = protein 
    return(tuple(matrixFields))

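I have not included all of my code here, because I believe my problem must be in how I iterate through the parsed file and the dictionary to produce tuples whose first element is the value from the dictionary, but I will include everything at the bottom.

Input:

BLAST (what is stored in the dictionary)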

c1000_g1_i1|m.799 gi|48474761|sp|O94288.1|NOC3_SCHPO 100.00 747 0 0 5 751 1 747 0.0 1506

For this line, the transcript is c1000_g1_i1 and the Swiss-Prot ID is O94288.1

Matrix (the file being parsed)

c3833_g1_i2 4.00 0.07 16.84 26.37

I want to replace the first field (matrixFields[0]) with swissProt if the value in the first field matches a key (transcript) in the dictionary.

I want the output to look like this:

Q09748.1 4.00 0.07 16.84 26.37 
O60164.1 24.55 116.87 220.53 28.82 
C5161_G1_I1 107.49 89.39 26.95 698.97 
P36614.1 27.91 72.57 5.56 36.58 
P37818.1 82.57 19.03 48.55 258.22

But I am getting this:

O94423.1 4.00 0.07 16.84 26.37 
O94423.1 24.55 116.87 220.53 28.82 
C5161_G1_I1 107.49 89.39 26.95 698.97 
O94423.1 27.91 72.57 5.56 36.58 
O94423.1 82.57 19.03 48.55 258.22

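Notice how all 4 replaced rows have the same value, rather than the individual protein that the dictionary holds for each transcript.

Full code: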

transcript_to_protein = {}; 

def parse_blast(blast_line="NA"): 
    fields = blast_line.rstrip("\n").split("\t") 
    queryIdString = fields[0] 
    subjectIdString = fields[1] 
    identity = fields[2] 
    queryIds = queryIdString.split("|") 
    subjectIds = subjectIdString.split("|") 
    transcript = queryIds[0].upper() 
    swissProt = subjectIds[3] 
    base = swissProt.split(".")[0] 
    return(transcript, swissProt, identity) 

blast_output = open("/scratch/RNASeq/blastp.outfmt6") 
blast_lines = blast_output.readlines() 

for line in blast_lines: 
    (transcript,swissProt,identity) = parse_blast(blast_line=line) 
    transcript_to_protein[transcript] = swissProt 

def parse_matrix(matrix_line): 
    matrixFields = matrix_line.rstrip("\n").split("\t") 
    matrixFields[0] = matrixFields[0].upper() 
    protein = matrixFields[0] 
    if matrixFields[0] in transcript_to_protein: 
      protein = transcript_to_protein.get(transcript) 
      matrixFields[0] = protein 
    return(tuple(matrixFields)) 

def tuple_to_tab_sep(one_tuple): 
    tab = "\t" 
    return tab.join(one_tuple) 

matrix = open("/scratch/RNASeq/diffExpr.P1e-3_C2.matrix") 

newline = "\n" 

list_of_de_tuples = map(parse_matrix,matrix.readlines()) 

list_of_tab_sep_lines = map(tuple_to_tab_sep, list_of_de_tuples) 
print(newline.join(list_of_tab_sep_lines))

Source

2016-12-16 Jamie Leigh

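First, there is a bug in parse_blast(): it does not return the tuple (transcript, swissProt, identity), but instead returns (transcript, base, identity), and base does not contain the missing information.

Update

Second, there is also a bug in parse_matrix(). The first field read from the file does not have the missing information, yet that is what ends up in the returned tuple when matrixFields[0] is in the transcript_to_protein dictionary.

Fixing just one of them will not solve the problem on its own.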
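The parse_matrix() symptom can be reproduced in isolation. The following is a minimal sketch with a small made-up dictionary (the keys C1 to C3 and the function names are hypothetical, not from the question): once a for loop at module level finishes, its loop variable keeps its last value, so a lookup that relies on that leftover name returns the same entry for every row.

```python
# Hypothetical sample data; the keys are invented for illustration.
transcript_to_protein = {"C1": "Q09748.1", "C2": "O60164.1", "C3": "O94423.1"}

# Stands in for the loop that fills the dictionary: when it ends, the
# module-level name `transcript` still refers to the last key processed.
for transcript in transcript_to_protein:
    pass

def lookup_stale(row_id):
    # Checks membership with row_id but fetches with the leftover global.
    if row_id in transcript_to_protein:
        return transcript_to_protein.get(transcript)
    return row_id

def lookup_by_row(row_id):
    # Uses the current row's ID for both the check and the fetch.
    return transcript_to_protein.get(row_id, row_id)

rows = ["C1", "C2", "C9"]
stale = [lookup_stale(r) for r in rows]
fixed = [lookup_by_row(r) for r in rows]
print(stale)  # ['O94423.1', 'O94423.1', 'C9']
print(fixed)  # ['Q09748.1', 'O60164.1', 'C9']
```

Every matched row in the stale version gets the value of the last key, which matches the pattern in the question's output.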

Source

2016-12-16 18:42:15 martineau

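With that correction it still prints the same value for all of the replaced fields. Instead of looking each one up in the dictionary as needed, it prints the last value for all of them.

It seems the problem may be in the parse_blast function. For the line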

c1000_g1_i1|m.799 gi|48474761|sp|O94288.1|NOC3_SCHPO 100.00 747 0 0 5 751 1 747 0.0 1506 

subjectIdString = fields[1]

So subjectIdString will be gi|48474761|sp|O94288.1|NOC3_SCHPO

Then

swissProt = subjectIds[3]

swissProt will be O94288.1, which the function then splits further on "." in the line

base = swissProt.split(".")[0]

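The end result will be that swissProt is O94288 rather than O94288.1, which seems to be what you are expecting. I would suggest testing the function on single-line input until you get the desired output.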
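A single-line check of that kind might look like the sketch below. It copies parse_blast() from the question's full code and assumes the sample line is tab-separated, as the split("\t") call implies; the variable names sample and result are only for illustration.

```python
def parse_blast(blast_line="NA"):
    # Copied from the question's full code.
    fields = blast_line.rstrip("\n").split("\t")
    queryIdString = fields[0]
    subjectIdString = fields[1]
    identity = fields[2]
    queryIds = queryIdString.split("|")
    subjectIds = subjectIdString.split("|")
    transcript = queryIds[0].upper()
    swissProt = subjectIds[3]
    base = swissProt.split(".")[0]  # computed but not part of the return value
    return (transcript, swissProt, identity)

# The sample BLAST line from the question, assumed to be tab-separated.
sample = ("c1000_g1_i1|m.799\tgi|48474761|sp|O94288.1|NOC3_SCHPO\t"
          "100.00\t747\t0\t0\t5\t751\t1\t747\t0.0\t1506\n")

result = parse_blast(blast_line=sample)
print(result)  # ('C1000_G1_I1', 'O94288.1', '100.00')
```

Running the function on one known line makes it easy to see exactly which of transcript, swissProt, and base ends up in the returned tuple.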

Source

2016-12-16 18:47:18

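It works fine on a single line. The problem is that it prints the same Swiss-Prot ID for every line instead of matching each line against the keys in the dictionary.

The error was in my dictionary lookup. I wanted to match matrixFields[0] against the transcript keys in the dictionary, and I was searching the dictionary with if matrixFields[0] in transcript_to_protein: without first assigning the field to transcript, so get(transcript) never looked up the current row's ID. Assigning the field first fixes it: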

transcript = matrixFields[0] 
if transcript in transcript_to_protein: 
    protein = transcript_to_protein.get(transcript)
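Put together as a whole function, the fix could be sketched as follows. This is one possible variation rather than the exact code used: it passes the dictionary in as a parameter (an assumption made here to avoid relying on globals) and uses dict.get with a default so unmatched IDs are kept. The one-entry mapping is hypothetical and only pairs the question's sample matrix row with the first line of its desired output.

```python
def parse_matrix(matrix_line, transcript_to_protein):
    """Return the row as a tuple, with the ID replaced by its protein if known."""
    fields = matrix_line.rstrip("\n").split("\t")
    transcript = fields[0].upper()
    # get() with a default keeps the transcript ID when there is no match.
    fields[0] = transcript_to_protein.get(transcript, transcript)
    return tuple(fields)

# Hypothetical mapping, for illustration only.
mapping = {"C3833_G1_I2": "Q09748.1"}

lines = [
    "c3833_g1_i2\t4.00\t0.07\t16.84\t26.37\n",
    "c5161_g1_i1\t107.49\t89.39\t26.95\t698.97\n",
]

rows = [parse_matrix(line, mapping) for line in lines]
for row in rows:
    print("\t".join(row))
```

The first row comes out with Q09748.1 in place of the transcript ID, and the second keeps its upper-cased ID because it has no entry in the mapping.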

Source

2016-12-16 19:07:38
