2016-10-11 132 views

How do I pass a list of lists through a for loop in Python?

I have a list of lists:

sample = [['TTTT', 'CCCZ'], ['ATTA', 'CZZC']] 
count = [[4,3],[4,2]] 
correctionfactor = [[1.33, 1.5],[1.33,2]] 

I calculate the frequency of each character (pi), square it, and sum the squares; then I calculate het = 1 - sum.
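As a worked instance of that calculation, take the sequence 'ATTA' from the sample together with its count of 4. The symbols p_i, n_i, N and H are notation introduced here for illustration only; they are not names from the code.

```latex
\[
p_i = \frac{n_i}{N}, \qquad H = 1 - \sum_i p_i^2
\]
\[
\text{ATTA: } p_A = \tfrac{2}{4},\quad p_T = \tfrac{2}{4}
\quad\Rightarrow\quad H = 1 - (0.25 + 0.25) = 0.5
\]
```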

The desired output [[1,2],[1,2]] #NOTE: This is NOT the real values of expected output. I just need the real values to be in this format. 

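Problem: I don't know how to pass the lists (sample, count) through this loop to extract the desired values. Until now I have only used this code with a single list (e.g. ['TACT','TTTT'..]).

  • I suspect I need to add an outer loop that indexes over each element of sample (i.e. over sample[0] = ['TTTT', 'CCCZ'] and sample[1] = ['ATTA', 'CZZC']). I don't know how to write that in code.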
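One way to write the outer loop described above, offered as a minimal sketch rather than a definitive fix: an outer zip pairs each inner list of sample with the matching inner list of count, and an inner zip pairs each sequence with its own count. The function name pair_with_counts is hypothetical.

```python
sample = [['TTTT', 'CCCZ'], ['ATTA', 'CZZC']]
count = [[4, 3], [4, 2]]

def pair_with_counts(sample, count):
    """Pair every sequence with its own count, keeping the nesting."""
    paired = []
    # Outer loop: walk the two nested lists in step, one inner list at a time.
    for group, group_counts in zip(sample, count):
        inner = []
        # Inner loop: one sequence together with its matching count.
        for seq, n in zip(group, group_counts):
            inner.append((seq, n))
        paired.append(inner)
    return paired

pairs = pair_with_counts(sample, count)
print(pairs)
```

Because the result keeps the same nesting as sample, any per-sequence value computed inside the inner loop ends up in the [[.., ..], [.., ..]] shape asked for.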

Code:

list_of_hets = []
for idx, element in enumerate(sample):
    count_dict = {}
    square_dict = {}
    for base in list(element):
        if base in count_dict:
            count_dict[base] += 1
        else:
            count_dict[base] = 1
    for allele in count_dict: # Calculate frequency of every character
        square_freq = (count_dict[allele]/count[idx])**2 # Square the frequencies
        square_dict[allele] = square_freq
    pf = 0.0
    for i in square_dict:
        pf += square_dict[i] # pf --> pi^2 + pj^2...pn^2 # Sum the frequencies
    het = 1-pf
    list_of_hets.append(het)
print list_of_hets

"Failed" OUTPUT: 
line 70, in <module> 
square_freq = (count_dict[allele]/count[idx])**2 
TypeError: unsupported operand type(s) for /: 'int' and 'list'

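The error message tells you *exactly* what is wrong: 'square_freq = (count_dict[allele]/count[idx])**2' is raising 'TypeError: unsupported operand type(s) for /: 'int' and 'list''. You can't divide an 'int' by a 'list'. By the way, that doesn't match the code you posted, which would probably raise a different 'TypeError' when you try to pass count[idx] to 'float'. –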
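To make that diagnosis concrete, here is a small sketch (Python 3 syntax, with variable names chosen for illustration) showing that count[idx] is itself a list when count is a list of lists, so a second index is needed to reach the integer:

```python
count = [[4, 3], [4, 2]]
idx = 0

inner = count[idx]          # [4, 3]: still a list, not a number

try:
    2 / inner               # dividing an int by a list is not defined
except TypeError as exc:
    message = str(exc)      # the same kind of error as in the question

value = 2 / count[idx][1]   # index twice to reach the integer 3

print(type(inner).__name__, message, value)
```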

I was thinking of using zip, something like 'square_freq = [[n/d for n,d in zip(subq,subr)] for subq,subr in zip(count_dict[allele],counts)]'. But I still get errors. Any other suggestions? – biogeek

@PM2Ring I have corrected it. Thanks for pointing that out. – biogeek

Answer

I'm not entirely clear on how you want to handle the 'Z' items in your data, but this code reproduces the output for the sample data:

from __future__ import division

bases = set('ACGT')
#sample = [['TTTT', 'CCCZ'], ['ATTA', 'CZZC']]
sample = [['ATTA', 'TTGA'], ['TTCA', 'TTTA']]

list_of_hets = []
for element in sample:
    hets = []
    for seq in element:
        count_dict = {}
        for base in seq:
            if base in count_dict:
                count_dict[base] += 1
            else:
                count_dict[base] = 1
        print count_dict

        # Calculate frequency of every character
        count = sum(1 for u in seq if u in bases)
        pf = sum((base/count) ** 2 for base in count_dict.values())
        hets.append(1 - pf)
    list_of_hets.append(hets)

print list_of_hets

Output

{'A': 2, 'T': 2} 
{'A': 1, 'T': 2, 'G': 1} 
{'A': 1, 'C': 1, 'T': 2} 
{'A': 1, 'T': 3} 
[[0.5, 0.625], [0.625, 0.375]] 

This code could be simplified further by using collections.Counter instead of count_dict.
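A minimal sketch of what that simplification might look like, assuming the same calculation as the loop above. The helper name heterozygosity is hypothetical, and float() stands in for the __future__ import so that the division behaves the same on Python 2 and 3:

```python
from collections import Counter

BASES = frozenset('ACGT')

def heterozygosity(seq, bases=BASES):
    """Return 1 minus the sum of squared character frequencies for one sequence."""
    counts = Counter(seq)                          # replaces the manual count_dict loop
    total = sum(1 for u in seq if u in bases)      # only valid bases count toward the total
    pf = sum((n / float(total)) ** 2 for n in counts.values())
    return 1 - pf

sample = [['ATTA', 'TTGA'], ['TTCA', 'TTTA']]

# Nested comprehension keeps one inner list of results per inner list of sequences.
list_of_hets = [[heterozygosity(seq) for seq in element] for element in sample]
print(list_of_hets)
```

Like the original loop, this sketch divides by the number of valid bases, so a sequence with no characters from 'ACGT' would raise ZeroDivisionError.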

By the way, if the symbol that isn't in 'ACGT' is always 'Z', then we can speed up the calculation of count. Get rid of bases = set('ACGT'), and change

count = sum(1 for u in seq if u in bases) 

to

count = sum(1 for u in seq if u != 'Z') 
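As a quick check under that same assumption (every non-ACGT symbol is 'Z'), the sketch below compares the set-based count with one built on the standard str.count method. Using str.count is a swapped-in alternative, not something the answer itself proposes, and both function names are hypothetical:

```python
def count_valid(seq, bases=frozenset('ACGT')):
    """Count characters that are valid bases (the original approach)."""
    return sum(1 for u in seq if u in bases)

def count_non_z(seq):
    """Count everything except 'Z'; only equivalent if 'Z' is the sole invalid symbol."""
    return len(seq) - seq.count('Z')

seqs = ['TTTT', 'CCCZ', 'ATTA', 'CZZC']
counts_a = [count_valid(s) for s in seqs]
counts_b = [count_non_z(s) for s in seqs]
print(counts_a, counts_b)
```

For the question's sequences both versions give the same numbers as the count list supplied there.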

My final output must be in the form '[[0.5, 0.625], [0.625, 0.375]]', because I need to be able to distinguish the first element in set1 (['ATTA','TTGA']) from set2 ['TTCA','TTTA'] – biogeek

Also, don't worry about the "Z"s. I've already figured out a way to handle them :) – biogeek

@biogeek: That's easy to do. See the new version of my answer. –