2016-11-09 65 views

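I'm trying to solve the Rosalind "Consensus and Profile" problem from here: http://rosalind.info/problems/cons/

My script fills the counter lists and outputs a consensus string of the same length. I don't think there are any math or indexing errors, but I've hit a wall. My code: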

with open('C:/users/steph/downloads/rosalind_cons (3).txt') as f:
    seqs = f.read().splitlines()

#remove all objects that are not sequences of interest
for s in seqs:
    if s[0] == '>':
        seqs.remove(s)

n = range(len(seqs[0])+1)

#lists to store counts for each nucleotide
A, C, G, T = [0 for i in n], [0 for i in n], [0 for i in n], [0 for i in n]

#see what nucleotide is at each index and augment the
#same index of the respective list
def counter(Q):
    for q in Q:
        for k in range(len(q)):
            if q[k] == 'A':
                A[k] += 1
            elif q[k] == 'C':
                C[k] += 1
            elif q[k] == 'G':
                G[k] += 1
            elif q[k] == 'T':
                T[k] += 1
counter(seqs)

#find the max of all the counter lists at every index
#and add the respective nucleotide to the consensus sequence
def consensus(a,t,c,g):
    consensus = ''
    for k in range(len(a)):
        if (a[k] > t[k]) and (a[k]>c[k]) and (a[k]>g[k]):
            consensus = consensus+"A"
        elif (t[k] > a[k]) and (t[k]>c[k]) and (t[k]>g[k]):
            consensus = consensus+ 'T'
        elif (c[k] > t[k]) and (c[k]>a[k]) and (c[k]>g[k]):
            consensus = consensus+ 'C'
        elif (g[k] > t[k]) and (g[k]>c[k]) and (g[k]>a[k]):
            consensus = consensus+ 'G'
        #ensure a nucleotide is added to consensus sequence
        #when more than one index has the max value
        else:
            if max(a[k],c[k],t[k],g[k]) in a:
                consensus = consensus + 'A'
            elif max(a[k],c[k],t[k],g[k]) in c:
                consensus = consensus + 'C'
            elif max(a[k],c[k],t[k],g[k]) in t:
                consensus = consensus + 'T'
            elif max(a[k],c[k],t[k],g[k]) in g:
                consensus = consensus + 'G'
    print(consensus)
    #debugging, ignore this --> print('len(consensus)',len(consensus))
consensus(A,T,C,G)

#debugging, ignore this --> print('len(A)',len(A))

print('A: ',*A, sep=' ')
print('C: ',*C, sep=' ')
print('G: ',*G, sep=' ')
print('T: ',*T, sep=' ')

Thank you for your time.

So, what is the problem? You haven't explained what isn't working.

Answers

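  • The following line is wrong:

    n = range(len(seqs[0])+1)

It makes the count lists one element too long, so the consensus gets an extra A appended and each of the four count lines gets an extra 0. Remove the +1 and it should work.

  • Also, your output has two spaces after the colon. Remove the space after the : in your print statements.
  • If you fix these two lines, it will work for the sample, but it will fail for sequences that span more than one line (as in the real dataset).

Try merging the lines with something like the following snippet: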
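Both effects are easy to see in isolation. This is only a minimal sketch with made-up data; the names `too_long`, `right_size`, `counts`, and `printed` are mine for illustration and are not in the question's script:

```python
import io

seqs = ["ATCCAGCT", "GGGCAACT"]  # made-up single-line sequences

# With the +1 there is one counter too many; the extra slot stays at 0.
too_long = [0 for _ in range(len(seqs[0]) + 1)]
# Without it there is exactly one counter per column.
right_size = [0 for _ in range(len(seqs[0]))]


def printed(*args, **kwargs):
    """Return what print() would write, so the spacing is visible."""
    buf = io.StringIO()
    print(*args, file=buf, **kwargs)
    return buf.getvalue()


counts = [1, 0, 2]
# 'A: ' already ends in a space and sep=' ' adds another one.
with_space = printed('A: ', *counts, sep=' ')
# Dropping the trailing space leaves a single space after the colon.
without_space = printed('A:', *counts, sep=' ')

print(len(too_long), len(right_size))
print(repr(with_space), repr(without_space))
```
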

new_seqs = list()
for s in seqs:
    if s.startswith('>'):
        new_seqs.append('')
    else:
        new_seqs[-1] += s
seqs = new_seqs

and try again.
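Putting the pieces together, the whole task could be sketched roughly like this. It is one possible arrangement, not the asker's script: the function names `read_fasta`, `build_profile`, and `consensus` are hypothetical, the input is made up, and it assumes every sequence has the same length and contains only A, C, G, and T.

```python
def read_fasta(lines):
    """Join the lines of each FASTA record into one sequence string."""
    seqs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue                 # skip blank lines
        if line.startswith('>'):
            seqs.append('')          # header: start a new record
        elif seqs:
            seqs[-1] += line         # continuation of the current record
    return seqs


def build_profile(seqs):
    """Count each nucleotide per column (assumes equal-length ACGT strings)."""
    length = len(seqs[0])            # note: no +1 here
    counts = {base: [0] * length for base in 'ACGT'}
    for seq in seqs:
        for k, base in enumerate(seq):
            counts[base][k] += 1
    return counts


def consensus(counts):
    """Pick a most frequent base per column; ties go to the first of A, C, G, T."""
    length = len(counts['A'])
    return ''.join(
        max('ACGT', key=lambda base: counts[base][k]) for k in range(length)
    )


# Made-up input; the first record is wrapped over two lines.
lines = ['>s1', 'AC', 'GT', '>s2', 'ACGA', '>s3', 'TCGA']
seqs = read_fasta(lines)
counts = build_profile(seqs)
result = consensus(counts)

print(result)
for base in 'ACGT':
    print(base + ':', *counts[base])  # single space after the colon
```

Because the consensus is chosen by comparing the four counts at the same column `k`, no separate tie-handling branch is needed.
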

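These suggestions are good, but unfortunately I'm still getting an incorrect answer. After browsing ideas from the Rosalind community, I think the problem is some error in the output formatting or hidden newline characters.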

@SankFinatra: Your formatting is fine and there are no hidden newlines. I've updated the answer accordingly.

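@Maxmilian Peters: I now see that I was building my 'seqs' list incorrectly. I implemented your suggested changes, but for some reason I'm still getting an incorrect answer.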
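One thing that may be worth checking, offered as a possibility rather than a confirmed diagnosis: the tie-handling branch in the question tests `max(a[k],c[k],t[k],g[k]) in a`, which asks whether that value occurs anywhere in the list `a`, not whether `a[k]` equals it. The sketch below uses made-up counts and a hypothetical helper `pick` to show how the two checks can disagree:

```python
# Made-up counts for two columns (four sequences in total).
a = [2, 0]
c = [1, 2]
g = [0, 2]
t = [1, 0]

k = 1                                  # column where C and G tie
best = max(a[k], c[k], g[k], t[k])     # best count in this column

# Membership test: true because a[0] == 2, even though a[1] == 0.
membership_says_a = best in a
# Same-column test: compares only the count at position k.
same_column_says_a = a[k] == best


def pick(k):
    """Return the first base (in A, C, G, T order) whose count at k is maximal."""
    best = max(a[k], c[k], g[k], t[k])
    for base, col_counts in (('A', a), ('C', c), ('G', g), ('T', t)):
        if col_counts[k] == best:
            return base


print(membership_says_a, same_column_says_a, pick(0), pick(1))
```

With these counts the membership test would add 'A' at column 1 even though no sequence has an A there, while the same-column comparison returns 'C'.
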