2017-09-16 64 views

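I'm using a for loop to look up a list of protein IDs in the NCBI protein database, and I'm trying to convert these IDs into descriptions. How do I put multiple strings into a list inside a for loop? Here is an example: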

import pandas as pd
from Bio import Entrez
from Bio import SeqIO

Entrez.email = "you@example.com"  # NCBI asks for a contact email; replace with your own

df = pd.read_csv('ID.txt', header=None)
df.columns = ['protein_ID']  # put a header 'protein_ID' on the dataframe
lists = df.protein_ID.tolist()  # convert the column into a list of protein IDs

description = ''
for num, line in enumerate(lists):
    handle = Entrez.efetch(db="protein", id=line, rettype="gb", retmode="text")
    record = SeqIO.read(handle, "genbank")
    handle.close()
    description += record.description

description

It returns one huge string:

'hypothetical protein UR61_C0009G0014 [candidate division WS6 bacterium GW2011_GWE1_34_7]ATPase [candidate division WS6 bacterium GW2011_GWE2_33_157]hypothetical protein UR96_C0034G0007 [candidate division WS6 bacterium GW2011_GWC1_36_11]phosphoenolpyruvate synthase [Candidatus Komeilibacteria bacterium RIFOXYC1_FULL_37_11]' 

What I want is a list of strings, each on a new line, like this:

[ 
'hypothetical protein UR61_C0009G0014 [candidate division WS6 bacterium GW2011_GWE1_34_7]', 
'ATPase [candidate division WS6 bacterium GW2011_GWE2_33_157]', 
'hypothetical protein UR96_C0034G0007 [candidate division WS6 bacterium GW2011_GWC1_36_11]', 
'phosphoenolpyruvate synthase [Candidatus Komeilibacteria bacterium RIFOXYC1_FULL_37_11]' 
] 

How can I do this? Thanks a lot!


Make 'description' a list ('description = []') and call 'description.append(record.description)'. –


Oh yes, thanks, that was easy! – stevex

Answer


"What I want is a list of strings"

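description = []
for num, line in enumerate(lists):
    handle = Entrez.efetch(db="protein", id=line, rettype="gb", retmode="text")
    record = SeqIO.read(handle, "genbank")
    handle.close()
    description.append(record.description)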
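To see the difference between the two loops without querying NCBI, here is a minimal offline sketch. The IDs `ID_1` to `ID_4` and the `fetch_description` helper are hypothetical stand-ins for the `Entrez.efetch` + `SeqIO.read` lookup; only the description strings come from the question.

```python
# Hypothetical offline stand-in for the NCBI lookup: the IDs are made up,
# the descriptions are the ones shown in the question.
SAMPLE_DESCRIPTIONS = {
    "ID_1": "hypothetical protein UR61_C0009G0014 [candidate division WS6 bacterium GW2011_GWE1_34_7]",
    "ID_2": "ATPase [candidate division WS6 bacterium GW2011_GWE2_33_157]",
    "ID_3": "hypothetical protein UR96_C0034G0007 [candidate division WS6 bacterium GW2011_GWC1_36_11]",
    "ID_4": "phosphoenolpyruvate synthase [Candidatus Komeilibacteria bacterium RIFOXYC1_FULL_37_11]",
}


def fetch_description(protein_id):
    """Hypothetical helper; the real script would call Entrez.efetch and SeqIO.read here."""
    return SAMPLE_DESCRIPTIONS[protein_id]


lists = list(SAMPLE_DESCRIPTIONS)  # the protein IDs, in insertion order

# Original approach: str += glues each description onto the previous one,
# with no separator in between
joined = ""
for protein_id in lists:
    joined += fetch_description(protein_id)

# Suggested approach: each description becomes its own list element
description = []
for protein_id in lists:
    description.append(fetch_description(protein_id))

print(len(description))  # 4
```

The string version produces the run-together text from the question (for example `...GWE1_34_7]ATPase [...`), while the list keeps one entry per protein ID.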

"each on a new line"

By default, a list is not printed that way. Use pprint:

import pprint 

# your original code here

pprint.pprint(description) 
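One caveat, as far as I understand `pprint`'s behavior since Python 3.5: with the default `width=80`, it also wraps long strings at whitespace, so each of these roughly 90-character descriptions gets split into two quoted pieces. A minimal sketch, using the four descriptions from the question as hard-coded sample data, that either widens the line or prints each element directly:

```python
import pprint

# The four descriptions from the question, as the loop would collect them
description = [
    "hypothetical protein UR61_C0009G0014 [candidate division WS6 bacterium GW2011_GWE1_34_7]",
    "ATPase [candidate division WS6 bacterium GW2011_GWE2_33_157]",
    "hypothetical protein UR96_C0034G0007 [candidate division WS6 bacterium GW2011_GWC1_36_11]",
    "phosphoenolpyruvate synthase [Candidatus Komeilibacteria bacterium RIFOXYC1_FULL_37_11]",
]

# Default width=80: the longer strings are split into adjacent quoted pieces
pprint.pprint(description)

# width=120 leaves room for every element on a single line
pprint.pprint(description, width=120)

# Plain alternative: print each description on its own line, without quotes
for desc in description:
    print(desc)
```

The `width` argument only affects display; the list itself is unchanged either way.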