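I am trying to adapt an earlier script that uses Biopython to look up the phylum of a species. That script was written to retrieve information for one species at a time, and I would like to modify it so that I can process 100 organisms in one go. Here is the original code, which tries to get taxonomic information through Biopython: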
import sys
from Bio import Entrez

def get_tax_id(species):
    """to get data from ncbi taxonomy, we need to have the taxid. we can
    get that by passing the species name to esearch, which will return
    the tax id"""
    species = species.replace(" ", "+").strip()
    search = Entrez.esearch(term = species, db = "taxonomy", retmode = "xml")
    record = Entrez.read(search)
    return record['IdList'][0]

def get_tax_data(taxid):
    """once we have the taxid, we can fetch the record"""
    search = Entrez.efetch(id = taxid, db = "taxonomy", retmode = "xml")
    return Entrez.read(search)

Entrez.email = ""
if not Entrez.email:
    print "you must add your email address"
    sys.exit(2)
taxid = get_tax_id("Erodium carvifolium")
data = get_tax_data(taxid)
lineage = {d['Rank']:d['ScientificName'] for d in
    data[0]['LineageEx'] if d['Rank'] in ['family', 'order']}
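I have successfully modified the script so that it accepts a local file containing the single organism I am currently working with, but I need to extend it to 100 organisms. The idea is to generate a list from my organism file and somehow feed each item of that list, one at a time, into the line taxid = get_tax_id("Erodium carvifolium"), replacing "Erodium carvifolium" with my organism name. But I don't know how to do that.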
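Building the list from the organism file could look something like the minimal sketch below. It assumes the file holds one organism name per line; `parse_organisms`, `read_organisms` and the file name `organisms.txt` are hypothetical names for illustration, not part of the original script.

```python
def parse_organisms(lines):
    """Turn an iterable of text lines into a list of organism names.

    Surrounding whitespace is stripped and blank lines are skipped.
    """
    names = []
    for line in lines:
        name = line.strip()
        if name:
            names.append(name)
    return names


def read_organisms(path):
    """Read organism names from a text file (assumed: one name per line)."""
    with open(path) as handle:
        return parse_organisms(handle)


# Hypothetical usage; organisms.txt is an assumed file name:
# organisms = read_organisms("organisms.txt")
```

Keeping the parsing separate from the file handling makes it easy to check the clean-up logic on a few hand-written lines before pointing it at the real file.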
Here is a sample version of my script with some of my tweaks:
import sys
from Bio import Entrez

def get_tax_id(species):
    """to get data from ncbi taxonomy, we need to have the taxid. we can
    get that by passing the species name to esearch, which will return
    the tax id"""
    species = species.replace(' ', "+").strip()
    search = Entrez.esearch(term = species, db = "taxonomy", retmode = "xml")
    record = Entrez.read(search)
    return record['IdList'][0]

def get_tax_data(taxid):
    """once we have the taxid, we can fetch the record"""
    search = Entrez.efetch(id = taxid, db = "taxonomy", retmode = "xml")
    return Entrez.read(search)

Entrez.email = ""
if not Entrez.email:
    print "you must add your email address"
    sys.exit(2)
list = ['Helicobacter pylori 26695', 'Thermotoga maritima MSB8', 'Deinococcus radiodurans R1', 'Treponema pallidum subsp. pallidum str. Nichols', 'Aquifex aeolicus VF5', 'Archaeoglobus fulgidus DSM 4304']
i = iter(list)
item = i.next()
for item in list:
    ???
    taxid = get_tax_id(?)
    data = get_tax_data(taxid)
    lineage = {d['Rank']:d['ScientificName'] for d in
        data[0]['LineageEx'] if d['Rank'] in ['phylum']}
    print lineage, taxid
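The question marks show where I am stumped about what to do next. I don't understand how to connect my loop so that it replaces the ? in get_tax_id(?). Or do I need to somehow append each item in the list so that each one is rewritten to contain get_tax_id(Helicobacter pylori 26695), and then find some way to place them in the line containing taxid =?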
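One way the loop could be wired up, as a minimal sketch rather than a definitive answer: the loop variable itself is what goes in place of the ?, so no string rewriting is needed. The helper names `phylum_of` and `phyla_for`, the `pause` parameter and its default are assumptions made for illustration. The two lookup functions are passed in as arguments so the loop can be tried out with stand-ins before any request is sent to NCBI.

```python
import time


def phylum_of(tax_record):
    """Return the phylum name from one parsed taxonomy record, or None."""
    for entry in tax_record.get("LineageEx", []):
        if entry["Rank"] == "phylum":
            return entry["ScientificName"]
    return None


def phyla_for(organisms, get_tax_id, get_tax_data, pause=0.5):
    """Look up each organism in turn and return {name: (taxid, phylum)}.

    get_tax_id and get_tax_data are expected to behave like the two
    functions in the script above. pause is an assumed courtesy delay
    in seconds between organisms.
    """
    results = {}
    for name in organisms:           # `name` takes the place of the "?"
        try:
            taxid = get_tax_id(name)
        except IndexError:           # esearch returned an empty IdList
            results[name] = (None, None)
            continue
        data = get_tax_data(taxid)
        results[name] = (taxid, phylum_of(data[0]))
        time.sleep(pause)            # avoid sending requests back to back
    return results
```

Called as `phyla_for(names, get_tax_id, get_tax_data)` with the list of organism names and the two functions defined above, this returns a dictionary keyed by organism name. Names that esearch cannot resolve map to `(None, None)` instead of stopping the run with an IndexError.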
You should ask on Biostars: http://www.biostars.org/ – Pierre 2013-05-12 17:51:17
Thanks for the advice – user2374216 2013-05-12 23:09:46