Search PubMed with BioPython and write to CSV

I am using BioPython to populate a CSV file with data about citations from PubMed. So far I have written this:
import csv
from Bio import Entrez
import bs4

Entrez.email = "my_email"

CSVfile = open('srData.csv')
fileReader = csv.reader(CSVfile)
Data = list(fileReader)

with open('blank.csv', 'w') as f1:
    writer = csv.writer(f1, delimiter='\t', lineterminator='\n')
    for id in Data:
        handle = Entrez.efetch(db="pubmed", id=id, rettype="gb", retmode="xml")
        record = Entrez.read(handle)
        title = record[0]['MedlineCitation']['Article']['ArticleTitle']
        abstract = record[0]['MedlineCitation']['Article']['Abstract']
        mesh = record[0]['MedlineCitation']['MeshHeadingList']
        descriptors = ','.join(term['DescriptorName'] for term in mesh)
        writer.writerow([title, abstract, descriptors])
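However, this produces unusual output in which the title, abstract and MeSH terms are spread across multiple columns instead of being kept separate, which I assume is due to their types. I would like my CSV to consist of three columns: one containing the title, one containing the abstract, and one containing the MeSH terms.
How can I do this?
Sample output
To clarify, the first column contains the entire title plus the start of the abstract, and the following columns contain the subsequent parts of the abstract. I need these split into separate columns, i.e. the first column should contain only the title, the second only the abstract, and the third only the MeSH terms.
Currently, the first column contains: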
"Distinct and combined vascular effects of ACE blockade and HMG-CoA reductase inhibition in hypertensive subjects. {u'AbstractText': ['Hypercholesterolemia and hypertension are frequently associated with elevated sympathetic activity. Both are independent cardiovascular risk factors and both affect endothelium-mediated vasodilation. To identify the effects of cholesterol-lowering and antihypertensive treatments on vascular reactivity and vasodilative capacity"
What do you mean by "the title, abstract and MeSH terms are spread across multiple columns"? Can you show us some sample output? – larsks
@larsks Have done so. – Toby
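
The sample output in the question shows `{u'AbstractText': [...`, which suggests that the `Abstract` entry is a dictionary-like object rather than a string, so its whole repr (commas included) ends up in the row. The rows are also tab-delimited, so a viewer that splits on commas would break that text into many columns. The following is a minimal sketch of one possible fix, not a tested solution against PubMed itself: it flattens each field to a plain string and writes comma-delimited rows with the default quoting. The names `flatten_article` and `write_rows` and the sample record are hypothetical; the record layout is assumed to match the keys used in the question's code, and the exact shape returned by `Entrez.read` may differ between Biopython versions.

```python
import csv
import io


def flatten_article(article):
    """Return [title, abstract, mesh_terms] as three plain strings.

    `article` is assumed to look like one parsed PubMed record, i.e. a
    mapping with a 'MedlineCitation' key, as in the question's code.
    """
    citation = article['MedlineCitation']
    info = citation['Article']
    title = str(info['ArticleTitle'])
    # 'Abstract' is a mapping whose 'AbstractText' holds a list of sections;
    # a record may lack an abstract, so fall back to an empty list.
    sections = info.get('Abstract', {}).get('AbstractText', [])
    abstract = ' '.join(str(section) for section in sections)
    # A record may also lack MeSH headings.
    mesh = citation.get('MeshHeadingList', [])
    descriptors = ', '.join(str(term['DescriptorName']) for term in mesh)
    return [title, abstract, descriptors]


def write_rows(articles, out_file):
    """Write a header plus one three-column row per article."""
    # Default comma delimiter: csv.writer quotes any field containing commas,
    # so commas inside a title or abstract stay within a single column.
    writer = csv.writer(out_file, lineterminator='\n')
    writer.writerow(['title', 'abstract', 'mesh_terms'])
    for article in articles:
        writer.writerow(flatten_article(article))


# Hand-made stand-in for one parsed record (hypothetical data, no network).
sample = {
    'MedlineCitation': {
        'Article': {
            'ArticleTitle': 'A title, with a comma',
            'Abstract': {'AbstractText': ['First part.', 'Second part.']},
        },
        'MeshHeadingList': [
            {'DescriptorName': 'Hypertension'},
            {'DescriptorName': 'Cholesterol'},
        ],
    }
}

buffer = io.StringIO()
write_rows([sample], buffer)
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
```

In the question's loop, the same idea would mean passing `record[0]` to `flatten_article` and writing the result with a comma-delimited writer. If a tab-separated file is preferred, keeping `delimiter='\t'` should also work as long as the file is opened as tab-delimited and the abstract has been joined into a single string first.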