Python Entrez: returning a value from a dictionary of dictionaries

I want to scrape the Interactions table from an Entrez Gene page.

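The Interactions table is populated by the web server. When I tried the XML package in R, I could retrieve the Entrez Gene page, but the body of the Interactions table was empty (it had not yet been populated by the web server).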

Working around the web server issue in R may well be possible (and I would love to see how it is done), but Biopython seemed like the easier path.

I put together the following, which gives me what I want for an example gene:

# Pull the Entrez Gene record for MAP1B (GeneID 4131) using Biopython

from Bio import Entrez
Entrez.email = "[email protected]"
handle = Entrez.efetch(db="gene", id="4131", retmode="xml")
record = Entrez.read(handle)
handle.close()

PPI_Entrez = []
PPI_Sym = []

# Find the dictionary that contains the Interactions table
for x in range(len(record[0]["Entrezgene_comments"])):
    if ('Gene-commentary_heading', 'Interactions') in record[0]["Entrezgene_comments"][x].items():
        # Each entry under the Interactions heading is one row of the table
        for y in range(len(record[0]["Entrezgene_comments"][x]['Gene-commentary_comment'])):
            EntrezID = record[0]["Entrezgene_comments"][x]['Gene-commentary_comment'][y]['Gene-commentary_comment'][1]['Gene-commentary_source'][0]['Other-source_src']['Dbtag']['Dbtag_tag']['Object-id']['Object-id_id']
            PPI_Entrez.append(EntrezID)
            Sym = record[0]["Entrezgene_comments"][x]['Gene-commentary_comment'][y]['Gene-commentary_comment'][1]['Gene-commentary_source'][0]['Other-source_anchor']
            PPI_Sym.append(Sym)

# Return the desired values: I want the Entrez ID and gene symbol for each interacting protein
PPI_Entrez  # Returns the Entrez IDs
PPI_Sym  # Returns the gene symbols

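This code works and gives me what I want. But I think it is ugly, and I worry that it will break if the format of the Entrez Gene page changes even slightly. In particular, there must be a better way to extract the desired information than spelling out the full path, as I do here: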

record[0]["Entrezgene_comments"][x]['Gene-commentary_comment'][y]['Gene-commentary_comment'][1]['Gene-commentary_source'][0]['Other-source_anchor']

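But I can't figure out how to search through a dictionary of dictionaries without specifying every level I want to descend. When I try functions like find(), they operate on the next level down but don't go all the way to the bottom.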

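Is there a wildcard symbol, a Python equivalent of "//", or a function I can use to get to ['Object-id_id'] without specifying the full path? Other suggestions for cleaner code are also appreciated.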
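One possible way to get "//"-style behaviour on nested dictionaries and lists is a small recursive generator. The sketch below is not part of Biopython: find_key is a hypothetical helper name, and sample_row is a hand-made structure that only imitates the shape of one interaction row.

```python
def find_key(obj, key):
    """Yield every value stored under `key`, at any depth of nested dicts/lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            # Keep descending: the same key may also appear deeper down
            yield from find_key(v, key)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            yield from find_key(item, key)


# Hand-made sample imitating one interaction row (not real NCBI output)
sample_row = {
    "Gene-commentary_comment": [
        {"Gene-commentary_source": [{"Other-source_anchor": "MAP1B"}]},
        {"Gene-commentary_source": [{
            "Other-source_src": {"Dbtag": {
                "Dbtag_db": "GeneID",
                "Dbtag_tag": {"Object-id": {"Object-id_id": "8726"}},
            }},
            "Other-source_anchor": "EED",
        }]},
    ]
}

ids = list(find_key(sample_row, "Object-id_id"))
symbols = list(find_key(sample_row, "Other-source_anchor"))
```

Because the objects returned by Entrez.read behave like dictionaries and lists, the same helper should in principle work on record[0], although that is an assumption that has not been checked against a live record here. Note that it returns every match, so the caller still has to decide which hits belong to the interacting partner.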

Source

2014-12-05 jamayfie

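I'm not sure about XPath in Python, but if the code works I wouldn't worry about getting rid of the full path, or about the Entrez Gene XML changing. Since you tried R first, you could get the XML with a system call to Entrez Direct or with a package like rentrez.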

library(XML)
doc <- xmlParse(system("efetch -db=gene -id=4131 -format xml", intern=TRUE))

Next, get the nodes that correspond to the rows of the table at http://www.ncbi.nlm.nih.gov/gene/4131#interactions

x <- getNodeSet(doc, "//Gene-commentary_heading[.='Interactions']/../Gene-commentary_comment/Gene-commentary") 

length(x) 
[1] 64 
x[1] 
x[50]
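For readers who want to stay in Python, roughly the same node selection can be sketched with the standard library's xml.etree.ElementTree. This is only an illustration: interaction_rows is a hypothetical helper, and SAMPLE_XML is a hand-made fragment that imitates the layout implied by the XPath above rather than real NCBI output.

```python
import xml.etree.ElementTree as ET

# Hand-made fragment imitating the Entrez Gene XML layout (not real NCBI output)
SAMPLE_XML = """
<Entrezgene>
  <Entrezgene_comments>
    <Gene-commentary>
      <Gene-commentary_heading>Pathways</Gene-commentary_heading>
      <Gene-commentary_comment>
        <Gene-commentary><Gene-commentary_text>ignored</Gene-commentary_text></Gene-commentary>
      </Gene-commentary_comment>
    </Gene-commentary>
    <Gene-commentary>
      <Gene-commentary_heading>Interactions</Gene-commentary_heading>
      <Gene-commentary_comment>
        <Gene-commentary><Gene-commentary_text>Affinity Capture-MS</Gene-commentary_text></Gene-commentary>
        <Gene-commentary><Gene-commentary_text>Reconstituted Complex</Gene-commentary_text></Gene-commentary>
      </Gene-commentary_comment>
    </Gene-commentary>
  </Entrezgene_comments>
</Entrezgene>
"""


def interaction_rows(root):
    """Return the Gene-commentary nodes listed under the 'Interactions' heading."""
    rows = []
    # iter() walks the whole tree, much like a leading '//' in XPath
    for block in root.iter("Gene-commentary"):
        if block.findtext("Gene-commentary_heading") == "Interactions":
            rows.extend(block.findall("Gene-commentary_comment/Gene-commentary"))
    return rows


root = ET.fromstring(SAMPLE_XML)
rows = interaction_rows(root)
row_texts = [r.findtext("Gene-commentary_text") for r in rows]
```

The heading is compared in plain Python instead of with an XPath predicate, because ElementTree only supports a subset of XPath.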

Try something simple first:

xmlToDataFrame(x[1:4]) 

  Gene-commentary_type  Gene-commentary_text Gene-commentary_refs Gene-commentary_source                          Gene-commentary_comment
1                   18   Affinity Capture-MS             24457600   BioGRID110304BioGRID    255BioGRID110304255GeneID8726EEDBioGRID114265
2                   18 Reconstituted Complex             20195357   BioGRID110304BioGRID    255BioGRID110304255GeneID2353FOSBioGRID108636
3                   18 Reconstituted Complex             20195357   BioGRID110304BioGRID  255BioGRID110304255GeneID1936EEF1DBioGRID108256
4                   18   Affinity Capture-MS     2345592220562859   BioGRID110304BioGRID   255BioGRID110304255GeneID6789STK4BioGRID112665
  Gene-commentary_create-date Gene-commentary_update-date
1                  2014461120                201410513330
2                201312810490                201410513330
3                201312810490                201410513330
4                 20137710360                201410513330

Some tags, such as text, refs, source and the dates, should be easy to parse:

sapply(x, function(x) paste(xpathSApply(x, ".//PubMedId", xmlValue), collapse=", "))

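I'm not sure about the comments, or how the products, interactants and other genes listed in the table are stored in the XML, but here I get either one or three symbols and three IDs for each node.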

sapply(x, function(x) paste(xpathSApply(x, ".//Gene-commentary_comment//Other-source_anchor", xmlValue), collapse=" + ")) 
sapply(x, function(x) paste(xpathSApply(x, ".//Gene-commentary_comment//Object-id_id", xmlValue), collapse=" + "))
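The same per-row extraction can be approximated in Python with ElementTree's limited ".//" path support. In this sketch, joined is a hypothetical helper and ROW_XML is a hand-made single row whose nesting is an assumption made for illustration; a real row would contain more sources and therefore more matches.

```python
import xml.etree.ElementTree as ET

# Hand-made single interaction row (assumed layout, not real NCBI output)
ROW_XML = """
<Gene-commentary>
  <Gene-commentary_text>Affinity Capture-MS</Gene-commentary_text>
  <Gene-commentary_refs><Pub><Pub_pmid><PubMedId>24457600</PubMedId></Pub_pmid></Pub></Gene-commentary_refs>
  <Gene-commentary_comment>
    <Gene-commentary>
      <Gene-commentary_source><Other-source>
        <Other-source_src><Dbtag><Dbtag_db>GeneID</Dbtag_db>
          <Dbtag_tag><Object-id><Object-id_id>8726</Object-id_id></Object-id></Dbtag_tag>
        </Dbtag></Other-source_src>
        <Other-source_anchor>EED</Other-source_anchor>
      </Other-source></Gene-commentary_source>
    </Gene-commentary>
  </Gene-commentary_comment>
</Gene-commentary>
"""


def joined(row, path, sep=" + "):
    """Join the text of every element matching `path` below `row`."""
    return sep.join(el.text for el in row.findall(path))


row = ET.fromstring(ROW_XML)
pubmed = joined(row, ".//PubMedId", sep=", ")
symbols = joined(row, ".//Gene-commentary_comment//Other-source_anchor")
ids = joined(row, ".//Gene-commentary_comment//Object-id_id")
```

Applied to every row returned by a row-selection step, this yields one string per interaction, much like the sapply calls above.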

Finally, since I think Entrez Gene just copies IntAct and BioGRID, you could try those sites too. BioGRID has a very simple REST service, but you have to register for a key.

url <- "http://webservice.thebiogrid.org/interactions?geneList=MAP1B&taxId=9606&includeHeader=TRUE&accesskey=[ your ACCESSKEY ]" 

biogrid <- read.delim(url) 
dim(biogrid) 
[1] 58 24 

head(biogrid[, c(8:9,12)]) 
  Official.Symbol.Interactor.A Official.Symbol.Interactor.B      Experimental.System
1                       ANP32A                        MAP1B               Two-hybrid
2                        MAP1B                       ANP32A               Two-hybrid
3                       RASSF1                        MAP1B Affinity Capture-Western
4                       RASSF1                        MAP1B               Two-hybrid
5                       ANP32A                        MAP1B Affinity Capture-Western
6                          GAN                        MAP1B Affinity Capture-Western
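A rough Python counterpart of the BioGRID request could look like the sketch below. The query parameters are the ones in the URL above; biogrid_url and parse_interactions are hypothetical helper names, and SAMPLE is a made-up stand-in for the tab-delimited response whose header names are assumed to be the R column names with spaces instead of dots.

```python
import csv
import io
from urllib.parse import urlencode

BASE_URL = "http://webservice.thebiogrid.org/interactions"


def biogrid_url(gene, tax_id, access_key):
    """Build the request URL from the same parameters used in the R example."""
    query = urlencode({
        "geneList": gene,
        "taxId": tax_id,
        "includeHeader": "TRUE",
        "accesskey": access_key,
    })
    return f"{BASE_URL}?{query}"


def parse_interactions(text):
    """Parse tab-delimited text with a header line into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Made-up sample standing in for the service response (header names assumed)
SAMPLE = (
    "Official Symbol Interactor A\tOfficial Symbol Interactor B\tExperimental System\n"
    "ANP32A\tMAP1B\tTwo-hybrid\n"
    "RASSF1\tMAP1B\tAffinity Capture-Western\n"
)

url = biogrid_url("MAP1B", 9606, "YOUR_ACCESSKEY")
rows = parse_interactions(SAMPLE)
partners = [r["Official Symbol Interactor A"] for r in rows]
```

To fetch live data, the text passed to parse_interactions would come from an HTTP request to url with a registered access key; that step is left out here because it needs network access.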

Source

2014-12-05 21:38:06

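Thanks Chris S! I had planned to write the script in Python and then make a system call from R, but this is much easier (at least for me). I wasn't aware of Entrez Direct, but it neatly solves the web server problem and gets me back to parsing by node instead of trying to drill down through Python dictionaries. – jamayfie 2014-12-08 18:55:56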

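Before starting this project I had considered BioGrid, HERD, etc., but I like that NCBI collates IntAct and BioGrid into a single table, and I would like to start from there if possible. – jamayfie 2014-12-08 19:01:41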

I was able to get just the EntrezID of the interacting protein by specifying: sapply(x, function(x) paste(xpathSApply(x, ".//Gene-commentary_comment//..//Dbtag_db[.='GeneID']//..//Object-id_id", xmlValue), collapse="")), if you need to save history (esearch – jamayfie 2014-12-08 19:59:08
