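How can I access the values of child nodes that have different names in an XML file?
I want to parse the xmlValue of certain child nodes from an NCBI XML file. However, for some PM.ID values, the root node <PubmedArticleSet> holds different kinds of records depending on what was published: PubmedBookArticle and PubmedArticle. I want to apply a condition: if (xmlName(fetch.pubmed) == "PubmedBookArticle"), extract certain values; else if (xmlName(fetch.pubmed) == "PubmedArticle"), extract other values. Finally, I want to build a data frame in which both sets of values are matched to their PMID.
This seems simple, but xmlName(fetch.pubmed) throws the error: no applicable method for 'xmlName' applied to an object of class "c('XMLInternalDocument', 'XMLAbstractDocument')"
Any help is appreciated, thanks.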
<?xml version="1.0"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2015//EN" "http://www.ncbi.nlm.nih.gov/corehtml/query/DTD/pubmed_150101.dtd">
<PubmedArticleSet>
<PubmedBookArticle>
<BookDocument>
<PMID Version="1">25506969</PMID>
<ArticleIdList>
<ArticleId IdType="bookaccession">NBK259188</ArticleId>
</ArticleIdList> ....
...... </BookDocument>
</PubmedBookArticle>
<PubmedArticle>
<MedlineCitation Status="Publisher" Owner="NLM">
<PMID Version="1">25013473</PMID>
<DateCreated>
<Year>2014</Year>
<Month>7</Month>
<Day>11</Day>
</DateCreated>....
....</MedlineCitation>
</PubmedArticle>
</PubmedArticleSet>
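A note on the error above: `xmlName()` works on nodes, while `fetch.pubmed` is a whole parsed document, which is why no method applies. The following is a minimal sketch of how the record names could be read instead. It uses a tiny hand-written stand-in document (`xml_text` is hypothetical, not real NCBI output) so that it runs without a network call.

```r
library(XML)

# Hypothetical stand-in for the parsed result of entrez_fetch()
xml_text <- paste0(
  "<PubmedArticleSet>",
  "<PubmedBookArticle><BookDocument><PMID Version=\"1\">25506969</PMID></BookDocument></PubmedBookArticle>",
  "<PubmedArticle><MedlineCitation><PMID Version=\"1\">25013473</PMID></MedlineCitation></PubmedArticle>",
  "</PubmedArticleSet>"
)
doc <- xmlParse(xml_text, asText = TRUE)

# xmlName(doc) would fail: doc is a document, not a node.
# The root node does have a name:
root_name <- xmlName(xmlRoot(doc))

# The element children of the root are the individual records;
# the XPath step "*" matches element nodes only.
record_names <- xpathSApply(doc, "/PubmedArticleSet/*", xmlName)
print(root_name)
print(record_names)
```

The root name is always PubmedArticleSet, so the condition has to be tested on each child record rather than on the document as a whole.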
My code is as follows:
library(XML)
library(rentrez)
PM.ID <- c("25506969", "25032371", "24983039", "24983034", "24983032", "24983031",
           "26386083", "26273372", "26066373", "25837167",
           "25466451", "25013473")
# rentrez function to retrieve the XML file for the PMIDs above
fetch.pubmed <- entrez_fetch(db = "pubmed", id = PM.ID,
                             rettype = "xml", parsed = TRUE)
# If empty records, return NA
FindNull <- function(x, x1child) {
  res <- xpathSApply(x, x1child, xmlValue)
  if (length(res) == 0) {
    out <- NA
  } else {
    out <- res
  }
  out
}
# extract contents from the XML file
xpathSApply(fetch.pubmed, "//PubmedArticle", FindNull, x1child = ".//ArticleTitle")
xpathSApply(fetch.pubmed, "//PubmedBookArticle", FindNull, x1child = ".//BookTitle")
How can I put the code above into a loop, so that for each record I retrieve values from PubmedArticle or PubmedBookArticle depending on which condition is met?
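One possible way to do this, sketched here under assumptions rather than as a definitive answer: iterate over the child records of the root, branch on `xmlName()` of each record, and bind one row per record into a data frame. The document below is a hand-written stand-in, and `first_value` is a hypothetical helper (a variant of `FindNull` that keeps only the first match).

```r
library(XML)

# Hypothetical stand-in for the parsed result of entrez_fetch()
xml_text <- paste0(
  "<PubmedArticleSet>",
  "<PubmedBookArticle><BookDocument><PMID Version=\"1\">25506969</PMID>",
  "<Book><BookTitle>A Book</BookTitle></Book></BookDocument></PubmedBookArticle>",
  "<PubmedArticle><MedlineCitation><PMID Version=\"1\">25013473</PMID>",
  "<Article><ArticleTitle>An Article</ArticleTitle></Article></MedlineCitation></PubmedArticle>",
  "</PubmedArticleSet>"
)
doc <- xmlParse(xml_text, asText = TRUE)

# Hypothetical helper: first match of a relative XPath under `node`, or NA
first_value <- function(node, path) {
  res <- xpathSApply(node, path, xmlValue)
  if (length(res) == 0) NA_character_ else res[[1]]
}

# One node per record, whatever its element name
records <- getNodeSet(doc, "/PubmedArticleSet/*")

rows <- lapply(records, function(node) {
  type <- xmlName(node)
  # Choose the title path according to the record type
  title_path <- if (type == "PubmedBookArticle") ".//BookTitle" else ".//ArticleTitle"
  data.frame(PMID  = first_value(node, ".//PMID"),
             Type  = type,
             Title = first_value(node, title_path),
             stringsAsFactors = FALSE)
})
result <- do.call(rbind, rows)
print(result)
```

Taking the first `.//PMID` match is an assumption that the record's own PMID comes first in document order; a stricter path such as `./MedlineCitation/PMID` or `./BookDocument/PMID` may be safer on real records.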
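Thanks, Chris. This definitely helps. I think extracting books and articles separately is closer to what you suggested. I tried a for loop, but it only slowed things down and complicated the process. – user5249203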
Sometimes you can search for two different names with a vector, as in xpathSApply(fetch.pubmed, c("//BookTitle", "//ArticleTitle"), xmlValue), but the first result has both a BookTitle and an ArticleTitle, so it is easier to work with the nodes.
Or xpathSApply(fetch.pubmed, c("//BookTitle", "//Article/ArticleTitle"), xmlValue)
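To illustrate the difference between the two searches in these comments, here is a small sketch that writes the same idea with the standard XPath union operator `|` in a single expression. The document is a hand-written stand-in in which the book record carries both a BookTitle and a chapter-level ArticleTitle; its layout is an assumption made for the example.

```r
library(XML)

# Hypothetical stand-in: the book record has a BookTitle and an ArticleTitle
xml_text <- paste0(
  "<PubmedArticleSet>",
  "<PubmedBookArticle><BookDocument>",
  "<Book><BookTitle>A Book</BookTitle></Book>",
  "<ArticleTitle>A Chapter</ArticleTitle>",
  "</BookDocument></PubmedBookArticle>",
  "<PubmedArticle><MedlineCitation>",
  "<Article><ArticleTitle>An Article</ArticleTitle></Article>",
  "</MedlineCitation></PubmedArticle>",
  "</PubmedArticleSet>"
)
doc <- xmlParse(xml_text, asText = TRUE)

# Every BookTitle and every ArticleTitle: the book record contributes two values
all_titles <- xpathSApply(doc, "//BookTitle | //ArticleTitle", xmlValue)

# Restricting ArticleTitle to children of <Article> skips the chapter title
main_titles <- xpathSApply(doc, "//BookTitle | //Article/ArticleTitle", xmlValue)
print(all_titles)
print(main_titles)
```

With the narrower path each record yields one title here, although the result is still a flat vector with no link back to the PMID, which is why working node by node tends to be easier.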