2016-09-24

XML processing: using xmlGetAttr on child nodes

I have several XML files with a structure similar to the following:

<?xml version='1.0' encoding='UTF-8'?>
<text>
    <stage></stage>
    <div>
        <intro agent= "Peter"></intro>
        <dialogue agent= "Peter"></dialogue>
        <outro agent= "Stephen"></outro>
    </div>
    <div>
        <intro agent= "Sandra"></intro>
        <dialogue agent= "Peter"></dialogue>
        <outro agent= "Robert"></outro>
    </div>
    <stage></stage>
</text>

My goal is to get a list of all the "agent" attribute values. I came up with

agents <- xmlApply(xml_processed[["test.xml"]], xmlGetAttr, "agent", default= "-") 

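but this only returns values for attributes on the direct children of the root (the "div" and "stage" nodes), not on the nested "intro", "dialogue" and "outro" nodes. xml_processed is created by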

# preprocess XML
library(XML)

preprocess_xml <- function() {
  path <- "data/XML"
  # list.files() expects a regular expression, not a glob
  xmlfiles <- list.files(path, pattern = "\\.xml$")
  xmlfiles_path <- file.path(path, xmlfiles)

  # parse every file and keep its root node, named by file name
  xmlfinal <- list()
  for (i in seq_along(xmlfiles)) {
    xmlfinal[[xmlfiles[i]]] <- xmlRoot(xmlTreeParse(xmlfiles_path[i]))
  }
  xmlfinal
}

xml_processed <- preprocess_xml()

I also tried

agents <- xmlApply(xml_processed[["test.xml"]], "/text/div/intro", xmlGetAttr, "agent", default= "-") 

to get the agents of the intro nodes. However, this only gives me an error:

get(as.character(FUN), mode = "function", envir = envir) 

Answer


In my opinion, it's time to rely more on XPath than on R:

txt <- '<?xml version="1.0" encoding="UTF-8"?> 
<text> 
    <stage></stage> 
    <div> 
     <intro agent= "Peter"></intro> 
     <dialogue agent= "Peter"></dialogue> 
     <outro agent= "Stephen"></outro> 
    </div> 
    <div> 
    <intro agent= "Sandra"></intro> 
     <dialogue agent= "Peter"></dialogue> 
    <outro agent= "Robert"></outro> 
    </div> 
    <stage></stage> 
</text>' 

library(xml2) 
library(magrittr) 

doc <- read_xml(txt) 
xml_find_all(doc, ".//*[@agent]") %>% 
    xml_attr("agent") 
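The same XPath idea also covers the intro-only case from the question. A minimal sketch with xml2, assuming the sample document is inlined again so the snippet runs on its own: it narrows the query to the absolute path "/text/div/intro" the asker tried, and separately lists each agent once with unique().

```r
library(xml2)

# Sample document from the question, inlined so this snippet is self-contained.
txt <- '<text>
  <stage></stage>
  <div>
    <intro agent="Peter"></intro>
    <dialogue agent="Peter"></dialogue>
    <outro agent="Stephen"></outro>
  </div>
  <div>
    <intro agent="Sandra"></intro>
    <dialogue agent="Peter"></dialogue>
    <outro agent="Robert"></outro>
  </div>
  <stage></stage>
</text>'

doc <- read_xml(txt)

# Absolute path from the question: only the <intro> children of each <div>.
intro_agents <- xml_attr(xml_find_all(doc, "/text/div/intro"), "agent")
# c("Peter", "Sandra")

# Every element carrying an agent attribute, each name kept once;
# unique() preserves the order of first appearance.
all_agents <- unique(xml_attr(xml_find_all(doc, ".//*[@agent]"), "agent"))
# c("Peter", "Stephen", "Sandra", "Robert")
```

Keeping the selection in the XPath string means no R-level recursion over the tree is needed; changing the path is enough to switch between "all agents" and "intro agents".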

If you have to use the XML package:

library(XML) 

doc <- xmlParse(txt) 
xpathSApply(doc, "//*[@agent]", xmlGetAttr, "agent")
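Since the question deals with several files, here is a minimal sketch of running the same xpathSApply() call once per file. It assumes each file is re-parsed with xmlParse() (equivalent to xmlTreeParse(..., useInternalNodes = TRUE)), which returns the internal document that XPath queries operate on; the helper agents_by_file() and the two demo files are hypothetical, made up only for illustration.

```r
library(XML)

# Hypothetical helper: one character vector of agent values per file,
# named by file name.
agents_by_file <- function(paths) {
  result <- lapply(paths, function(p) {
    doc <- xmlParse(p)  # internal document, so XPath can be used
    xpathSApply(doc, "//*[@agent]", xmlGetAttr, "agent")
  })
  names(result) <- basename(paths)
  result
}

# Demo data: two small files in a temporary directory.
dir <- tempfile("xmldemo")
dir.create(dir)
writeLines('<text><div><intro agent="Peter"/><outro agent="Stephen"/></div></text>',
           file.path(dir, "a.xml"))
writeLines('<text><div><intro agent="Sandra"/></div></text>',
           file.path(dir, "b.xml"))

files <- list.files(dir, pattern = "\\.xml$", full.names = TRUE)
agents <- agents_by_file(files)
# agents[["a.xml"]] is c("Peter", "Stephen"); agents[["b.xml"]] is "Sandra"
```

Use unlist(agents) if a single combined vector is wanted instead of one entry per file.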