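R - Web scraping - trouble getting attribute value using rvest
I am trying to use rvest to pull ISO country information from Wikipedia (including the links to other pages). I cannot find a way to correctly get the links (the href attribute) that go with the names (I tried the XPath string function, which causes an error). The code below is fairly easy to run and self-explanatory.
Any help is appreciated!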
library(rvest)
library(dplyr)
searchPage <- read_html("https://en.wikipedia.org/wiki/ISO_3166-2")
nodes <- html_node(searchPage, xpath = '(//h2[(span/@id = "Current_codes")]/following-sibling::table)[1]')
codes <- html_nodes(nodes, xpath = 'tr/td[1]/a/text()')
names <- html_nodes(nodes, xpath = 'tr/td[2]//a[@title]/text()')
# The following returns the data, but with the attribute name included
links <- html_nodes(nodes, xpath = 'tr/td[2]//a[@title]/@href')
# The following returns nothing
links2 <- html_nodes(nodes, xpath = 'tr/td[2]//a[@title]/@href/text()')
# The following throws an error
links3 <- html_nodes(nodes, xpath = 'string(tr/td[2]//a[@title]/@href)')
# The following throws an error
links4 <- sapply(nodes, function(x) { x %>% read_html() %>% html_nodes("tr/td[2]//a[@title]") %>% html_attr("href") })
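One possible way around the attempts above is to select the `<a>` elements themselves and then read the attribute with `html_attr()`, rather than selecting `@href` inside the XPath. The following is only a minimal sketch: the inline HTML is a made-up stand-in for the Wikipedia table (so the example runs offline), and the variable names `tbl`, `anchors`, `country_names`, `country_links` and `full_links` are illustrative, not taken from the question.

```r
library(rvest)

# Hypothetical stand-in for the "Current codes" table on the Wikipedia page;
# the real page layout is an assumption and may differ.
html <- '<table>
  <tr><th>Entry</th><th>Country name</th></tr>
  <tr><td><a href="/wiki/ISO_3166-2:AD" title="ISO 3166-2:AD">AD</a></td>
      <td><a href="/wiki/Andorra" title="Andorra">Andorra</a></td></tr>
  <tr><td><a href="/wiki/ISO_3166-2:AE" title="ISO 3166-2:AE">AE</a></td>
      <td><a href="/wiki/United_Arab_Emirates" title="United Arab Emirates">United Arab Emirates</a></td></tr>
</table>'

page <- read_html(html)
tbl  <- html_node(page, xpath = "//table")

# Select the anchor elements, not their text() or @href nodes
anchors <- html_nodes(tbl, xpath = ".//tr/td[2]//a[@title]")

# Read the text and the attribute value from the same elements,
# so both vectors stay aligned and are plain character vectors
country_names <- html_text(anchors)
country_links <- html_attr(anchors, "href")

# The hrefs are site-relative; prefix the host to get full URLs
full_links <- paste0("https://en.wikipedia.org", country_links)

print(data.frame(name = country_names, link = full_links))
```

Applied to the question's code, the same idea would presumably be `html_nodes(nodes, xpath = 'tr/td[2]//a[@title]') %>% html_attr("href")`, assuming the table XPath in the question still matches the live page.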
Thanks! Sorry, I thought the comments would be good enough; I will try to provide more information in future!