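rvest | Webscraping data to long format
While web scraping I ran into the following problem, and I suspect there may be a better solution for it.
Given data like this: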
dat <- data.frame(query = c("Washington, USA", "Frankfurt, Germany"))
               query
1    Washington, USA
2 Frankfurt, Germany
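I want to query, for example, the Google Maps API and get back the formatted address(es). A single query can return more than one address. The result should look like this: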
               query         formatted_address
1    Washington, USA       Washington, DC, USA
2    Washington, USA       Washington, UT, USA
3    Washington, USA Washington, VA 22747, USA
4    Washington, USA Washington, IA 52353, USA
5    Washington, USA Washington, GA 30673, USA
6    Washington, USA Washington, PA 15301, USA
7 Frankfurt, Germany        Frankfurt, Germany
This is what I currently do:
require(RCurl)
require(rvest)
require(magrittr)
# Build the request URL for one query string
build_url <- function(x, base_url = "https://maps.googleapis.com/maps/api/geocode/xml?address="){
  paste0(base_url, RCurl::curlEscape(x))
}
# One data.frame per query; q is recycled across all returned addresses
l <- lapply(dat$query, function(q){
  formatted_address <- q %>% build_url %>% read_xml %>% xml_nodes("formatted_address") %>% xml_text
  data.frame(query = q, formatted_address)
})
do.call(rbind, l) # This can be done via data.table::rbindlist as well
Is there a better solution? Perhaps something more in data.table or dplyr style?
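One possible direction, sketched in base R so it runs without a network connection: keep the addresses in a named list with one character vector per query, and expand that list to long format in a single step instead of building one data.frame per query. The list `res` is a hypothetical stand-in for parsed API responses (its values are copied from the desired output above), not real output.

```r
# Hypothetical stand-in for the parsed responses: one character vector
# of formatted addresses per query.
res <- list(
  "Washington, USA"    = c("Washington, DC, USA", "Washington, UT, USA"),
  "Frankfurt, Germany" = "Frankfurt, Germany"
)

# Repeat each query once per returned address, then flatten the addresses.
long <- data.frame(
  query             = rep(names(res), lengths(res)),
  formatted_address = unlist(res, use.names = FALSE),
  stringsAsFactors  = FALSE
)
long
```

If the results are kept as a list of per-query data frames instead, `data.table::rbindlist(l)` or `dplyr::bind_rows(l)` could take the place of `do.call(rbind, l)`.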
Please include the 'library'/'require' calls to make your code reproducible – jangorecki
Sure. I just added the 'require' statements along with the data.frame creation – Rentrop
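Apart from 'stringsAsFactors = FALSE', you've already optimized this perfectly, IMO. I'd suggest adding a 'sleep' inside the lapply and making sure you limit the number of calls to 2500 or fewer, IIRC ([usage limits](https://developers.google.com/maps/documentation/business/articles/usage_limits) info). – hrbrmstr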
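A minimal sketch of what those suggestions could look like together, in base R. The helper name `geocode_long`, its `fetch` argument, the 0.2 second pause, and the `NA` row for empty results are all assumptions made for illustration. The 2500 cap is the figure recalled in the comment above, not a verified quota. `fetch` stands for whatever turns one query into a character vector of addresses, for example the build_url/read_xml pipeline from the question. A stub is used here so the example runs offline.

```r
# Hypothetical helper: throttled lookup that returns a long data.frame.
geocode_long <- function(queries, fetch, pause = 0.2, max_calls = 2500) {
  queries <- as.character(queries)          # guard against factor columns
  # Stop early instead of silently exceeding the assumed call limit.
  stopifnot(length(queries) <= max_calls)
  l <- lapply(queries, function(q) {
    Sys.sleep(pause)                        # wait before each request
    addr <- fetch(q)
    # Assumption: keep one NA row when a query returns no address.
    if (length(addr) == 0) addr <- NA_character_
    data.frame(query = q, formatted_address = addr,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, l)
}

# Offline stub standing in for the real API call.
fake_fetch <- function(q) {
  if (q == "Washington, USA") {
    c("Washington, DC, USA", "Washington, UT, USA")
  } else {
    "Frankfurt, Germany"
  }
}

out <- geocode_long(c("Washington, USA", "Frankfurt, Germany"),
                    fake_fetch, pause = 0)
out
```

Passing the lookup in as a function keeps the throttling and row-binding logic separate from the HTTP and XML details, so the same wrapper can be checked without touching the API.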