2012-08-03 67 views
2

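How to read multiple HTML tables in R

I am trying to automate scraping the data frames returned by this readHTMLTable call and saving them. I'm new to R and am having a hard time figuring out how to write a loop that automates these steps, which work when I run them for one URL at a time: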

library('XML') 

urls<-c("http://www.basketball-reference.com/teams/ATL/","http://www.basketball-reference.com/teams/BOS/") 
theurl<-urls[2] #Pick second link (celtics) 

tables <- readHTMLTable(theurl) 
n.rows <- unlist(lapply(tables, function(t) dim(t)[1])) 
BOS <-tables[[which.max(n.rows)]] 
Team.History<-write.csv(BOS,"Bos.csv") 

Any and all help would be greatly appreciated!

You seem to have already figured out how to use 'lapply'. Have you considered applying those 'lapply' skills to your 'urls' vector? – joran 2012-08-03 22:08:12
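
The hint in this comment could be sketched roughly as follows. This is only an illustration, not code from the thread: 'largest_table' and 'scrape_largest' are hypothetical helper names, and the sketch assumes the table of interest is always the one with the most rows, as in the question. The download step needs the XML package and network access, so it is defined but not run here; the demonstration uses made-up data frames instead.

```r
# Hypothetical helper: pick the data frame with the most rows from a list.
# NULL entries (tables that could not be parsed) are dropped first so the
# row counts and the list stay aligned.
largest_table <- function(tables) {
  tables <- Filter(Negate(is.null), tables)
  n_rows <- vapply(tables, nrow, integer(1))
  tables[[which.max(n_rows)]]
}

# Hypothetical wrapper: download one page and keep its largest table.
# Requires the XML package and network access, so it is only defined here.
scrape_largest <- function(url) {
  largest_table(XML::readHTMLTable(url))
}

# Applying the same function to every URL replaces the one-by-one steps:
# all_teams <- lapply(urls, scrape_largest)

# Offline demonstration with made-up data frames standing in for scraped tables
demo <- list(a = data.frame(x = 1:2), b = NULL, c = data.frame(x = 1:5))
nrow(largest_table(demo))
```

Wrapping the per-page work in a function keeps the selection rule in one place, so changing how the "right" table is chosen only touches 'largest_table'.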

Note that there is no need to assign the result of 'write.csv' to a variable. – seancarmody 2012-08-03 23:25:41
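
A quick way to see why the assignment is unnecessary, as a minimal sketch: 'write.csv' is called for its side effect of writing a file, and the value it hands back is NULL. The data frame and the temporary file path below are made up for illustration.

```r
# Small made-up data frame standing in for a scraped table
df <- data.frame(Team = c("AAA", "BBB"), Wins = c(1, 2))

out_file <- tempfile(fileext = ".csv")  # temporary path, for illustration only
ret <- write.csv(df, out_file)          # the assignment captures nothing useful

is.null(ret)           # the return value is NULL
file.exists(out_file)  # the file was written as a side effect
```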

Answers

1

I assume you want to loop over your urls vector? I would try something like this:

library('XML') 

url_base <- "http://www.basketball-reference.com/teams/" 
teams <- c("ATL", "BOS") 

# better still, get the full list of teams as in 
# http://stackoverflow.com/a/11804014/1543437 

results <- data.frame() 
for(team in teams){ 
    theurl <- paste0(url_base, team, "/")  # url_base already ends in "/", so sep="/" would double it
    tables <- readHTMLTable(theurl) 
    n.rows <- unlist(lapply(tables, function(t) dim(t)[1])) 
    team.results <-tables[[which.max(n.rows)]] 
    write.csv(team.results, file=paste0(team, ".csv")) 
    team.results$TeamCode <- team 
    results <- rbind(results, team.results) 
} 
write.csv(results, file="AllTeams.csv") 

You may also want to bundle all the results into a single file (see the update in the answer above). – seancarmody 2012-08-04 00:57:05
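
One possible variation on the bundling step, offered as a sketch and not as the answer's method: collect the per-team tables in a list and bind them once with 'do.call(rbind, ...)' instead of growing 'results' inside the loop. The per-team data below is made up, the output goes to a temporary directory, and 'row.names = FALSE' is an added choice that the original code does not make.

```r
# Made-up per-team tables standing in for the scraped results
scraped <- list(
  ATL = data.frame(Season = c("S1", "S2"), W = c(1, 2)),
  BOS = data.frame(Season = c("S1", "S2", "S3"), W = c(3, 4, 5))
)

# Tag each table with its team code, keeping the pieces in a list
tagged <- lapply(names(scraped), function(team) {
  team_results <- scraped[[team]]
  team_results$TeamCode <- team
  team_results
})

# Bind everything in one step instead of growing `results` inside the loop
results <- do.call(rbind, tagged)

out_file <- file.path(tempdir(), "AllTeams.csv")  # temporary location for the demo
write.csv(results, file = out_file, row.names = FALSE)
```

This only works when every table has the same columns, which is the same assumption the 'rbind' call in the loop already makes.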

2

I think this combines the best of both answers (and tidies things up a little).

library(RCurl) 
library(XML) 

stem <- "http://www.basketball-reference.com/teams/" 
teams <- htmlParse(getURL(stem), asText=TRUE) 
teams <- xpathSApply(teams,"//*/a[contains(@href,'/teams/')]", xmlAttrs)[-1] 
teams <- gsub("/teams/(.*)/", "\\1", teams) 
urls <- paste0(stem, teams) 

names(teams) <- NULL # get rid of the "href" labels 
names(urls) <- teams 

results <- data.frame() 
for(team in teams){ 
    tables <- readHTMLTable(urls[team]) 
    n.rows <- unlist(lapply(tables, function(t) dim(t)[1])) 
    team.results <- tables[[which.max(n.rows)]] 
    write.csv(team.results, file=paste0(team, ".csv")) 
    team.results$TeamCode <- team 
    results <- rbind(results, team.results) 
    rm(team.results, n.rows, tables) 
} 
rm(stem, team) 

write.csv(results, file="AllTeams.csv") 
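
To make the team-code extraction in this answer easier to follow, here is a small offline sketch of the 'gsub' and naming steps. The 'hrefs' vector is made up to mimic the link attributes the answer expects to find on the page; nothing is downloaded.

```r
# Made-up href values in the shape the answer expects
hrefs <- c(href = "/teams/ATL/", href = "/teams/BOS/")

# The capture group (.*) keeps whatever sits between "/teams/" and the final "/"
teams <- gsub("/teams/(.*)/", "\\1", hrefs)
names(teams) <- NULL  # drop the "href" labels, as in the answer

stem <- "http://www.basketball-reference.com/teams/"
urls <- paste0(stem, teams)
names(urls) <- teams  # lets urls["BOS"] look up a page by team code

urls[["BOS"]]
```

Naming 'urls' by team code is what allows the loop to write 'urls[team]' instead of tracking a numeric index.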

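Sean, this is masterful. Thank you! I hope you use it yourself for your own sports scraping. Keep in touch about any other sports-related scraping you'd like to do. Hit me up on Twitter @abresler – 2012-08-04 16:41:39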

I've been thinking about doing some number crunching on the Olympics... but haven't found the time yet. Or perhaps too much time? – seancarmody 2012-08-06 08:38:43