R: Scraping multiple tables from a URL

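I'm learning how to scrape information from websites in R using httr and XML. I can get it to work for websites with only a few tables, but I can't figure it out for websites with many tables. Take the following Pro Football Reference page as an example: https://www.pro-football-reference.com/boxscores/201609110atl.htm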

library(httr)
library(XML)

# To get just the box score by quarter, which is the first table:
URL = "https://www.pro-football-reference.com/boxscores/201609080den.htm"
response = GET(URL)
SnapTable = readHTMLTable(rawToChar(response$content), stringsAsFactors = FALSE)[[1]]

# Return the number of tables:
AllTables = readHTMLTable(rawToChar(response$content), stringsAsFactors = FALSE)
length(AllTables)
[1] 2

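So I am able to scrape information, but for some reason I can only capture the first two tables out of the 20+ on the page. For practice, I'm trying to get the "Starters" tables and the "Officials" table.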

Is my inability to get the other tables a matter of how the website is set up, or is my code incorrect?

Source

2017-09-04 CoolGuyHasChillDay

When it comes to web scraping in R, take a close look at the rvest package.

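Managing to fetch the HTML is fine, but rvest also lets you use CSS selectors, and SelectorGadget helps you find the selector for a specific table, which is hopefully unique. That way you can extract exactly the table you are looking for rather than hitting it by coincidence.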
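As a minimal sketch of selecting one table by a CSS selector, the snippet below parses a small inline HTML fragment instead of the live page. The fragment, the ids `scoring` and `officials`, and the cell values are all made up for illustration; the real page's ids would have to be looked up with SelectorGadget or the browser's inspector.

```r
library(rvest)

# Hypothetical page fragment with two tables; ids and values are invented.
page <- read_html('
<div id="content">
  <table id="scoring"><tr><th>Team</th><th>Final</th></tr>
    <tr><td>A</td><td>21</td></tr></table>
  <table id="officials"><tr><th>Role</th><th>Name</th></tr>
    <tr><td>Referee</td><td>Jane Doe</td></tr></table>
</div>')

# "#officials" is a CSS id selector: it matches only the element with that id,
# so the position of the table on the page does not matter.
officials <- page %>%
  html_node("#officials") %>%
  html_table()

print(officials)
```

Selecting by id tends to be more robust than a positional path, since it keeps working if tables are added or reordered.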

Here is something to get you started. Read the rvest vignette for more details.

# install.packages("rvest")
library(rvest)
library(magrittr)  # provides the %>% pipe

# Store web url
fb_url = "https://www.pro-football-reference.com/boxscores/201609080den.htm"

# Read the page, select one table by XPath, and parse it into a data frame
linescore = fb_url %>%
    read_html() %>%
    html_node(xpath = '//*[@id="content"]/div[3]/table') %>%
    html_table()
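On the question of why only two tables come back: one possible cause, offered here as an assumption rather than something confirmed in this thread, is that a site ships some tables inside HTML comments and uncomments them with JavaScript. A parser never sees those as table nodes. Whether the page in question does this can be checked by viewing its source. The sketch below uses an invented inline fragment (the ids `visible` and `hidden` are hypothetical) to show how such tables could be recovered.

```r
library(rvest)

# Hypothetical markup: one ordinary table and one wrapped in an HTML comment.
html <- '
<div><table id="visible"><tr><th>Q</th></tr><tr><td>1</td></tr></table></div>
<div><!--
  <table id="hidden"><tr><th>Name</th></tr><tr><td>Jane Doe</td></tr></table>
--></div>'

page <- read_html(html)

# Only the uncommented table is parsed as a <table> node.
visible_count <- length(html_nodes(page, "table"))

# Collect the text of every comment node and re-parse it as HTML.
comment_text <- page %>%
  html_nodes(xpath = "//comment()") %>%
  html_text() %>%
  paste(collapse = "")

hidden_tables <- read_html(comment_text) %>%
  html_nodes("table") %>%
  html_table()

print(visible_count)
print(hidden_tables)
```

If the tables turn out not to be in comments, they may instead be built entirely by JavaScript, in which case a static fetch would not contain them at all.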

Hope this helps.

Source

2017-09-04 09:19:33 Christian
