2017-03-01 108 views

I am trying to extract data from the Basketball Reference website, using rvest with html_nodes() and html_table() to pull out the site's tables.

library(rvest) 
data7 <- read_html("http://www.basketball-reference.com/teams/CLE/2017.html") %>% 
html_nodes("[id=roster]") %>% 
html_table() 
data7 

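The code above returns the data in the "roster" table. However, when I swap in the selector below, I do not get the "team_misc" table; I get a list of length zero instead: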

html_nodes("[id=team_misc]") %>% 

I am fairly new to rvest, so if anyone has an idea why this does not work, it would be greatly appreciated.


Did you look through the _plethora_ of questions in the SO R tag that scrape data from this exact same site? – hrbrmstr


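hrbrmstr - I searched for rvest, html_nodes, html_table and so on, but did not realize how many posts there were about the Basketball Reference site. The post below may answer my question: http://stackoverflow.com/questions/41434984/readhtmltable-in-r-only-bringing-back-first-two-tables-from-basketball-reference –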

Answer


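There is actually already an answer to this, but it applies to an older version of the site. The reason you cannot get the other tables is that they are created dynamically: in the raw page that R reads, the tables you want sit inside commented-out strings, so they are not part of the parsed document. Inspect the page elements in Chrome to see what I mean. The other answer is here: How to scrape tables inside a comment tag in html with R?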
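To make the diagnosis concrete, here is a minimal sketch that reproduces the symptom offline. The inline HTML is a hypothetical stand-in for the real page (its structure is an assumption, not a copy of the site), and the variable names are only for illustration.

library(rvest)
library(xml2)

# Hypothetical stand-in for the page: one live table, one table inside a comment
page <- read_html('<html><body>
  <table id="roster"><tr><th>Player</th></tr><tr><td>A</td></tr></table>
  <!-- <table id="team_misc"><tr><th>W</th><th>L</th></tr><tr><td>1</td><td>2</td></tr></table> -->
</body></html>')

# The live table is matched; the commented-out one is not an element node
n_roster <- length(html_nodes(page, "#roster"))
n_misc <- length(html_nodes(page, "#team_misc"))

# The markup of the missing table is still there, as the text of a comment node
comments <- xml_text(xml_find_all(page, "//comment()"))
in_comment <- any(grepl('id="team_misc"', comments, fixed = TRUE))

Running the same grepl() check against the comments of the real page should show whether a given table id is hidden this way, assuming the site still serves its tables inside comments.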

For the data from your season, though:

library(rvest) 
library(xml2) # xml_find_all() and xml_text() come from xml2 

A <- read_html('http://www.basketball-reference.com/teams/CLE/2017.html') %>% # Read in the raw webpage 
    xml_find_all('//comment()') %>% # Use XPath to find all comment nodes 
    xml_text() %>% # Convert the comments to raw strings 
    paste0(collapse = "") %>% # Collapse them into a single string 
    read_html() %>% # Re-read that string as HTML content 
    xml_find_all("//table") %>% html_table() # Extract every table as a data frame 

cat(capture.output(lapply(A, head, 1)), sep = "\n") 


[[1]] 
        Date Type                      Note 
1 Kevin Love 2017-02-12 Knee Love is expected to miss six weeks after undergoing arthroscopic surgery on his left knee. 

[[2]] 
      X1    X2 
1 Jim Boylan  Assistant Coach 

[[3]] 
     G MP FG FGA FG% 3P 3PA 3P% 2P 2PA 2P% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PTS 
1 Team 58 14020 2305 4938 0.467 761 1952 0.39 1544 2986 0.517 1073 1420 0.756 564 1988 2552 1304 414 237 804 1033 6444 

[[4]] 
    NA NA NA NA NA NA NA NA NA NA Advanced NA Offense Four Factors NA NA  NA Defense Four Factors NA NA  NA    NA 
1 W L PW PL MOV SOS SRS ORtg DRtg Pace  FTr 3PAr     eFG% TOV% ORB% FT/FGA     eFG% TOV% DRB% FT/FGA Arena Attendance 

[[5]] 
    Rk    Age G GS MP FG FGA FG% 3P 3PA 3P% 2P 2PA 2P% eFG% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PTS/G 
1 1 LeBron James 32 54 54 37.5 9.6 17.7 0.541 1.7 4.4 0.387 7.9 13.3 0.592 0.589 4.8 6.9 0.691 1.1 6.7 7.9 8.9 1.4 0.6 4.3 1.7 25.7 

[[6]] 
    Rk    Age G GS MP FG FGA FG% 3P 3PA 3P% 2P 2PA 2P% eFG% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PTS 
1 1 LeBron James 32 54 54 2026 518 957 0.541 92 238 0.387 426 719 0.592 0.589 259 375 0.691 62 363 425 479 74 32 230 92 1387 

[[7]] 
    Rk    Age G GS MP FG FGA FG% 3P 3PA 3P% 2P 2PA 2P% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PTS 
1 1 LeBron James 32 54 54 2026 9.2 17 0.541 1.6 4.2 0.387 7.6 12.8 0.592 4.6 6.7 0.691 1.1 6.5 7.6 8.5 1.3 0.6 4.1 1.6 24.6 

[[8]] 
    Rk    Age G GS MP FG FGA FG% 3P 3PA 3P% 2P 2PA 2P% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PTS ORtg DRtg 
1 1 LeBron James 32 54 54 2026 12.7 23.4 0.541 2.3 5.8 0.387 10.4 17.6 0.592 6.3 9.2 0.691 1.5 8.9 10.4 11.7 1.8 0.8 5.6 2.3 34 NA 118 107 

[[9]] 
    Rk    Age G MP PER TS% 3PAr FTr ORB% DRB% TRB% AST% STL% BLK% TOV% USG% Â OWS DWS WS WS/48 Â OBPM DBPM BPM VORP 
1 1 LeBron James 32 54 2026 26.3 0.618 0.249 0.392 3.5 19.1 11.6 41.7 1.8 1.3 17 29.4 NA 6.9 2.4 9.3 0.22 NA 6.3 1.8 8 5.1 

[[10]] 
    NA NA NA NA NA NA     NA NA NA NA NA NA    NA NA NA NA NA NA 2-Pt Field Goals NA NA 3-Pt Field Goals  NA 
1 <NA> <NA> <NA> <NA> <NA> <NA> % of FGA by Distance <NA> <NA> <NA> NA <NA> FG% by Distance <NA> <NA> <NA> NA <NA>     Dunks <NA>     Corner 
    NA  NA NA 
1 <NA> Heaves <NA> 

[[11]] 
    Rk     Salary 
1 1 LeBron James $30,963,450 

[[12]] 
          Yr Tm Rd Pk    Team  G MP FG FGA FG% 3P 3PA 3P% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PTS 
1 Vladimir Veremeenko NA 2006 WAS 2 48 NA Reggio Emilia it 18 139 17 29 0.586 0 0 NA 4 9 0.444 14 10 24 8 2 3 9 33 38
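
The list above is indexed only by position, so picking out "team_misc" means counting tables. A possible refinement, sketched here under assumptions, is to name each table by its id attribute. comment_tables() is a hypothetical helper, not part of rvest or xml2, and the inline HTML with its placeholder values stands in for the real page. The helper assumes the page has at least one comment containing markup.

library(rvest)
library(xml2)

# Hypothetical helper: parse the tables hidden in HTML comments and name them by id
comment_tables <- function(page) {
  hidden <- page %>%
    xml_find_all("//comment()") %>%  # all comment nodes
    xml_text() %>%                   # their contents as strings
    paste0(collapse = "") %>%        # one string of markup
    read_html() %>%                  # re-parse it as HTML
    xml_find_all("//table")          # the tables that were commented out
  tables <- html_table(hidden)
  names(tables) <- xml_attr(hidden, "id")
  tables
}

# Hypothetical stand-in page with two commented-out tables (placeholder values)
page <- read_html('<html><body>
  <!-- <table id="team_misc"><tr><th>W</th><th>L</th></tr><tr><td>1</td><td>2</td></tr></table> -->
  <!-- <table id="salaries"><tr><th>Player</th></tr><tr><td>A</td></tr></table> -->
</body></html>')

tabs <- comment_tables(page)
team_misc <- tabs[["team_misc"]]

On the real page the same call would be comment_tables(read_html(url))[["team_misc"]], provided the table still carries that id and is still served inside a comment.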