library(rvest)
#> Loading required package: xml2
website <- read_html("https://afltables.com/afl/stats/teams/adelaide/2017_gbg.html")
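This page contains several tables; each one is reachable through a link shown above the table displayed on the main page. Calling `html_table()` on the result of `html_nodes("table")` gives you all of the tables at once, as a list.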
all_tables <- website %>%
  html_nodes("table") %>%
  html_table()
str(all_tables, 1)
#> List of 23
#> $ :'data.frame': 34 obs. of 27 variables:
#> $ :'data.frame': 34 obs. of 27 variables:
#> $ :'data.frame': 34 obs. of 27 variables:
#> $ :'data.frame': 34 obs. of 27 variables:
#> $ :'data.frame': 34 obs. of 27 variables:
#> $ :'data.frame': 34 obs. of 27 variables:
#> $ :'data.frame': 34 obs. of 27 variables:
#> $ :'data.frame': 34 obs. of 27 variables:
#> $ :'data.frame': 34 obs. of 27 variables:
#> $ :'data.frame': 34 obs. of 27 variables:
#> $ :'data.frame': 34 obs. of 27 variables:
#> $ :'data.frame': 34 obs. of 27 variables:
#> $ :'data.frame': 34 obs. of 27 variables:
#> $ :'data.frame': 34 obs. of 27 variables:
#> $ :'data.frame': 34 obs. of 27 variables:
#> $ :'data.frame': 34 obs. of 27 variables:
#> $ :'data.frame': 34 obs. of 27 variables:
#> $ :'data.frame': 34 obs. of 27 variables:
#> $ :'data.frame': 34 obs. of 27 variables:
#> $ :'data.frame': 34 obs. of 27 variables:
#> $ :'data.frame': 34 obs. of 27 variables:
#> $ :'data.frame': 34 obs. of 27 variables:
#> $ :'data.frame': 34 obs. of 27 variables:
You can then select the table you want, but the header is still not right:
head(all_tables[[1]])
#> Disposals Disposals Disposals Disposals Disposals Disposals
#> 1 Player R1 R2 R3 R4 R5
#> 2 Atkins, Rory 19 19 19 23 29
#> 3 Beech, Jonathon
#> 4 Betts, Eddie 18 13 16 22 12
#> 5 Brown, Luke 18 12 13 9 15
#> 6 Cameron, Charlie 23 17 16 16 13
#> Disposals Disposals Disposals Disposals Disposals Disposals Disposals
#> 1 R6 R7 R8 R9 R10 R11 R12
#> 2 23 20 21 28 37 14 25
#> 3 15
#> 4 16 13 9 16 14 12 11
#> 5 17 13 20 25 16 12
#> 6 13 14 10 18 13 8 13
#> Disposals Disposals Disposals Disposals Disposals Disposals Disposals
#> 1 R14 R15 R16 R17 R18 R19 R20
#> 2 28 15 23 18 19 16 16
#> 3 12 11
#> 4 14 11 13 16 8 16
#> 5 10 15 14 17 11 10 20
#> 6 15 10 20 6 9 17
#> Disposals Disposals Disposals Disposals Disposals Disposals Disposals
#> 1 R21 R22 R23 QF PF GF Tot
#> 2 27 21 21 16 22 17 536
#> 3 38
#> 4 7 16 12 13 13 7 318
#> 5 17 17 9 20 10 13 353
#> 6 13 10 10 15 19 16 334
With a few list and table operations from `purrr` and `dplyr`, you can tidy up the tables, each of which has two header rows:
all_tables <- website %>%
  html_nodes("table") %>%
  # do not let rvest handle the header automatically
  html_table(header = FALSE)
library(purrr)
#>
#> Attaching package: 'purrr'
#> The following object is masked from 'package:rvest':
#>
#> pluck
library(dplyr) # provides slice(), used below
all_tables_res <- all_tables %>%
  # use the top-left cell (first row, first column) as the name of each list element
  # pluck is a purrr function acting like x[[1]][1, 1] here
  lmap(~ set_names(.x, nm = pluck(.x, 1, 1, 1))) %>%
  # for each table, use the second row as the header,
  # then drop the first and second rows
  map(~ set_names(.x, nm = .x[2, ]) %>% slice(-c(1, 2)))
str(all_tables_res, 1)
#> List of 23
#> $ Disposals :Classes 'tbl_df', 'tbl' and 'data.frame': 33 obs. of 27 variables:
#> $ Kicks :Classes 'tbl_df', 'tbl' and 'data.frame': 33 obs. of 27 variables:
#> $ Marks :Classes 'tbl_df', 'tbl' and 'data.frame': 33 obs. of 27 variables:
#> $ Handballs :Classes 'tbl_df', 'tbl' and 'data.frame': 33 obs. of 27 variables:
#> $ Goals :Classes 'tbl_df', 'tbl' and 'data.frame': 33 obs. of 27 variables:
#> $ Behinds :Classes 'tbl_df', 'tbl' and 'data.frame': 33 obs. of 27 variables:
#> $ Hit Outs :Classes 'tbl_df', 'tbl' and 'data.frame': 33 obs. of 27 variables:
#> $ Tackles :Classes 'tbl_df', 'tbl' and 'data.frame': 33 obs. of 27 variables:
#> $ Rebounds :Classes 'tbl_df', 'tbl' and 'data.frame': 33 obs. of 27 variables:
#> $ Inside 50s :Classes 'tbl_df', 'tbl' and 'data.frame': 33 obs. of 27 variables:
#> $ Clearances :Classes 'tbl_df', 'tbl' and 'data.frame': 33 obs. of 27 variables:
#> $ Clangers :Classes 'tbl_df', 'tbl' and 'data.frame': 33 obs. of 27 variables:
#> $ Frees :Classes 'tbl_df', 'tbl' and 'data.frame': 33 obs. of 27 variables:
#> $ Frees Against :Classes 'tbl_df', 'tbl' and 'data.frame': 33 obs. of 27 variables:
#> $ Brownlow Votes :Classes 'tbl_df', 'tbl' and 'data.frame': 33 obs. of 27 variables:
#> $ Contested Possessions :Classes 'tbl_df', 'tbl' and 'data.frame': 33 obs. of 27 variables:
#> $ Uncontested Possessions:Classes 'tbl_df', 'tbl' and 'data.frame': 33 obs. of 27 variables:
#> $ Contested Marks :Classes 'tbl_df', 'tbl' and 'data.frame': 33 obs. of 27 variables:
#> $ Marks Inside 50 :Classes 'tbl_df', 'tbl' and 'data.frame': 33 obs. of 27 variables:
#> $ One Percenters :Classes 'tbl_df', 'tbl' and 'data.frame': 33 obs. of 27 variables:
#> $ Bounces :Classes 'tbl_df', 'tbl' and 'data.frame': 33 obs. of 27 variables:
#> $ Goal Assists :Classes 'tbl_df', 'tbl' and 'data.frame': 33 obs. of 27 variables:
#> $ % Played :Classes 'tbl_df', 'tbl' and 'data.frame': 33 obs. of 27 variables:
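To see what the two steps do without hitting the website, here is a minimal sketch on a toy list. The data frames, player names and values are made up for illustration; only the shape (title in row 1, real column names in row 2) mirrors what `html_table(header = FALSE)` returns here.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-ins for two scraped tables read with header = FALSE:
# row 1 repeats the table title, row 2 holds the real column names.
toy_tables <- list(
  data.frame(
    X1 = c("Goals", "Player", "Smith, A", "Jones, B"),
    X2 = c("Goals", "R1", "2", "-"),
    stringsAsFactors = FALSE
  ),
  data.frame(
    X1 = c("Kicks", "Player", "Smith, A", "Jones, B"),
    X2 = c("Kicks", "R1", "10", "7"),
    stringsAsFactors = FALSE
  )
)

# Step 1: the title sits in the top-left cell of each table
titles <- map_chr(toy_tables, ~ .x[[1, 1]])

toy_res <- toy_tables %>%
  # name each list element after its title
  set_names(titles) %>%
  # Step 2: promote row 2 to column names, then drop both header rows
  map(~ set_names(.x, as.character(unlist(.x[2, ]))) %>% slice(-c(1, 2)))

names(toy_res)
toy_res$Goals
```

This version computes the titles with `map_chr()` rather than `lmap()`; it is just another way of writing the same naming step.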
You can now call up any table from the website by name.
head(all_tables_res$Goals)
#> # A tibble: 6 x 27
#> Player R1 R2 R3 R4 R5 R6 R7 R8 R9
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Atkins, Rory 3 1 - 2 1 - 1 1 -
#> 2 Beech, Jonathon
#> 3 Betts, Eddie 4 3 3 6 3 1 3 2 3
#> 4 Brown, Luke - 1 - - 1 - - - -
#> 5 Cameron, Charlie 2 1 - 1 2 2 2 - 4
#> 6 Crouch, Brad - - - - 1
#> # ... with 17 more variables: R10 <chr>, R11 <chr>, R12 <chr>, R14 <chr>,
#> # R15 <chr>, R16 <chr>, R17 <chr>, R18 <chr>, R19 <chr>, R20 <chr>,
#> # R21 <chr>, R22 <chr>, R23 <chr>, QF <chr>, PF <chr>, GF <chr>,
#> # Tot <chr>
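Note that every column comes back as `<chr>`. If you need numbers, something along these lines may help. It is only a sketch on a made-up miniature table, and it assumes that `-` means zero and that a blank cell means the player did not play that round; check that against the site before relying on it. `to_number()` is a hypothetical helper, not part of any package.

```r
library(dplyr)

# Hypothetical miniature of one cleaned table: all columns are character
goals <- data.frame(
  Player = c("Smith, A", "Jones, B"),
  R1 = c("3", ""),
  R2 = c("-", "1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map the site's placeholders to numbers
to_number <- function(x) {
  x[x == "-"] <- "0" # assumption: a dash means zero
  x[x == ""] <- NA   # assumption: a blank cell means "did not play"
  as.numeric(x)
}

# Convert every column except Player
goals_num <- goals %>%
  mutate(across(-Player, to_number))

str(goals_num)
```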
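The first bit is wrong information and should be removed or clarified. Even with a `table` variable that is not a function (e.g. `table <- c(2,1,2,5,2,3,...)`), R is smart enough to still be able to call `table()` (e.g. `table(table)`). It does not get "overwritten". It is still bad practice in general and not a good idea, but not for the reason you gave. – hrbrmstr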
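Thanks for calling that out, @hrbrmstr. Out of interest, do you happen to have a source that explains how and why R is smart enough to tell the difference? If R really is that smart, then the main (only?) reason to avoid names like `list`, `data` and `c` seems to be to keep programmers from getting confused, since R itself appears to handle it just fine. –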
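The behaviour discussed in these comments is easy to check. When a name is used in call position, as in `f(x)`, R looks for a function with that name and skips over non-function objects. A minimal sketch (the variable `counts` is just an illustrative name):

```r
# A plain numeric vector that happens to be called `table`
table <- c(2, 1, 2, 5, 2, 3)

# In call position R looks for a *function* named table,
# so base::table() is still found and applied to the vector
counts <- table(table)
counts

# Both objects coexist: the vector in the workspace, the function in base
is.numeric(table)
is.function(base::table)
is.function(match.fun("table"))
```

So the function is shadowed only for people reading the code, not for R, which is consistent with the point that avoiding such names is mostly about readability.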