2017-06-06 138 views
4

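Split multiple values in one column into multiple rows in R

I have a data frame in which most rows hold a single observation. Some rows, however, hold several values in one column: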

# A tibble: 3 x 2
  number abilities
   <dbl> <chr>
1     51 b1261
2     57 d710
3     57 b1301; d550

df <- structure(list(number = c(51, 57, 57), abilities = c("b1261",
"d710", "b1301; d550")), .Names = c("number", "abilities"
), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"
))

I would like to get the following:

# A tibble: 4 x 2
  number abilities
   <dbl> <chr>
1     51 b1261
2     57 d710
3     57 d550
4     57 b1301

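Splitting the string is straightforward enough, but I am not sure how to add the new rows easily, especially since abilities may contain more than two values.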

This is very similar to "R semicolon delimited a column into rows", except that I do not need to remove duplicates.

Answers

5

tidyr has a function, separate_rows, that does this:

library(tidyr)
## The ";\\s+" means that the separator is a ";" followed by one or more whitespace characters
separate_rows(df, abilities, sep = ";\\s+")
  number abilities
   <dbl> <chr>
1     51 b1261
2     57 d710
3     57 b1301
4     57 d550
+1

I think you also need to trim whitespace, or use sep = ";\\s+", otherwise the last entry will start with a space. – Marius

+0

@Marius You are right, I had not spotted that. Thanks! – Lamia

+0

Thank you, I did not know about that feature in tidyr. I adjusted the whitespace match to ";\\s*" to allow zero or more spaces. – pluke
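
The difference between the two separators discussed in these comments can be checked with base R's strsplit, which takes the same kind of regular expression. This is only a sketch on a made-up string (x is a hypothetical input, not data from the question):

```r
# Hypothetical input: the first ";" has no space after it, the second has two.
x <- "b1301;d550;  b1261"

# ";\\s+" needs at least one whitespace character after the semicolon,
# so the first ";" is not treated as a separator.
plus <- strsplit(x, ";\\s+")[[1]]

# ";\\s*" allows zero or more whitespace characters, so every ";" splits.
star <- strsplit(x, ";\\s*")[[1]]

print(plus)
print(star)
```

With ";\\s*" the pieces also come back without leading whitespace, so no separate trimming step is needed for this kind of input.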

2

The tidyverse is good at this, because tidyr has unnest:

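library(tidyverse)
library(stringr)
df %>%
    # turn each cell into a list of pieces split on ";"
    mutate(abilities = str_split(abilities, ";")) %>%
    # expand the list column to one row per piece
    unnest(abilities) %>%
    # strip the leading space left behind by "; "
    mutate(abilities = str_trim(abilities))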
1

Another option is cSplit:

library(splitstackshape)
cSplit(df, 'abilities', sep = '; ', direction = 'long')
#    number abilities
# 1:     51     b1261
# 2:     57      d710
# 3:     57     b1301
# 4:     57      d550
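
For readers who want to see what these helpers do under the hood, here is a minimal base R sketch of the same reshaping, with no extra packages. It assumes the data from the question rebuilt as a plain data frame; the names parts and long are made up for illustration:

```r
# Same data as in the question, as a plain data frame.
df <- data.frame(
  number = c(51, 57, 57),
  abilities = c("b1261", "d710", "b1301; d550"),
  stringsAsFactors = FALSE
)

# Split each cell on ";" plus optional whitespace.
# The result is a list with one character vector per original row.
parts <- strsplit(df$abilities, ";\\s*")

# Repeat each number once per piece, then flatten the pieces into one column.
long <- data.frame(
  number = rep(df$number, lengths(parts)),
  abilities = unlist(parts),
  stringsAsFactors = FALSE
)
print(long)
```

Because lengths(parts) counts the pieces per row, this works for any number of values in a cell, not just two. With more columns to carry along, indexing the original rows with rep(seq_len(nrow(df)), lengths(parts)) would be one way to repeat them all.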