2017-07-31 29 views
0

R: split delimited strings in a column and insert them as new (binary) columns

I have a data frame as follows:

+---+-----------+
|lot|Combination|
+---+-----------+
|A01|A,B,C,D,E,F|
|A01|A,B,C      |
|A02|B,C,D,E    |
|A03|A,B,D,F    |
|A04|A,C,D,E,F  |
+---+-----------+

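Each letter is a single character, and the letters are separated by commas. I would like to split 'Combination' on each comma and insert the split strings as new columns in binary form. For example, the desired output would be: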

+---+-+-+-+-+-+-+ 
|lot|A|B|C|D|E|F| 
+---+-+-+-+-+-+-+ 
|A01|1|1|1|1|1|1| 
|A01|1|1|1|0|0|0| 
|A02|0|1|1|1|1|0| 
|A03|1|1|0|1|0|1| 
|A04|1|0|1|1|1|1| 
+---+-+-+-+-+-+-+ 

Any help would be greatly appreciated :)

+0

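Also, so that I can check my answer really works, have a look at this post: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and edit your question, replacing the image with real data please :) – Jan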

+0

'library(splitstackshape); cSplit_e(df, "Combination", type = "character", fill = 0)' should do it... – A5C1D2H2I1M1N2O1R2T1

Answers

1

A solution using dplyr and tidyr. dt2 is the final output.

# Load packages
library(dplyr)
library(tidyr)

# Create example data frame
lot <- c("A01", "A01", "A02", "A03", "A04")
Combination <- c("A,B,C,D,E,F", "A,B,C", "B,C,D,E", "A,B,D,F", "A,C,D,E,F")
dt <- tibble(lot, Combination)

# Process the data
dt2 <- dt %>%
    # Row ID keeps the two A01 rows distinct when spreading
    mutate(ID = 1:n()) %>%
    # Split each string into a list of letters, then make one row per letter
    mutate(Combination = strsplit(Combination, split = ",")) %>%
    unnest(Combination) %>%
    # Mark each letter as present, then spread the letters into 0/1 columns
    mutate(Value = 1) %>%
    spread(Combination, Value, fill = 0) %>%
    select(-ID)
+0

Thanks! It works!! The result is exactly what I was expecting :) –

1

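Please provide your sample input data in a form that people answering can use directly as input. I have added the same sample data here myself; I hope it helps.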

library(tidyr)
library(dplyr)
lot <- c("A01", "A02", "A03", "A04")
Combination <- c("A,B,C,D,E,F", "A,B,C", "B,C,D,E", "A,C")
df <- data.frame(lot, Combination)
df

# Split into up to six columns, padding short rows with NA on the right
separate(df, Combination, into = paste("V", 1:6, sep = ""), sep = ",", fill = "right") %>%
    gather(key, value, -lot) %>%
    filter(!is.na(value)) %>%
    # Drop the position column so each lot collapses to a single row
    select(-key) %>%
    mutate(yesno = 1) %>%
    distinct %>%
    spread(value, yesno, fill = 0)

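To understand what is happening here, run each step one at a time, starting from separate(). %>% is the pipe operator; it is shorthand for passing the result of the previous line as the first argument of the next.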
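As a minimal sketch of what the pipe does (the vector x and the names piped and nested are made up for illustration and are not part of the answer above), the two forms below compute the same thing:

```r
library(dplyr)  # dplyr re-exports the %>% operator

# Hypothetical toy input, used only for this illustration
x <- c(4, 9, 16)

# Piped form: x becomes the first argument of sqrt(),
# and that result becomes the first argument of sum()
piped <- x %>% sqrt() %>% sum()

# Equivalent nested form without the pipe
nested <- sum(sqrt(x))

identical(piped, nested)
```

The same reading applies to the longer pipeline: each step receives the data frame produced by the step before it.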

0

Another option, using the handy separate_rows() function:

df <- read.table(text = "lot|Combination
A01|A,B,C,D,E,F
A01|A,B,C
A02|B,C,D,E
A03|A,B,D,F
A04|A,C,D,E,F", sep = "|", header = TRUE)

library(tidyverse)
df %>%
    mutate(id = row_number(), flg = 1) %>%
    separate_rows(Combination, sep = ",") %>%
    spread(Combination, flg)

This gives:

  lot id  A  B  C  D  E  F
1 A01  1  1  1  1  1  1  1
2 A01  2  1  1  1 NA NA NA
3 A02  3 NA  1  1  1  1 NA
4 A03  4  1  1 NA  1 NA  1
5 A04  5  1 NA  1  1  1  1
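
The output above has NA where a letter is absent and still carries the helper id column, whereas the question asked for 0/1 columns only. A possible variation, sketched here under the assumption that the same df is used (the name out is made up for illustration), passes fill = 0 to spread() and drops id afterwards:

```r
library(dplyr)
library(tidyr)

# Same sample data as in the answer above
df <- read.table(text = "lot|Combination
A01|A,B,C,D,E,F
A01|A,B,C
A02|B,C,D,E
A03|A,B,D,F
A04|A,C,D,E,F", sep = "|", header = TRUE, stringsAsFactors = FALSE)

out <- df %>%
    # id keeps the two A01 rows apart; flg marks a letter as present
    mutate(id = row_number(), flg = 1) %>%
    # One row per letter
    separate_rows(Combination, sep = ",") %>%
    # fill = 0 writes 0 instead of NA for letters that are absent
    spread(Combination, flg, fill = 0) %>%
    # id is only a helper, so remove it from the result
    select(-id)

out
```

The id column is still needed while spreading, because lot alone does not identify a row (A01 appears twice).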