2017-07-25 91 views
-1

I have a data frame like this:

dat = data.frame(Type = c("A","A","B","B","C","C","D"),
                 NextType = c("A","B","B","C","C","D",NA),
                 A = rep(0,7),
                 B = rep(0,7),
                 C = rep(0,7),
                 D = rep(0,7),
                 stringsAsFactors = FALSE)
dat

  Type NextType A B C D
1    A        A 0 0 0 0
2    A        B 0 0 0 0
3    B        B 0 0 0 0
4    B        C 0 0 0 0
5    C        C 0 0 0 0
6    C        D 0 0 0 0
7    D     <NA> 0 0 0 0

What is the best way to fill columns A, B, C and D with 1 wherever the column name (A, B, C, D, etc.) equals both Type and NextType?

So:

column A would be 1,0,0,0,0,0,0 
column B would be 0,0,1,0,0,0,0 
column C would be 0,0,0,0,1,0,0 
column D would be 0,0,0,0,0,0,0 

Note: this has to be dynamic. The example above has 4 columns (A, B, C and D), but there could be 10, 20, or any number of columns.

Answers

1

Using dplyr and tidyr:

library(dplyr); library(tidyr); 

dat %>% 
    select(Type, NextType) %>% 
    mutate(key = if_else(Type == NextType & !is.na(Type) & !is.na(NextType), Type, "other"), 
      val = 1) %>% 
    spread(key, val, fill = 0) %>% 
    select(-other) 

#  Type NextType A B C
#1    A        A 1 0 0
#2    A        B 0 0 0
#3    B        B 0 1 0
#4    B        C 0 0 0
#5    C        C 0 0 1
#6    C     <NA> 0 0 0

Data:

dat = data.frame(Type = c("A","A","B","B","C","C"),
                 NextType = c("A","B","B","C","C",NA),
                 A = rep(0,6), B = rep(0,6), C = rep(0,6),
                 stringsAsFactors = FALSE)

Sorry, I left out the case where there is no match at all. Can you see the edit? – user3022875

So you want a column that contains only zeros? – Psidom

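In that case you can try 'model.matrix': create a key column of factor type whose levels are those of the 'Type' column plus an extra "other" level; 'model.matrix' keeps the extra levels that have no matching rows. 'dat %>% select(Type, NextType) %>% mutate(key = factor(if_else(Type == NextType & !is.na(Type) & !is.na(NextType), Type, "other"), levels = c("other", unique(Type)))) %>% bind_cols(., as.data.frame(model.matrix(~ key - 1, .))) %>% select(-keyother, -key)' – Psidom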
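
The factor-level idea from the comment above can also be written out step by step in base R. This is only a sketch under stated assumptions: it uses the seven-row data from the question, and the names same, key, mm and res are illustrative, not taken from the comment.

```r
# Seven-row example data from the question (Type and NextType only).
dat <- data.frame(Type = c("A","A","B","B","C","C","D"),
                  NextType = c("A","B","B","C","C","D",NA),
                  stringsAsFactors = FALSE)

# TRUE only where NextType is present and equal to Type (never NA).
same <- !is.na(dat$NextType) & dat$Type == dat$NextType

# Key is the matching Type, or "other"; every Type value is declared
# as a level so that values without any match still get a column.
key <- factor(ifelse(same, dat$Type, "other"),
              levels = c("other", unique(dat$Type)))

# One dummy column per level, including levels that never occur.
mm <- model.matrix(~ key - 1, data.frame(key = key))
colnames(mm) <- levels(key)

# Drop the "other" column and attach the rest to the original columns.
res <- cbind(dat[c("Type", "NextType")],
             mm[, unique(dat$Type), drop = FALSE])
res
```

Because D is declared as a level even though no row has Type and NextType both equal to D, res still contains an all-zero D column.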

1

I would do it like this:

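library(tidyr)
library(dplyr)
dat = data.frame(Type = c("A","A","B","B","C","C"), NextType = c("A", "B","B", "C","C",NA), stringsAsFactors = FALSE)
# !is.na(NextType) keeps the row with a missing NextType at 0 instead of NA
dat <- dat %>% mutate(A = ifelse(Type == NextType & !is.na(NextType) & Type == 'A', 1, 0),
                      B = ifelse(Type == NextType & !is.na(NextType) & Type == 'B', 1, 0),
                      C = ifelse(Type == NextType & !is.na(NextType) & Type == 'C', 1, 0))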

Thanks, but it needs to handle any number of columns dynamically – user3022875

0

data.table

library(data.table) 
dat = data.table(Type = c("A","A","B","B","C","C"), NextType = c("A", "B","B", "C","C",NA), 
      A = c(rep(0,6)), B = rep(0,6), C = rep(0,6)) 
dat 

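# !is.na(NextType) makes a missing NextType count as "no match" (0, not NA)
dat[Type=="A", A := as.numeric(Type == NextType & !is.na(NextType))]
dat[Type=="B", B := as.numeric(Type == NextType & !is.na(NextType))]
dat[Type=="C", C := as.numeric(Type == NextType & !is.na(NextType))]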

Edit

Dynamic version (probably not very efficient; maybe someone has another suggestion?)

mycols <- names(dat)[!(names(dat) %in% c("Type", "NextType"))] 
for(i in mycols){
    # for rows of type i, set column i to 1 on a match and 0 otherwise
    dat[Type==i, (i) := as.numeric(Type == NextType & !is.na(NextType))]
}

Can you make it dynamic for any number of columns? See the edit. – user3022875

See the edit. It does what you want. Not sure how efficient it is. – simone

1

Here is a method using model.matrix, diff, and apply.

cbind(dat[1], apply(model.matrix(~Type-1, dat), 2, function(x) c(x[1], diff(x) > 0))) 

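model.matrix(~Type-1, dat) returns a matrix of dummy variables in which each column is 1 in the rows where the corresponding value is present. This is fed to apply, which takes each column and returns the column's first value followed by whether each successive difference is greater than 0, so only the first row of each run of a type is marked. The resulting matrix is combined with the first column using cbind.

If you want to include the second column as well, change dat[1] to dat[1:2]. The call above returns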

  Type TypeA TypeB TypeC
1    A     1     0     0
2    A     0     0     0
3    B     0     1     0
4    B     0     0     0
5    C     0     0     1
6    C     0     0     0
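
To make the one-liner easier to follow, the same steps can be run one at a time. This is a sketch on the six-row data; the intermediate names mm, first and res are illustrative. Like the one-liner, it looks only at Type and assumes that equal Type values sit in consecutive rows.

```r
# Six-row example data; only Type is needed for this approach.
dat <- data.frame(Type = c("A","A","B","B","C","C"),
                  stringsAsFactors = FALSE)

# Step 1: dummy matrix with one column per Type value (TypeA, TypeB, TypeC).
mm <- model.matrix(~ Type - 1, dat)

# Step 2: per column, keep the first value and then flag every row where
# the dummy switches from 0 to 1, i.e. the start of a run of that type.
first <- apply(mm, 2, function(x) c(x[1], diff(x) > 0))

# Step 3: attach the flags to the Type column.
res <- cbind(dat[1], first)
res
```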


An alternative base R method using lapply is

dat[, LETTERS[1:3]] <- lapply(unique(dat$Type), 
           function(x) (dat$Type == x) * !duplicated(dat$Type)) 

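Here we loop over the unique values of dat$Type and, for each one, check whether each element of dat$Type equals that value and is not a duplicate of an earlier element. This returns a list, which is assigned to the corresponding variables in dat.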
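
The same lapply pattern can be adapted to take the column names from the data and to compare Type with NextType directly, which is what the question describes. This is a minimal sketch, assuming the indicator columns should be named after the values of Type; cols and same are illustrative names.

```r
# Seven-row example data from the question, without the placeholder columns.
dat <- data.frame(Type = c("A","A","B","B","C","C","D"),
                  NextType = c("A","B","B","C","C","D",NA),
                  stringsAsFactors = FALSE)

# Column names come from the data, so any number of types is handled.
cols <- unique(dat$Type)

# TRUE only where NextType is present and equal to Type (never NA).
same <- !is.na(dat$NextType) & dat$Type == dat$NextType

# One 0/1 column per type: 1 where the row matches and has that type.
dat[cols] <- lapply(cols, function(cl) as.numeric(same & dat$Type == cl))
dat
```

On this data the result matches the columns listed in the question, including an all-zero D column for the row whose NextType is NA.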