2017-05-05

Concatenate current row and next row in a new column

Suppose we have this data frame read in:

df <- data.frame(id = c(rep(1,5), rep(2, 3), rep(3, 4), rep(4, 2)), brand = c("A", "B", "A", "D", "Closed", "B", "C", "D", "D", "A", "B", "Closed", "C", "Closed")) 

> df 
# id brand 
#1 1  A 
#2 1  B 
#3 1  A 
#4 1  D 
#5 1 Closed 
#6 2  B 
#7 2  C 
#8 2  D 
#9 3  D 
#10 3  A 
#11 3  B 
#12 3 Closed 
#13 4  C 
#14 4 Closed 

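I want to create a new variable that represents the change in the brand column from the current row to the next row, but only within each ID number.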

Create the new column:

df$brand_chg <- "" 

This loop does exactly what I want:

for (i in 1:nrow(df)) { 

    j <- i + 1 

    if(j > nrow(df)) next #to prevent error in very last row 

    if (df[i,'id'] != df[j, 'id']) next #to skip loop when id changes 

    df[i,'brand_chg'] <- paste(df[i,'brand'], df[j,'brand'], sep = "->") 
    #populating concatenation 
} 

#Results: 
# id brand brand_chg 
#1 1  A  A->B 
#2 1  B  B->A 
#3 1  A  A->D 
#4 1  D D->Closed 
#5 1 Closed   
#6 2  B  B->C 
#7 2  C  C->D 
#8 2  D   
#9 3  D  D->A 
#10 3  A  A->B 
#11 3  B B->Closed 
#12 3 Closed   
#13 4  C C->Closed 
#14 4 Closed 

However, with 287K rows in the table, this loop takes at least 10 minutes to run. Does anyone know a faster way to do this concatenation?

Thank you, I appreciate your insights.
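A minimal base-R sketch of the same logic without an explicit loop. Like the loop, it only compares each row with the row directly below it: shift `id` and `brand` up by one row, then paste only where the next row has the same `id`. It has not been timed on 287K rows; the helper names `next_brand` and `same_id_next` are illustrative.

```r
df <- data.frame(id = c(rep(1, 5), rep(2, 3), rep(3, 4), rep(4, 2)),
                 brand = c("A", "B", "A", "D", "Closed", "B", "C", "D",
                           "D", "A", "B", "Closed", "C", "Closed"))

n <- nrow(df)
brand <- as.character(df$brand)  # works whether brand is a factor or character

# "next row" versions of each column; the last row has no successor
next_brand <- c(brand[-1], NA)
same_id_next <- c(df$id[-1] == df$id[-n], FALSE)

# paste only where the next row belongs to the same id, otherwise ""
df$brand_chg <- ifelse(same_id_next, paste(brand, next_brand, sep = "->"), "")
df
```

Each step is one vectorized call over the whole column instead of 287K separate `df[i, ]` lookups and assignments, which is usually where a loop like the one above loses its time.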


Untested on 287K rows: 'with(df, ave(brand, id, FUN = function(x) c(paste(head(x, -1), tail(x, -1), sep = "->"), "")))' – rawr


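I get an error with 'with()', but when I remove it, the 'ave()' function gives me a list of correctly concatenated values. Thanks! I'll have to study how it works. – gatch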
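A runnable sketch of the 'ave()' suggestion above: 'ave()' splits 'brand' by 'id', applies 'FUN' to each piece, and returns the results in the original row order. The 'as.character()' call is an added safeguard, not part of the comment: before R 4.0, 'data.frame()' converted strings to factors by default, and 'ave()' writes its results back into a vector of the input's type.

```r
df <- data.frame(id = c(rep(1, 5), rep(2, 3), rep(3, 4), rep(4, 2)),
                 brand = c("A", "B", "A", "D", "Closed", "B", "C", "D",
                           "D", "A", "B", "Closed", "C", "Closed"))

# FUN receives the brand values of one id at a time
df$brand_chg <- ave(as.character(df$brand), df$id, FUN = function(x) {
  # pair each value with its successor; the group's last row gets ""
  c(paste(head(x, -1), tail(x, -1), sep = "->"), "")
})
df
```

A one-row group also ends up as "", because 'paste()' of two zero-length vectors returns 'character(0)' and only the appended "" is left.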

Answers


Using the dplyr package:

library(dplyr) 

df %>% group_by(id) %>% 
    mutate(brand_chg = ifelse(seq_along(brand) == n(), 
           "", 
           paste(brand, lead(brand), sep = "->"))) 

Thanks! This works, and it keeps the result as a data frame. Would you mind explaining what is happening with 'seq_along(brand) == n()'? – gatch


'seq_along(brand) == n()' returns TRUE for the last row of each group: within a group, 'seq_along' works like a row index, and 'n()' is the number of rows in that group. – Lamia
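To see the pieces that 'seq_along(brand) == n()' is built from, here is a small sketch that keeps them as temporary columns. The names 'pos', 'size' and 'is_last' are illustrative and not part of the answer.

```r
library(dplyr)

df <- data.frame(id = c(rep(1, 5), rep(2, 3), rep(3, 4), rep(4, 2)),
                 brand = c("A", "B", "A", "D", "Closed", "B", "C", "D",
                           "D", "A", "B", "Closed", "C", "Closed"))

flags <- df %>%
  group_by(id) %>%
  mutate(pos = seq_along(brand),     # 1, 2, ... restarting in each id
         size = n(),                 # number of rows in this id's group
         is_last = pos == size) %>%  # TRUE only on the group's last row
  ungroup()
flags
```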


Wow, very cool. Thanks a lot. – gatch


Also with dplyr, just slightly different, not better! It uses 'is.na()' instead of '== n()':

library(dplyr) 
df %>% 
    group_by(id) %>% 
    mutate(change = if_else(is.na(lead(brand)), "", paste0(brand,"->", lead(brand)))) 
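One behavioral difference between the two dplyr versions, sketched on a small hypothetical data frame 'df_na' that contains a missing brand. 'is.na(lead(brand))' is also TRUE for the row just before a genuine NA, so that row gets blanked too. A position test such as 'row_number() == n()' only blanks each group's last row. When 'brand' has no missing values, both give the same result.

```r
library(dplyr)

# hypothetical data: id 1 has a missing brand in its second row
df_na <- data.frame(id = c(1, 1, 1, 2, 2),
                    brand = c("A", NA, "B", "C", "Closed"),
                    stringsAsFactors = FALSE)

res <- df_na %>%
  group_by(id) %>%
  mutate(by_na   = if_else(is.na(lead(brand)), "",
                           paste0(brand, "->", lead(brand))),
         by_last = if_else(row_number() == n(), "",
                           paste0(brand, "->", lead(brand)))) %>%
  ungroup()
res
# by_na blanks row 1 (its successor is NA); by_last keeps "A->NA"
```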

Here is an option using data.table:

library(data.table) 
setDT(df)[, brand_chg := paste(brand, shift(brand, type = "lead"), sep="->"), id] 
df[df[, .I[.N] , id]$V1, brand_chg := ""] 
df 
# id brand brand_chg 
# 1: 1  A  A->B 
# 2: 1  B  B->A 
# 3: 1  A  A->D 
# 4: 1  D D->Closed 
# 5: 1 Closed   
# 6: 2  B  B->C 
# 7: 2  C  C->D 
# 8: 2  D   
# 9: 3  D  D->A 
#10: 3  A  A->B 
#11: 3  B B->Closed 
#12: 3 Closed   
#13: 4  C C->Closed 
#14: 4 Closed   
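The second line relies on '.I[.N]' evaluated by 'id'. '.I' holds the row numbers of the current group within the whole table, and '.N' is the group size, so '.I[.N]' is the row number of each group's last row. Those rows are then set back to "". A small sketch that only computes those indices:

```r
library(data.table)

dt <- data.table(id = c(rep(1, 5), rep(2, 3), rep(3, 4), rep(4, 2)),
                 brand = c("A", "B", "A", "D", "Closed", "B", "C", "D",
                           "D", "A", "B", "Closed", "C", "Closed"))

# one result row per id; V1 is the row number of that id's last row in dt
last_rows <- dt[, .I[.N], by = id]$V1
last_rows  # rows 5, 8, 12 and 14 close ids 1 to 4
```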

Or a more compact option:

setDT(df)[, brand_chg := c(paste(brand[-.N], brand[-1], sep="->"), ""), id] 
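In the compact version, 'brand[-.N]' drops the group's last value and 'brand[-1]' drops its first, so the two vectors pair each row with its successor, and the trailing "" fills the last row. Here is a small sketch on a hypothetical table with a one-row group (id 2). In that group 'paste()' receives two empty vectors and returns 'character(0)', so only the "" remains.

```r
library(data.table)

# hypothetical data: id 2 has a single row
dt <- data.table(id = c(1, 1, 1, 2), brand = c("A", "B", "Closed", "C"))

# within each id: all-but-last values pasted to all-but-first values, then ""
dt[, brand_chg := c(paste(brand[-.N], brand[-1], sep = "->"), ""), by = id]
dt
```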