How can I make this loop more efficient?

I have a data frame that looks like this:

user1,product1,0 
user1,product2,2 
user1,product3,1 
user1,product4,2 
user2,product3,0 
user2,product2,2 
user3,product4,0 
user3,product5,3

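The data frame has millions of rows. I need to go through each row: if the value in the last column is 0, keep that product ID; otherwise, append the product ID to the most recent preceding product ID whose value is 0. The result is written to a new data frame.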

For example, the resulting matrix should be:

user1,product1 
user1,product1product2 
user1,product1product3 
user1,product1product4 
user2,product3 
user2,product3product2 
user3,product4 
user3,product4product5

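I wrote a for loop that goes through each row, and it works, but it is very, very slow. How can I speed it up? I tried to vectorize it, but I'm not sure how, because I need to check the value of the previous row.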

Source

2011-11-24 yzhang

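Note that you don't actually have a matrix. A matrix can only contain one atomic type (numeric, integer, character, etc.). What you really have is a data.frame.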
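To illustrate the distinction, here is a minimal sketch using hypothetical toy data (the names `df`, `user`, and `score` are made up for this example). It shows that a data.frame keeps a separate type per column, while converting it to a matrix forces every cell to one common type.

```r
# Hypothetical toy data, not the asker's real data
df <- data.frame(user = c("user1", "user2"),
                 score = c(0L, 2L),
                 stringsAsFactors = FALSE)

# Each column of a data.frame keeps its own type
col_types <- sapply(df, class)

# as.matrix() coerces every cell to a single common type (character here)
m <- as.matrix(df)
matrix_type <- typeof(m)

print(col_types)
print(matrix_type)
```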

What you want to do can be done easily with na.locf from the zoo package and the ifelse function.

x <- structure(list(V1 = c("user1", "user1", "user1", "user1", "user2", 
"user2", "user3", "user3"), V2 = c("product1", "product2", "product3", 
"product4", "product3", "product2", "product4", "product5"), 
    V3 = c("0", "2", "1", "2", "0", "2", "0", "3")), .Names = c("V1", 
"V2", "V3"), class = "data.frame", row.names = c(NA, 8L)) 

library(zoo) 
# First, create a column that contains the value from the 2nd column 
# when the 3rd column is zero. 
x$V4 <- ifelse(x$V3==0,x$V2,NA) 
# Next, replace all the NA with the previous non-NA value 
x$V4 <- na.locf(x$V4) 
# Finally, create a column that contains the concatenated strings 
x$V5 <- ifelse(x$V2==x$V4,x$V2,paste(x$V4,x$V2,sep="")) 
# Desired output 
x[,c(1,5)]
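If installing zoo is not an option, the same carry-forward idea can probably be expressed in base R with cumsum. The sketch below is an alternative to the approach above, not part of the original answer. It assumes the first row has a value of 0 in the third column, and the helper names `is_base`, `base_idx`, and `base_product` are invented for illustration.

```r
# Same sample data as above, rebuilt here so the block runs on its own
x <- data.frame(
  V1 = c("user1", "user1", "user1", "user1", "user2", "user2", "user3", "user3"),
  V2 = c("product1", "product2", "product3", "product4",
         "product3", "product2", "product4", "product5"),
  V3 = c("0", "2", "1", "2", "0", "2", "0", "3"),
  stringsAsFactors = FALSE
)

# TRUE on rows whose third column is 0 (these rows start a new group)
is_base <- x$V3 == 0

# cumsum() counts how many base rows have been seen so far, so it maps
# every row to the position of the most recent base row.
# Assumption: the first row is a base row, otherwise the index would be 0.
base_idx <- which(is_base)[cumsum(is_base)]
base_product <- x$V2[base_idx]

# Base rows keep their own product; other rows get base product + own product
x$V5 <- ifelse(is_base, x$V2, paste0(base_product, x$V2))
result <- x[, c("V1", "V5")]
print(result)
```

Testing on the third column rather than comparing the two product columns also covers the case where a non-zero row repeats the base product's name.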

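Since you are using a data.frame, you need to make sure the "product" column is character, not factor (the zoo-based code above will give strange results if the "product" column is a factor).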
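The factor problem can be seen in a small sketch. The vectors `products` and `flag` below are hypothetical stand-ins for the product and value columns. With a factor, ifelse returns the underlying integer codes rather than the labels, so converting with as.character first is one way to avoid it. In R versions before 4.0, read.csv and data.frame created factors from strings by default unless stringsAsFactors = FALSE was passed.

```r
# Hypothetical example: a product column stored as a factor
products <- factor(c("product1", "product2", "product3"))
flag <- c(0, 2, 1)

# With a factor, ifelse() yields the internal integer codes, not the labels
from_factor <- ifelse(flag == 0, products, NA)

# Converting to character first keeps the product names
products_chr <- as.character(products)
from_character <- ifelse(flag == 0, products_chr, NA)

print(from_factor)
print(from_character)
```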

Source

2011-11-24 13:30:47
