2016-02-29

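Function works perfectly outside ddply but throws an error inside ddply

I am trying to operate row-wise on a data frame, by group. When I run the function on its own for a single group, it works fine. However, when I put the function inside ddply to run it over all groups, it throws an error: argument is of length zero.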

The function, run standalone on the data frame 'test':

for (i in 1:(nrow(test) - 5)) {

    if (i <= 5) {
        test[i, "MPPALERT"] <- 0
    }

    FIRSTMPP <- test[i, "TAGMPPSEARCHCOUNT"]
    LASTMPP <- test[i+5, "TAGMPPSEARCHCOUNT"]

    if ((LASTMPP - FIRSTMPP) >= 10) {
        test[i+5, "MPPALERT"] <- 1
    } else {
        test[i+5, "MPPALERT"] <- 0
    }

}

Running the same logic inside ddply raises this error:

Error in if (LASTMPP - FIRSTMPP >= 10) { : argument is of length zero 

Here is the ddply code:

mpp_fn <- function(x) {

    for (i in 1:(nrow(x) - 5)) {

        if (i <= 5) {
            x[i, "MPPALERT"] <- 0
        }

        FIRSTMPP <- x[i, "TAGMPPSEARCHCOUNT"]
        LASTMPP <- x[i+5, "TAGMPPSEARCHCOUNT"]

        if (LASTMPP - FIRSTMPP >= 10) {
            x[i+5, "MPPALERT"] <- 1
        } else {
            x[i+5, "MPPALERT"] <- 0
        }

    }

}

result <- ddply(data, c("SHELTERID", "INVERTERID"), mpp_fn(x)) 

In the code above, FIRSTMPP and LASTMPP resolve to NULL, hence the error. But why does this happen, when the same code runs perfectly outside ddply?

UPDATE: here is the output of dput(data):

structure(list(SHELTERID = c("SH02", "SH02", "SH02", "SH02", 
"SH02", "SH02", "SH02", "SH02", "SH02", "SH02", "SH02", "SH02", 
"SH02", "SH02", "SH02", "SH02", "SH02", "SH02", "SH02", "SH02", 
"SH02", "SH02", "SH02", "SH02", "SH02", "SH02", "SH02", "SH02", 
"SH02", "SH02", "SH02", "SH02", "SH02", "SH02", "SH02", "SH02", 
"SH03", "SH03", "SH03", "SH03", "SH03", "SH03", "SH03", "SH03", 
"SH03", "SH03", "SH03", "SH03", "SH03", "SH03", "SH03", "SH03", 
"SH03", "SH03", "SH03", "SH03", "SH03", "SH03", "SH03", "SH03", 
"SH03", "SH03", "SH03", "SH03", "SH03", "SH03", "SH03", "SH03", 
"SH03", "SH03", "SH03", "SH03"), INVERTERID = c("I1", "I1", "I1", 
"I1", "I1", "I1", "I1", "I1", "I1", "I1", "I1", "I1", "I1", "I1", 
"I1", "I1", "I1", "I1", "I2", "I2", "I2", "I2", "I2", "I2", "I2", 
"I2", "I2", "I2", "I2", "I2", "I2", "I2", "I2", "I2", "I2", "I2", 
"I1", "I1", "I1", "I1", "I1", "I1", "I1", "I1", "I1", "I1", "I1", 
"I1", "I1", "I1", "I1", "I1", "I1", "I1", "I2", "I2", "I2", "I2", 
"I2", "I2", "I2", "I2", "I2", "I2", "I2", "I2", "I2", "I2", "I2", 
"I2", "I2", "I2"), TAGMPPSEARCHCOUNT = c(0, 0, 0, 0, 0, 0, 0, 
2, 0, 0, 3, 0, 0, 3, 0, 0, 3, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 3, 0, 3, 0, 0, 4, 0, 0, 4, 0, 4, 0, 0, 5, 0, 0, 
5, 0)), .Names = c("SHELTERID", "INVERTERID", "TAGMPPSEARCHCOUNT" 
), row.names = c(350L, 351L, 352L, 353L, 354L, 355L, 356L, 357L, 
358L, 359L, 360L, 361L, 362L, 363L, 364L, 365L, 366L, 367L, 494L, 
495L, 496L, 497L, 498L, 499L, 500L, 501L, 502L, 503L, 504L, 505L, 
506L, 507L, 508L, 509L, 510L, 511L, 638L, 639L, 640L, 641L, 642L, 
643L, 644L, 645L, 646L, 647L, 648L, 649L, 650L, 651L, 652L, 653L, 
654L, 655L, 782L, 783L, 784L, 785L, 786L, 787L, 788L, 789L, 790L, 
791L, 792L, 793L, 794L, 795L, 796L, 797L, 798L, 799L), class = "data.frame") 
+1

You can provide a reproducible example by posting the output of 'dput(data)'. Please don't use the full 'data', only a minimal subset of it. – Thierry

+0

Sure @Thierry. Here is a small portion of the data. – Ankur

+0

Please add the output of 'dput(data)' instead of 'data'. The output of 'dput(data)' makes it very easy to copy your data into an R session. – Thierry

Answer

0

Here is a dplyr solution. It doesn't need an explicit loop.

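library(dplyr)
data %>%
    group_by(SHELTERID, INVERTERID) %>%
    mutate(
        # value five rows earlier within the group; NA for the first five rows
        First = lag(TAGMPPSEARCHCOUNT, 5),
        # first five rows get 0; otherwise 1 when the count rose by at least 10,
        # matching the >= 10 test in the question's loop
        MPPALERT = ifelse(
            is.na(First),
            0,
            ifelse(TAGMPPSEARCHCOUNT - First >= 10, 1, 0)
        )
    )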
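
To see what the lag step contributes, here is a minimal base R sketch of the same idea on a single group. The helper `shift_down` and the `counts` values are made up for illustration; they are not part of dplyr or of the poster's data. The threshold follows the question's loop (`>= 10`).

```r
# Hypothetical helper: shift a vector down by k positions, padding with NA.
# For a plain vector this mirrors what dplyr::lag(v, k) returns.
shift_down <- function(v, k) {
  if (k >= length(v)) return(rep(NA, length(v)))
  c(rep(NA, k), v[seq_len(length(v) - k)])
}

counts <- c(0, 1, 2, 3, 4, 12, 11)   # made-up values for one group
first  <- shift_down(counts, 5)      # the value five rows earlier, NA where none exists

# Rows with no value five rows earlier get 0; the rest are flagged when
# the count rose by at least 10 over those five rows.
alert <- ifelse(is.na(first), 0, ifelse(counts - first >= 10, 1, 0))
```

Inside `group_by(...) %>% mutate(...)` the shift is applied separately within each group, which is why the first five rows of every group end up with 0.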
+0

Thank you very much @Thierry. This works! – Ankur
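
The thread never says why the ddply version failed, but three things in the posted code stand out, and any of them could contribute. First, `mpp_fn(x)` passes the result of a call, evaluated against whatever `x` happens to exist in the calling environment, whereas ddply expects the function itself. Second, the last expression in `mpp_fn` is the `for` loop, so the function returns NULL rather than the modified data frame. Third, `1:(nrow(x) - 5)` counts downward when a group has five rows or fewer. Below is a minimal sketch of one way to address all three, not a confirmed diagnosis. The `gap` and `threshold` arguments and the `toy` data frame are made up for illustration.

```r
# Hypothetical rewrite of mpp_fn: same rule as the loop in the question,
# but it returns x and copes with groups of five rows or fewer.
mpp_fn <- function(x, gap = 5, threshold = 10) {
  n <- nrow(x)
  x$MPPALERT <- 0                        # rows with no row `gap` steps earlier stay 0
  if (n > gap) {
    later   <- x$TAGMPPSEARCHCOUNT[(gap + 1):n]
    earlier <- x$TAGMPPSEARCHCOUNT[1:(n - gap)]
    x$MPPALERT[(gap + 1):n] <- as.numeric(later - earlier >= threshold)
  }
  x                                      # return the modified data frame
}

# Made-up toy data (not the poster's): group A has 7 rows, group B only 3.
toy <- data.frame(
  SHELTERID         = c(rep("A", 7), rep("B", 3)),
  INVERTERID        = "I1",
  TAGMPPSEARCHCOUNT = c(0, 1, 2, 3, 4, 12, 11, 5, 6, 7),
  stringsAsFactors  = FALSE
)

# Base R split-apply-combine, so the sketch runs without extra packages.
pieces <- split(toy, list(toy$SHELTERID, toy$INVERTERID), drop = TRUE)
result <- do.call(rbind, lapply(pieces, mpp_fn))
rownames(result) <- NULL

# With plyr installed, the function is passed by name, without (x).
if (requireNamespace("plyr", quietly = TRUE)) {
  result_plyr <- plyr::ddply(toy, c("SHELTERID", "INVERTERID"), mpp_fn)
}
```

Vectorising the comparison removes the loop, so the `1:0` edge case never comes up, and ending the function with `x` gives ddply a data frame to bind back together.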
