2016-11-16

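Count occurrences of a letter before a specific letter

I want to count the occurrences of "I" before the first "C" within each Id. I tried the code below, but it counts all the "I"s in the column.

Code I tried: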

library(plyr) 
# counts every "I" in each Id, not only those before the first "C"
Impres = ddply(df, .(Id), summarize, No_of_I_before_First_C = length(which(Character == "I"))) 

Sample data:

Id Character 
1  I 
1  I 
1  C 
1  I 
2  I 
2  C 

The output should look like this:

Id Count_Of_I_before_First_C 
1  2 
2  1 

Answers


Here is one idea:

first1 <- function(x, letter){ 
      # position of the first `letter`, minus 1 = number of elements before it
      which(x == letter)[1]-1 
      } 

aggregate(Character ~ Id, df, first1, 'C') 
# Id Character 
#1 1   2 
#2 2   1 
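A quick base-R check of what `first1` returns on plain character vectors. Note that it counts every element before the first "C" (not only the "I"s), which happens to be right for the sample data, and that it gives `NA` for a group with no "C" at all:

```r
# first1 as defined in the answer: position of the first `letter`, minus 1
first1 <- function(x, letter) {
  which(x == letter)[1] - 1
}

first1(c("I", "I", "C", "I"), "C")   # 2: two elements precede the first "C"
first1(c("X", "I", "C"), "C")        # also 2, although only one of them is an "I"
first1(c("I", "I"), "C")             # NA: the group contains no "C"
```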

To generalize it a bit more:

first1 <- function(x, letter, letter_count){ 
    ind <- which(x == letter)[1]            # position of the first `letter`
    sum(grepl(letter_count, x[1:ind]))      # how many of x[1..ind] match the pattern `letter_count`
    } 

aggregate(Character ~ Id, df, first1, 'C', 'I') 
# Id Character 
#1 1   2 
#2 2   1 
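A hypothetical variant, `count_before()` (not part of the answer), sketched under two assumptions: it compares values exactly with `==` rather than using the regular-expression match that `grepl()` performs, and it treats a group with no `letter` as "count the whole group". The version above stops with an error in that case, because `1:NA` is not a valid sequence:

```r
# Hypothetical helper: number of `letter_count` values before the first `letter`
count_before <- function(x, letter, letter_count) {
  ind <- which(x == letter)[1]            # position of the first `letter`
  if (is.na(ind)) ind <- length(x) + 1    # assumption: no `letter` means scan the whole group
  sum(x[seq_len(ind - 1)] == letter_count)  # exact matches strictly before it
}

df <- data.frame(Id = c(1, 1, 1, 1, 2, 2),
                 Character = c("I", "I", "C", "I", "I", "C"))
res <- aggregate(Character ~ Id, df, count_before, "C", "I")
res
```

`seq_len(ind - 1)` is empty when the group starts with "C", so such a group gets 0 instead of an error.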
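This will be quite slow on large datasets – Bulat

@Bulat I was just following the question's `aggregate` tag (i.e. no packages). I know that both `dplyr` and `data.table` have more efficient ways to do this – Sotos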

require(dplyr) 
require(magrittr) 
df <- data.frame(Id = c(1,1,1,1,2,2), Character = c('I', 'I', 'C', 'I', 'I', 'C')) 

This function gives you the number of "I"s before the first "C":

foo <- function (character) { 

    is_before_C <- (character == 'C') %>% cummax() %>% not() 
    # is_before_C <- !cummax(character == 'C') # the same 
    is_I <- character == 'I' 
    is_I_before_C <- is_I & is_before_C 

    return(sum(is_I_before_C)) 
} 

Then you can use this function to summarise the data:

df %>% 
    group_by(Id) %>% 
    summarise(Count_Of_I_before_First_C = foo(Character)) 

Result:

# A tibble: 2 × 2 
    Id Count_Of_I_before_First_C 
    <dbl>      <int> 
1  1       2 
2  2       1 
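The `cummax()` mask in `foo()` does not actually need dplyr or magrittr. As a minimal base-R sketch, assuming the sample data with a character `Character` column, `tapply()` can apply it per `Id`; `before_first_c()` and `count_i` are hypothetical names:

```r
df <- data.frame(Id = c(1, 1, 1, 1, 2, 2),
                 Character = c("I", "I", "C", "I", "I", "C"),
                 stringsAsFactors = FALSE)

# TRUE for every position before the first "C" in a group
before_first_c <- function(x) !cummax(x == "C")

# per Id: count positions that are "I" and lie before the first "C"
count_i <- tapply(df$Character, df$Id,
                  function(x) sum(x == "I" & before_first_c(x)))
count_i   # named by Id: 2 for Id 1, 1 for Id 2
```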

Here is a data.table solution:

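library(data.table) 
dt <- data.table(Id = c(1,1,1,1,2,2), Character = c('I', 'I', 'C', 'I', 'I', 'C')) 
# running count of "C"s within each Id; rows with 0 come before the first "C"
dt[, cnt.c := cumsum(Character == "C"), by = Id] 
# count only the "I"s among those rows (length() would count every row)
res <- dt[cnt.c == 0, .(Count_Of_I_before_First_C = sum(Character == "I")), by = Id] 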
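To see what the `cnt.c == 0` filter keeps, here is the same running count applied to the `Character` values of Id 1 as a plain vector (base R only). It also shows one consequence worth knowing: a group whose first row is "C" has no row with a count of 0, so that Id is missing from `res` rather than reported as 0. The vector `y` is a hypothetical group, not part of the sample data:

```r
x <- c("I", "I", "C", "I")     # Character values of Id 1
cnt_c <- cumsum(x == "C")      # running count of "C"s: 0 0 1 1
x[cnt_c == 0]                  # rows kept by the filter: "I" "I"
sum(x[cnt_c == 0] == "I")      # "I"s among them: 2

y <- c("C", "I")               # hypothetical group that starts with "C"
any(cumsum(y == "C") == 0)     # FALSE: no row survives the filter
```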

Maybe:

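library(dplyr) 

rlei <- function(x) { 
    r <- rle(as.character(x))      # rle() needs an atomic vector, not a factor
    I <- which(r$values == "I")    # indices of the runs of "I"
    C <- which(r$values == "C")    # indices of the runs of "C"
    sum(r$lengths[I[I < C[1]]])    # total length of the "I" runs before the first "C" run
} 

# use the grouped column directly; `.$Character` would pass the whole, ungrouped column
group_by(df, Id) %>% 
    summarise(Count_Of_I_before_First_C = rlei(Character))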
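The intermediate run-length encoding for Id 1 makes the `rle()` approach easier to follow. `rle()` only accepts plain atomic vectors, so a factor column (what `data.frame()` created from strings by default before R 4.0.0) has to go through `as.character()` first:

```r
r <- rle(c("I", "I", "C", "I"))   # Character values of Id 1
r$values                          # "I" "C" "I"
r$lengths                         # 2 1 1
# with only "I" and "C" present the runs alternate, so the first run is the
# block of "I"s before the first "C" whenever the group starts with "I"
r$lengths[1]                      # 2

# rle() stops with an error on a factor; convert it first
f <- factor(c("I", "I", "C", "I"))
rle(as.character(f))$lengths      # 2 1 1
```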