我们可以使用data.table
。将'data.frame'转换为'data.table'(setDT(df1)
),按'ID'分组,获得行索引(.I
),其中唯一的'Months'等于唯一'Months'的数量整个数据集和子集在此基础上
library(data.table)
setDT(df1)[df1[, .I[uniqueN(Month) == uniqueN(df1$Month)], ID]$V1]
# ID Month
# 1: 4 Jan
# 2: 4 Feb
# 3: 4 Mar
# 4: 4 Apr
# 5: 4 May
# 6: 4 Jun
# 7: 6 Jan
# 8: 6 Jan
# 9: 6 Feb
#10: 6 Mar
#11: 6 Apr
#12: 6 May
#13: 6 Jun
数据要提取“ID的
setDT(df1)[, ID[uniqueN(Month) == uniqueN(df1$Month)], ID]$V1
#[1] 4 6
或用base R
1)使用table
与rowSums
v1 <- rowSums(table(df1) > 0)
names(v1)[v1==max(v1)]
#[1] "4" "6"
此信息可用于子集划分的数据
subset(df1, ID %in% names(v1)[v1 == max(v1)])
2)使用tapply
lst <- with(df1, tapply(Month, ID, FUN = unique))
names(which(lengths(lst) == length(unique(df1$Month))))
#[1] "4" "6"
或美国荷兰国际集团dplyr
library(dplyr)
df1 %>%
group_by(ID) %>%
filter(n_distinct(Month)== n_distinct(df1$Month)) %>%
.$ID %>%
unique
#[1] 4 6
,或者如果我们需要得到各行
df1 %>%
group_by(ID) %>%
filter(n_distinct(Month)== n_distinct(df1$Month))
# A tibble: 13 x 2
# Groups: ID [2]
# ID Month
# <int> <chr>
# 1 4 Jan
# 2 6 Jan
# 3 6 Jan
# 4 4 Feb
# 5 6 Feb
# 6 4 Mar
# 7 6 Mar
# 8 4 Apr
# 9 6 Apr
#10 4 May
#11 6 May
#12 4 Jun
#13 6 Jun