Subsetting a data frame

I have a data frame that looks like the following:

df <- data.frame(Site=rep(paste0('site', 1:5), 50),
                 Month=sample(1:12, 50, replace=TRUE),
                 Count=sample(1:1000, 50, replace=TRUE))

I want to remove any site whose Count is always < 5% of the maximum monthly count across all sites.

The maximum monthly count across all sites:

library(plyr) 
ddply(df, .(Month), summarise, Max.Count=max(Count))

If a count of 1 is assigned to site5, its counts are always < 5% of the maximum monthly count across all sites. So I want site5 removed.

df$Count[df$Site=='site5'] <- 1

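However, after new values are assigned to site2, some of its counts are < 5% of the maximum monthly count while others are > 5%. So I do not want site2 removed.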

df$Count[df$Site=='site2'] <- ceiling(seq(1, 1000, length.out=20))

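How can I subset the data frame to remove any site whose counts are always < 5% of the maximum monthly count? If the question is unclear, I will revise it.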

2013-03-14 luciano

Here is a plyr solution:

## df2$test is TRUE if Count >= 5% of the maximum Count for that row's month
df2 <- ddply(df, .(Month), transform, test=Count>=(max(Count)*0.05))
## For each site, test$keep is TRUE if at least one of its counts passes that test
test <- ddply(df2, .(Site), summarise, keep=sum(test)>0)
## Subsetting: keep only the rows of the sites flagged with keep
sites <- test$Site[test$keep]
df[df$Site %in% sites,]
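
The same logic can also be sketched in base R, without plyr. This is only a minimal sketch and not part of the original answer: the names `month_max`, `above`, `site_ok` and `keep_sites` are introduced here for illustration, and the seed is an arbitrary choice made so the example is reproducible.

```r
# Rebuild the example data from the question (the seed is an arbitrary assumption)
set.seed(1)
df <- data.frame(Site  = rep(paste0("site", 1:5), 50),
                 Month = sample(1:12, 50, replace = TRUE),
                 Count = sample(1:1000, 50, replace = TRUE))
df$Count[df$Site == "site5"] <- 1

# Monthly maximum across all sites, repeated on every row of that month
month_max <- ave(df$Count, df$Month, FUN = max)

# TRUE where a row reaches at least 5% of its month's maximum
above <- df$Count >= 0.05 * month_max

# A site is kept if at least one of its rows passes the test
site_ok <- tapply(above, df$Site, any)
keep_sites <- names(site_ok)[site_ok]

df_out <- df[df$Site %in% keep_sites, ]
```

`ave()` returns a vector as long as its input, so the monthly maximum lines up with the rows of `df` and no merge step is needed.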

2013-03-14 11:36:15 juba

A data.table solution:

require(data.table)
set.seed(45)
df <- data.frame(Site=rep(paste0('site', 1:5), 50),
                 Month=sample(1:12, 50, replace=TRUE),
                 Count=sample(1:1000, 50, replace=TRUE))
df$Count[df$Site=='site5'] <- 1

dt <- data.table(df, key=c("Month", "Site")) 
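# add the maximum count per month (across all sites) as a new column
dt[, max.count := max(Count), by = list(Month)]
# for each month and site, check whether every count is below 5% of that month's maximum
d1 <- dt[, list(check = all(Count < .05 * max.count)), by = list(Month, Site)]
# sites for which the check is TRUE in every month in which they appear
sites <- as.character(d1[, all(check == TRUE), by=Site][V1 == TRUE, Site])

# drop those sites; %in% also works when more than one site qualifies
dt.out <- dt[!Site %in% sites][, max.count := NULL]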
#       Site Month Count
#   1: site1     1   939
#   2: site1     1   939
#   3: site1     1   939
#   4: site1     1   939
#   5: site1     1   939
#  ---
# 196: site2    12   969
# 197: site2    12   684
# 198: site2    12   613
# 199: site2    12   969
# 200: site2    12   684

2013-03-14 11:22:29 Arun

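So: remove all rows for a site whose count is < 5% of the maximum count in January, < 5% of the maximum count in February, < 5% of the maximum count in March ... and so on for every month of the year. Do not remove any rows if, for example, the count is < 5% of the monthly maximum in every month except June. – luciano 2013-03-14 11:35:15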

@RossAhmed, this should do it. – Arun 2013-03-14 12:02:41
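
The rule in luciano's comment above can be checked on a small hand-made data set. The sketch below is illustrative only: the `toy` data and every object name in it are invented for this example. Site `b` is below 5% of the monthly maximum in months 1 and 2 but not in month 6, so it should be kept; site `c` is below 5% in every month, so it should be dropped.

```r
toy <- data.frame(
  Site  = rep(c("a", "b", "c"), each = 3),
  Month = rep(c(1, 2, 6), times = 3),
  Count = c(1000, 800, 100,   # site a: sets the maximum in months 1 and 2
              10,  10, 200,   # site b: below 5% except in month 6
               1,   1,   1)   # site c: below 5% in every month
)

# Monthly maxima across sites: 1000 (month 1), 800 (month 2), 200 (month 6)
month_max <- ave(toy$Count, toy$Month, FUN = max)

# A site is dropped only if all of its rows are below 5% of the monthly maximum
always_low <- tapply(toy$Count < 0.05 * month_max, toy$Site, all)
drop_sites <- names(always_low)[always_low]

toy_out <- toy[!toy$Site %in% drop_sites, ]
as.character(unique(toy_out$Site))
# [1] "a" "b"
```

Using `all()` per site, rather than per row, is what makes a single month at or above the threshold enough to keep the whole site.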
