Group by and keep only the lowest value in an R data frame

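I basically want to remove duplicates from a data frame, keeping the lowest value of one column within each group defined by two other columns (Name and cluster). For example, here is my data frame: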

     Name cluster score
19  Steve      a1    30
51  Steve      a2    30
83  Steve      a2   -28
93  Steve      a2   -38
115   Bob      a4    30
147   Bob      a5    -8
179   Bob      a5    30

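In pandas or SQL this would be done with a GROUP BY, but I'm struggling to figure out how to do it in R and don't really know where to start. I tried grouping on both columns: first by Name, then by cluster. So, since there are three 'Steve, a2' rows, I only want to keep the one with the lowest score.

My desired output would be the following: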

     Name cluster score
19  Steve      a1    30
93  Steve      a2   -38
115   Bob      a4    30
147   Bob      a5    -8

Any help would be greatly appreciated.

Source

2014-08-31 WycG

This works:

library(dplyr) 


Name=c("Steve", "Steve", "Steve", "Steve", "Bob", "Bob", "Bob") 
cluster=c("a1", "a2", "a2", "a2", "a4", "a5", "a5") 
score=c(30,30,-28,-38,30,-8,30) 
yourdf<-data.frame(Name,cluster,score) 

yourdf %>% 
    group_by(Name,cluster) %>% 
    filter(score == min(score)) 

   Name cluster score
1 Steve      a1    30
2 Steve      a2   -38
3   Bob      a4    30
4   Bob      a5    -8
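
Note that `filter(score == min(score))` keeps every row that ties for the group minimum, so a group can still return more than one row if its lowest score is repeated. For comparison, here is a rough base R sketch of the same filter using `ave()`. The `key` and `group_min` names are only illustrative and are not part of the answer above.

```r
# Sample data from the question (original row names omitted).
yourdf <- data.frame(
  Name    = c("Steve", "Steve", "Steve", "Steve", "Bob", "Bob", "Bob"),
  cluster = c("a1", "a2", "a2", "a2", "a4", "a5", "a5"),
  score   = c(30, 30, -28, -38, 30, -8, 30)
)

# Build one grouping key per row from the two grouping columns.
key <- paste(yourdf$Name, yourdf$cluster)

# ave() returns, for every row, the minimum score within that row's group.
group_min <- ave(yourdf$score, key, FUN = min)

# Keep the rows whose score equals their group's minimum (ties are all kept).
result <- yourdf[yourdf$score == group_min, ]
print(result)
```

If exactly one row per group is required even when there are ties, the sort-then-`duplicated()` approach in the base R answer gives that.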

Source

2014-08-31 03:09:05 jalapic

Here is a base R approach:

# Read in sample data 
df<-read.table(text=" 
     Name cluster score 
19  Steve a1  30 
51  Steve a2  30 
83  Steve a2  -28 
93  Steve a2  -38 
115 Bob  a4  30 
147 Bob  a5  -8 
179 Bob  a5  30", header=TRUE) 

# order it 
df_sorted <- df[with(df, order(Name, cluster, score)),] 

# get rid of duplicated names and clusters, keeping the first, 
# which will be the minimum score due to the sorting. 

df_sorted[!duplicated(df_sorted[,c('Name','cluster')]), ] 
#     Name cluster score
#115   Bob      a4    30
#147   Bob      a5    -8
#19  Steve      a1    30
#93  Steve      a2   -38

Source

2014-08-31 03:37:51 Jota

And a simple data.table solution:

library(data.table) 
setDT(df)[, list(score = score[which.min(score)]), by = list(Name, cluster)] 
#     Name cluster score
# 1: Steve      a1    30
# 2: Steve      a2   -38
# 3:   Bob      a4    30
# 4:   Bob      a5    -8

Source

2014-08-31 11:17:18

This is a perfect job for aggregate.

aggregate(score ~ Name + cluster, mydf, min) 
#    Name cluster score
# 1 Steve      a1    30
# 2 Steve      a2   -38
# 3   Bob      a4    30
# 4   Bob      a5    -8

where mydf is your original data.
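
One thing to keep in mind: aggregate returns only the columns named in the formula, so the original row names and any other columns are dropped. If the full rows are needed, one option is to merge the result back onto the original data. A minimal sketch, assuming mydf is the question's data; the `id` column and the `mins` and `full_rows` names are hypothetical and added purely for illustration.

```r
# Read the question's sample data; the first column becomes the row names.
mydf <- read.table(text = "
     Name cluster score
19  Steve a1  30
51  Steve a2  30
83  Steve a2  -28
93  Steve a2  -38
115 Bob  a4  30
147 Bob  a5  -8
179 Bob  a5  30", header = TRUE)

# Hypothetical extra column, so there is something for aggregate() to drop.
mydf$id <- as.integer(rownames(mydf))

# Minimum score per (Name, cluster) group; 'id' is not part of this result.
mins <- aggregate(score ~ Name + cluster, data = mydf, FUN = min)

# merge() joins on the shared columns (Name, cluster, score), which brings
# back the remaining columns of the rows that hold each group's minimum.
full_rows <- merge(mydf, mins)
print(full_rows)
```

As with the filter-based approach, rows that tie for a group's minimum would all be returned by the merge.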

Source

2014-08-31 11:53:31
