2014-11-21

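R: remove duplicated rows

I have a data frame and I want to remove all duplicated rows, i.e. every row whose value in column A occurs more than once. For example, my data frame looks like this: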

> df <- data.frame(A = c("Happy", "Happy", "Sad", "Confused", "Mad", "Mad"), B = c(1, 2, 3, 4, 5, 6))
> df
         A B
1    Happy 1
2    Happy 2
3      Sad 3
4 Confused 4
5      Mad 5
6      Mad 6

I only want to keep the rows whose value in A is unique, to get:

         A B
1      Sad 3
2 Confused 4
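Note that this is stricter than ordinary de-duplication, which keeps one copy of each repeated value. For contrast, here is a minimal sketch of that more common "keep the first copy" behaviour, which is not what is asked for here. It assumes stringsAsFactors = FALSE so that A is a character column in both older and newer R versions.

```r
# Assumption: A is stored as character (stringsAsFactors = FALSE)
df <- data.frame(A = c("Happy", "Happy", "Sad", "Confused", "Mad", "Mad"),
                 B = c(1, 2, 3, 4, 5, 6),
                 stringsAsFactors = FALSE)

# duplicated() flags only the 2nd, 3rd, ... occurrence of each value,
# so negating it keeps the first copy of every value in A
first_copies <- df[!duplicated(df$A), ]

# One copy of each value survives: Happy (row 1), Sad, Confused, Mad (row 5)
first_copies
```

Here Happy and Mad still appear once each, whereas the desired result drops them entirely.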

Answers


You could try duplicated:

df[!(duplicated(df$A) | duplicated(df$A, fromLast = TRUE)), ]
#         A B
#3      Sad 3
#4 Confused 4
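To see why two duplicated() calls are combined, it helps to look at the intermediate logical vectors. A small sketch on a plain character vector (the name a is just for illustration):

```r
a <- c("Happy", "Happy", "Sad", "Confused", "Mad", "Mad")

# Scanning front to back: TRUE for every occurrence except the first
from_front <- duplicated(a)
# Scanning back to front: TRUE for every occurrence except the last
from_back <- duplicated(a, fromLast = TRUE)

# A value that occurs more than once is caught by at least one of the two
# scans, so OR-ing them marks every copy of a repeated value
repeated <- from_front | from_back

from_front        # FALSE  TRUE FALSE FALSE FALSE  TRUE
from_back         #  TRUE FALSE FALSE FALSE  TRUE FALSE
repeated          #  TRUE  TRUE FALSE FALSE  TRUE  TRUE
which(!repeated)  # 3 4
```

Either call on its own would leave one copy of Happy and Mad behind; only the combination removes them all.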

df[df$A %in% with(as.data.frame(table(df$A)), Var1[Freq == 1]), ]
#         A B
#3      Sad 3
#4 Confused 4
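The table() approach counts each value first and then keeps the rows whose value has a count of one. The sketch below is the same idea written with names() instead of the Var1/Freq columns of as.data.frame(); the vector a is illustrative.

```r
a <- c("Happy", "Happy", "Sad", "Confused", "Mad", "Mad")

# table() counts how often each distinct value occurs
# (Confused 1, Happy 2, Mad 2, Sad 1; names come out sorted)
counts <- table(a)

# Values that occur exactly once
singletons <- names(counts)[counts == 1]

# Keep the positions whose value is one of those singletons
keep <- a %in% singletons
which(keep)  # 3 4
```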

df[colSums(sapply(df$A, `==`, df$A)) == 1, ]
#         A B
#3      Sad 3
#4 Confused 4

df[colSums(Vectorize(function(x) x==df$A)(df$A))==1,] 
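The sapply() and Vectorize() versions both build an n x n matrix of pairwise comparisons. A sketch of the same matrix built in one step with outer(), using an illustrative vector a:

```r
a <- c("Happy", "Happy", "Sad", "Confused", "Mad", "Mad")

# eq[i, j] is TRUE when a[i] == a[j]; this is the same matrix that
# sapply(a, `==`, a) builds column by column
eq <- outer(a, a, `==`)

# Every value equals itself, so a column sum of 1 means "no other copies"
keep <- colSums(eq) == 1
which(keep)  # 3 4
```

Because the matrix has n^2 entries, this family of approaches uses quadratic memory, so the duplicated() version scales better on large data frames.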

Or using data.table (similar to @beginner's use of ave):

library(data.table)
# setDT() converts df to a data.table by reference (df itself is modified)
setDT(df)[, .SD[.N == 1], by = A]
#          A B
#1:      Sad 3
#2: Confused 4

setDT(df)[df[, .I[.N == 1], by = A]$V1]
#          A B
#1:      Sad 3
#2: Confused 4
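The second data.table line collects the row numbers (.I) of every group of size one (.N == 1) and then subsets by them. As a rough base R analogue of that idea, and not data.table's own implementation, the row numbers can be grouped with split(). This sketch assumes A is a character column.

```r
# Assumption: A is stored as character (stringsAsFactors = FALSE)
df <- data.frame(A = c("Happy", "Happy", "Sad", "Confused", "Mad", "Mad"),
                 B = c(1, 2, 3, 4, 5, 6),
                 stringsAsFactors = FALSE)

# Split the row numbers into one group per value of A
# (roughly what by = A does)
groups <- split(seq_len(nrow(df)), df$A)

# Keep only groups of size one (the analogue of .N == 1) and collect their
# row numbers (the analogue of .I)
single_rows <- unlist(Filter(function(idx) length(idx) == 1, groups),
                      use.names = FALSE)

# split() orders groups by sorted value, so sort the row numbers to restore
# the original row order before subsetting
res <- df[sort(single_rows), ]
res
```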

akrun seems to be collecting different approaches, so here's another one in base R:

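# seq_along() supplies one dummy value per row; only the group sizes matter,
# and unlike as.numeric() it does not produce NAs when A is a character column
df[ave(seq_along(df$A), df$A, FUN = length) == 1, ]
#         A B
#3      Sad 3
#4 Confused 4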
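What ave() returns here is the size of each row's group, repeated for every row of that group. A minimal sketch on an illustrative vector a, using seq_along() as the dummy value vector since only the counts matter:

```r
a <- c("Happy", "Happy", "Sad", "Confused", "Mad", "Mad")

# ave() splits its first argument by the grouping vector, applies FUN to each
# piece, and writes the result back to every position of that group.
# With FUN = length, each position receives the size of its group.
group_size <- ave(seq_along(a), a, FUN = length)

group_size              # 2 2 1 1 2 2
which(group_size == 1)  # 3 4
```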

(I'd guess the duplicated approach is the one most commonly used.)
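Since duplicated() also works on whole data frames, the same pattern extends to uniqueness over several columns at once. The helper below, drop_all_duplicates(), is a hypothetical name used only for this sketch: it drops every row whose combination of values in cols occurs more than once, keeping no copy.

```r
# Hypothetical helper: drop every row whose combination of values in `cols`
# occurs more than once (no copy is kept)
drop_all_duplicates <- function(data, cols = names(data)) {
  key <- data[, cols, drop = FALSE]
  dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
  data[!dup, , drop = FALSE]
}

# Assumption: A is stored as character (stringsAsFactors = FALSE)
df <- data.frame(A = c("Happy", "Happy", "Sad", "Confused", "Mad", "Mad"),
                 B = c(1, 2, 3, 4, 5, 6),
                 stringsAsFactors = FALSE)

by_a   <- drop_all_duplicates(df, "A")  # rows 3 and 4 (Sad, Confused)
by_all <- drop_all_duplicates(df)       # all six rows: no (A, B) pair repeats
```

With cols = "A" this reproduces the result asked for in the question; with the default of all columns it only removes rows that are exact repeats.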

Or using dplyr:

require(dplyr) 
group_by(df, A) %>% filter(n() == 1) 

I was waiting for the 'dplyr' answer. What took you so long? ;-) – 2014-11-21 18:42:02


Thanks for the comment :) – 2014-11-22 12:08:41