2014-11-03 83 views
0

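How do I remove NA values from a data frame using ddply?

I hope you can help me. I have been searching the web and cannot find an answer. Here is my data frame: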

name city state stars main_category 
A Pittsburgh PA  5.0  Soul Food 
B Houston  TX  3.0  Professional Services 
C Lafayette IN  3.0  NA 
D Los Angeles CA  4.0  Local Services 
E Los Angeles CA  3.0  Local Services 
F Lafayette IN  3.5  Mongolian 
G Pittsburgh PA  5.0  Doctors 
H Pittsburgh PA  4.0  Soul Food 
I Houston  TX  4.0  Professional Services 
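For anyone who wants to try this locally, the table above can be rebuilt roughly as follows. This is only a sketch: the name `d` is taken from the ddply call further down, while `stringsAsFactors = FALSE` and the use of a real `NA` (rather than the string "NA") are assumptions about how the data was loaded.

```r
# Rebuild the example data; column types are an assumption
d <- data.frame(
  name  = c("A", "B", "C", "D", "E", "F", "G", "H", "I"),
  city  = c("Pittsburgh", "Houston", "Lafayette", "Los Angeles", "Los Angeles",
            "Lafayette", "Pittsburgh", "Pittsburgh", "Houston"),
  state = c("PA", "TX", "IN", "CA", "CA", "IN", "PA", "PA", "TX"),
  stars = c(5.0, 3.0, 3.0, 4.0, 3.0, 3.5, 5.0, 4.0, 4.0),
  main_category = c("Soul Food", "Professional Services", NA, "Local Services",
                    "Local Services", "Mongolian", "Doctors", "Soul Food",
                    "Professional Services"),
  stringsAsFactors = FALSE
)

# Row C (Lafayette) is the only row with a missing main_category
d[!complete.cases(d), ]
```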

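What I would like is output grouped by city (in alphabetical order) and state, and then ranked by the number of stars received. This is what I am hoping for: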

name city state stars main_category    rank 
I Houston  TX  4.0  Professional Services  1 
B Houston  TX  3.0  Professional Services  2 
F Lafayette IN  3.5  Mongolian     1 
D Los Angeles CA  4.0  Local Services    1 
E Los Angeles CA  3.0  Local Services    2 
G Pittsburgh PA  5.0  Doctors      1 
A Pittsburgh PA  5.0  Soul Food     1 
H Pittsburgh PA  4.0  Soul Food     2 

Here is my line of code.

l <- ddply(d, c("city", "state", "main_category"), na.rm=T, transform, rank=rank(-stars, ties.method="max")) 

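This does not remove the NA that Lafayette has. I don't know what to put there. I also tried na.omit, but when I tried that, the rank column did not appear.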

+1

1) Make a reproducible example (http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). 2) Barring that, try this: 'ddply(na.omit(d),...)' – Chase 2014-11-03 02:48:53

+0

But Houston doesn't have a 5-star row. I'm confused by your output – 2014-11-03 02:55:48

+0

@Chase I tried na.omit(d), and this is what I got: Error: attempt to apply non-function – 2014-11-03 02:56:01

Answers

1

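Here is a base R solution. Not sure if you're using dplyr, but this seems to work. I think the last row should be ranked 3, since there are two rows tied at 1.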

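# Drop every row that contains an NA
no <- na.omit(dat) 
# Sort by city and state, then by stars in descending order
new <- no[do.call(order, with(no, list(city, state, -stars))),] 
# Rank stars within each city; tied rows share the lowest rank
within(new, { 
    rank <- Reduce(c, Map(rank, split(-stars, city), ties.method = "min")) 
}) 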
# name  city state stars   main_category rank 
# 9 I  Houston TX 4.0 Professional Services 1 
# 2 B  Houston TX 3.0 Professional Services 2 
# 6 F Lafayette IN 3.5    Mongolian 1 
# 4 D Los Angeles CA 4.0  Local Services 1 
# 5 E Los Angeles CA 3.0  Local Services 2 
# 1 A Pittsburgh PA 5.0    Soul Food 1 
# 7 G Pittsburgh PA 5.0    Doctors 1 
# 8 H Pittsburgh PA 4.0    Soul Food 3 
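To see why the last Pittsburgh row comes out as 3 here, it may help to compare tie methods on the three Pittsburgh star values alone. A small sketch (the vector `pittsburgh_stars` is just an illustrative name, not part of the original data):

```r
# Star ratings of the three Pittsburgh rows (A, G, H)
pittsburgh_stars <- c(5, 5, 4)

# "min": tied values both take the lowest rank, so the next rank is 3
r_min <- rank(-pittsburgh_stars, ties.method = "min")

# "max": tied values both take the highest rank of the tie
r_max <- rank(-pittsburgh_stars, ties.method = "max")

r_min  # 1 1 3
r_max  # 2 2 3
```

The question's expected output shows H as 2 because it also groups by main_category, so A and H (both Soul Food) are ranked separately from G.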
0

Using dplyr, with the two tied top values both ranked first:

library(dplyr) 
# Keep only complete rows, sort them, and rank stars within each city
filter(dat, complete.cases(dat)) %>% 
    group_by(city) %>% 
    arrange(city, state, desc(stars)) %>% 
    mutate(rank = min_rank(desc(stars))) 
# name  city state stars   main_category rank 
#1 I  Houston TX 4.0 Professional Services 1 
#2 B  Houston TX 3.0 Professional Services 2 
#3 F Lafayette IN 3.5    Mongolian 1 
#4 D Los Angeles CA 4.0  Local Services 1 
#5 E Los Angeles CA 3.0  Local Services 2 
#6 A Pittsburgh PA 5.0    Soul Food 1 
#7 G Pittsburgh PA 5.0    Doctors 1 
#8 H Pittsburgh PA 4.0    Soul Food 3 
0

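An na.rm argument given to ddply is passed on to .fun; in this case the NA handling belongs inside rank.

The call that gives NA is the following:

ddply(d, c("city", "state", "main_category"), na.rm=T, transform, rank=rank(-stars, ties.method="max"))

Passing the argument inside .fun should fix it. At least it works for me: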

ddply(d, c("city", "state", "main_category"), transform, 
rank=rank(-stars, na.last = TRUE, ties.method="max"))
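A base R sketch of the two points above, without plyr. It assumes ddply forwards extra arguments to .fun, as its documentation describes; the small data frame `df` is made up for illustration.

```r
# Toy data with one missing star rating (hypothetical, for illustration)
df <- data.frame(stars = c(3, NA, 1))

# 1) transform() treats an extra named argument as a new column,
#    so na.rm = TRUE filters nothing and just adds a column
out <- transform(df, na.rm = TRUE)
names(out)

# 2) na.last only controls NA values in the vector being ranked
r_last <- rank(-df$stars, na.last = TRUE)    # NA is ranked last
r_keep <- rank(-df$stars, na.last = "keep")  # NA stays NA
r_drop <- rank(-df$stars, na.last = NA)      # NA is dropped from the result
```

Note that na.last acts on NA values in stars. If the goal is to drop the row whose main_category is NA, that row probably has to be removed from the data frame itself, for example with na.omit(d), before ranking.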