Subsetting observations by grouping on some features

I have a dataset like the one below:

date, time, product, shop_id

20140104 900 Banana 18 
20140104 900 Banana 19 
20140104 924 Banana 18 
20140104 929 Banana 18 
20140104 932 Banana 20 
20140104 948 Banana 18

I need to extract the observations with distinct product and distinct shop_id.

So I need to group the observations by product + shop_id.

Here is my code:

library(plyr) 
    d_ply(shop, .(product,shop_id ),table ) 
print(p)

Unfortunately, it prints NULL.

Dataset:

date=c(20140104,20140104,20140104,20140104,20140104) 
time=c(924 ,900,854,700,1450) 
product=c("Banana","Banana","Banana","Banana","Banana") 
shop_id=c(18,18,18,19,20) 
shop<-data.frame(date=date,time=time,product=product,shop_id=shop_id)

The output should be

  date, time, product, shop_id 


     20140104 900 Banana 19 
     20140104 932 Banana 20 
     20140104 948 Banana 18

Source

2017-03-01 user5363938

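What are 'time' 948 and 932? –

The logic is to select rows that have a different 'shop_id'. Each selected observation should have a unique product or shop_id, or both. – user5363938

But why did you choose time 948 rather than 900 for Banana from shop 18? – ira

We can do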

library(tidyverse) 
shop %>% 
    group_by(product, shop_id) %>% 
    mutate(n = n()) %>% 
    group_by(time) %>% 
    arrange(n) %>% 
    slice(1) %>% 
    group_by(product, shop_id) %>% 
    arrange(-time) %>% 
    slice(1) %>% 
    select(-n) %>% 
    arrange(time) 
#  date time product shop_id 
#  <int> <int> <chr> <int> 
#1 20140104 900 Banana  19 
#2 20140104 932 Banana  20 
#3 20140104 948 Banana  18

Source

2017-03-01 08:55:42 akrun

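I wish people would stop encouraging this anti-pattern of importing unused libraries. Be explicit. –

To take only the first unique combination, simply use aggregate from the stats package: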

> aggregate(shop, by=list(shop$product, shop$shop_id), FUN=function(x){x[1]}) 

Group.1 Group.2  date time product shop_id 
1 Banana  18 20140104 924 Banana  18 
2 Banana  19 20140104 700 Banana  19 
3 Banana  20 20140104 1450 Banana  20

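Explanation: FUN=function(x){x[1]} takes only the first element of each column when several rows collide on the same product and shop_id.

To remove "Group.1", "Group.2" or other columns: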
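As an alternative sketch, not part of the original answer, the same "keep the first row of each product + shop_id combination" result can be obtained in base R with duplicated(), which avoids the extra Group.1/Group.2 columns. The variable name first_rows is only illustrative, and stringsAsFactors = FALSE is set explicitly so the example behaves the same on older R versions.

```r
# Sample data from the question
shop <- data.frame(
  date    = c(20140104, 20140104, 20140104, 20140104, 20140104),
  time    = c(924, 900, 854, 700, 1450),
  product = c("Banana", "Banana", "Banana", "Banana", "Banana"),
  shop_id = c(18, 18, 18, 19, 20),
  stringsAsFactors = FALSE
)

# duplicated() marks every row whose (product, shop_id) pair was already seen,
# so negating it keeps only the first row of each combination
first_rows <- shop[!duplicated(shop[c("product", "shop_id")]), ]
print(first_rows)
```

Because whole rows are selected rather than aggregated column by column, the original columns and their types are left untouched.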

> res <- aggregate(shop, by=list(shop$product, shop$shop_id), FUN=function(x){x[1]}) 
> res[ , !(names(res) %in% c("Group.1", "Group.2"))] 
     date time product shop_id 
1 20140104 924 Banana  18 
2 20140104 700 Banana  19 
3 20140104 1450 Banana  20

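P.S. The dataset you provided is inconsistent with your desired example, which is why the numbers differ.

P.S. 2: If you want all of the data in case of collisions: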

> aggregate(shop, by=list(shop$product, shop$shop_id), FUN="identity") 
    Group.1 Group.2       date   time product shop_id 
1 Banana  18 20140104, 20140104, 20140104 924, 900, 854 1, 1, 1 18, 18, 18 
2 Banana  19      20140104   700  1   19 
3 Banana  20      20140104   1450  1   20

If you want to flag the collisions:

> aggregate(shop, by=list(shop$product, shop$shop_id), FUN=function(x){if (length(x) > 1) NA else x}) 
    Group.1 Group.2  date time product shop_id 
1 Banana  18  NA NA  NA  NA 
2 Banana  19 20140104 700  1  19 
3 Banana  20 20140104 1450  1  20
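If overwriting the colliding values with NA is not desirable, one possible variation (a sketch, not the answer's own method) is to keep every row and add a logical flag instead. It uses ave() from the stats package to compute the size of each product + shop_id group; the names flagged, group_size and collision are made up for this example.

```r
# Sample data from the question
shop <- data.frame(
  date    = c(20140104, 20140104, 20140104, 20140104, 20140104),
  time    = c(924, 900, 854, 700, 1450),
  product = c("Banana", "Banana", "Banana", "Banana", "Banana"),
  shop_id = c(18, 18, 18, 19, 20),
  stringsAsFactors = FALSE
)

# ave() applies length() within each (product, shop_id) group and returns
# the group size aligned with the original rows
group_size <- ave(seq_len(nrow(shop)), shop$product, shop$shop_id, FUN = length)

# A row is part of a collision when its group has more than one member
flagged <- shop
flagged$collision <- group_size > 1
print(flagged)
```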

If you want to exclude the non-unique rows:

> res <- aggregate(shop, by=list(shop$product, shop$shop_id), FUN=function(x){if (length(x) > 1) NULL else x}) 

> res[res$product != "NULL", !(names(res) %in% c("Group.1", "Group.2"))] 
     date time product shop_id 
2 20140104 700  1  19 
3 20140104 1450  1  20

If you want to avoid the conversion from string to int (for product), use ""/"NULL"/"NA" instead of NULL/NA.

Source
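Another way to drop the non-unique rows while leaving product as a string, sketched here under the assumption that whole rows should simply be filtered, is to call duplicated() in both directions. The names key, is_repeated and unique_only are illustrative and do not come from the answer.

```r
# Sample data from the question
shop <- data.frame(
  date    = c(20140104, 20140104, 20140104, 20140104, 20140104),
  time    = c(924, 900, 854, 700, 1450),
  product = c("Banana", "Banana", "Banana", "Banana", "Banana"),
  shop_id = c(18, 18, 18, 19, 20),
  stringsAsFactors = FALSE
)

key <- shop[c("product", "shop_id")]

# Scanning forwards misses the first member of a repeated group and scanning
# backwards misses the last one, so the union catches every member
is_repeated <- duplicated(key) | duplicated(key, fromLast = TRUE)

# Keep only rows whose (product, shop_id) pair occurs exactly once
unique_only <- shop[!is_repeated, ]
print(unique_only)
```

Since no aggregation function touches the columns, no type conversion takes place.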

2017-03-01 08:58:41 dk14

It can be done with dplyr as follows:

# create the sample dataset 
date=c(20140104,20140104,20140104,20140104,20140104) 
time=c(924 ,900,854,700,1450) 
product=c("Banana","Banana","Banana","Banana","Banana") 
shop_id=c(18,18,18,19,20) 
shop<-data.frame(date=date,time=time,product=product,shop_id=shop_id) 

# load a dplyr library 
library(dplyr) 

# take shop data 
shop %>% 
     # group by product, shop id, date 
     group_by(product, shop_id, date) %>% 
     # for each such combination, find the earliest time 
     summarise(time = min(time)) %>% 
     # group by product, shop id 
     group_by(product, shop_id) %>% 
     # for each combination of product & shop id 
     # return the earliest date and the time recorded on that date 
     # (time comes first: once date is overwritten, summarise() only sees the new value) 
     summarise(time = time[date == min(date)], date = min(date))
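For readers who would rather not load dplyr, the same "earliest record per product and shop" idea can be approximated in base R. This is a minimal sketch and not the answer's code: it sorts by date and time, splits the rows by product + shop_id, and takes the first row of each piece. The names sorted, groups and earliest are assumptions made for the example.

```r
# Sample data from the question
shop <- data.frame(
  date    = c(20140104, 20140104, 20140104, 20140104, 20140104),
  time    = c(924, 900, 854, 700, 1450),
  product = c("Banana", "Banana", "Banana", "Banana", "Banana"),
  shop_id = c(18, 18, 18, 19, 20),
  stringsAsFactors = FALSE
)

# Sort so that the earliest date, then the earliest time, comes first
sorted <- shop[order(shop$date, shop$time), ]

# Split into one data frame per (product, shop_id) combination;
# drop = TRUE skips combinations that do not occur in the data
groups <- split(sorted, list(sorted$product, sorted$shop_id), drop = TRUE)

# Take the first (earliest) row of each group and bind the rows back together
earliest <- do.call(rbind, lapply(groups, function(g) g[1, ]))
rownames(earliest) <- NULL
print(earliest)
```

On the sample data this keeps the 854 record for shop 18, the 700 record for shop 19 and the 1450 record for shop 20.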

Source

2017-03-01 10:08:36 ira
