2016-04-14

I have data in R in the following format, rolled up at the customer ID and product level:

id   product  mcg      txn
101  gold     hotel    1
101  gold     hotel    2
101  clas     hotel    22
101  clas     airline  23

The output I want:

id   product  hotel_txn  airline_txn
101  gold     3          .
101  clas     22         23

Can anyone please help me get the desired output?

Basically, I am looking for an alternative to the SAS CASE WHEN statement.
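For comparison, here is a minimal base R sketch of the CASE WHEN idea itself: build one column per mcg value with ifelse(), then sum by id and product, much like GROUP BY in SQL. It assumes the sample data above and uses 0 instead of SAS's "." for a product with no airline rows; the column names hotel_txn and airline_txn are taken from the desired output.

```r
# Sample data from the question
df <- data.frame(id      = c(101, 101, 101, 101),
                 product = c("gold", "gold", "clas", "clas"),
                 mcg     = c("hotel", "hotel", "hotel", "airline"),
                 txn     = c(1, 2, 22, 23))

# CASE WHEN analogue: keep txn where mcg matches, otherwise 0
df$hotel_txn   <- ifelse(df$mcg == "hotel",   df$txn, 0)
df$airline_txn <- ifelse(df$mcg == "airline", df$txn, 0)

# Sum both new columns within each id/product group
out <- aggregate(cbind(hotel_txn, airline_txn) ~ id + product,
                 data = df, FUN = sum)
print(out)
```

Using 0 rather than NA keeps aggregate() simple: its formula interface drops rows containing NA by default, which would discard transactions.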


Try `library(data.table); dcast(setDT(df1), id + product ~ mcg, value.var = "txn", sum)` – akrun

Answers

We can use xtabs:

xtabs(txn~idprod + mcg, transform(df1, idprod = paste(id, product), 
       mcg = paste0(mcg, "_txn"))) 
#   mcg 
#idprod  airline_txn hotel_txn 
# 101 clas   23  22 
# 101 gold   0   3 
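Note that xtabs() returns a contingency table rather than a data frame. A minimal sketch, assuming the question's data is stored as df1, of converting that table with as.data.frame.matrix() so each idprod value becomes a row name:

```r
# Sample data from the question
df1 <- data.frame(id      = c(101, 101, 101, 101),
                  product = c("gold", "gold", "clas", "clas"),
                  mcg     = c("hotel", "hotel", "hotel", "airline"),
                  txn     = c(1, 2, 22, 23))

# Same xtabs call as in the answer: sums txn for each idprod/mcg cell
tab <- xtabs(txn ~ idprod + mcg,
             transform(df1, idprod = paste(id, product),
                       mcg = paste0(mcg, "_txn")))

# Convert the 2-D table into an ordinary data frame (rows = idprod)
res <- as.data.frame.matrix(tab)
print(res)
```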

You can do this with dplyr and tidyr:

library(dplyr) 
library(tidyr) 
df %>% group_by(id, product, mcg) %>% summarise(txn = sum(txn)) %>% spread(mcg, txn) 
Source: local data frame [2 x 4] 
Groups: id, product [2] 

    id product airline hotel 
    <int> <fctr> <int> <int> 
1 101 clas  23 22 
2 101 gold  NA  3 
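In current tidyr, spread() is superseded by pivot_wider(). A sketch of the same reshape, assuming tidyr 1.1.0 or later (where values_fn accepts a plain function and names_glue is available): values_fn = sum collapses the duplicate gold/hotel rows, and names_glue adds the "_txn" suffix from the desired output.

```r
library(tidyr)

# Sample data from the question
df <- data.frame(id      = c(101, 101, 101, 101),
                 product = c("gold", "gold", "clas", "clas"),
                 mcg     = c("hotel", "hotel", "hotel", "airline"),
                 txn     = c(1, 2, 22, 23))

# One column per mcg value; duplicate id/product/mcg cells are summed.
# Missing combinations (gold/airline) are left as NA.
wide <- pivot_wider(df,
                    names_from  = mcg,
                    values_from = txn,
                    values_fn   = sum,
                    names_glue  = "{mcg}_txn")
print(wide)
```

Adding values_fill = 0 would show 0 instead of NA for gold/airline, matching the dcast output below.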

It gives an error saying that the mcg column does not exist. Can you help? –


The dcast function from reshape2 is designed for exactly this kind of thing:

#creates your data frame 
df <- data.frame(id = c(101, 101, 101, 101), 
       product = c("gold", "gold", "clas", "clas"), 
       mcg = c("hotel", "hotel", "hotel", "airline"), 
       txn = c(1, 2, 22, 23)) 

#installs and loads the required package 
install.packages("reshape2") 
library(reshape2) 

#the function you would use to create the new data frame 
df2 <- dcast(df, id + product ~ mcg, value.var = "txn", sum) 

print(df2) 
    id product airline hotel 
1 101 clas  23 22 
2 101 gold  0  3 

Can we get the data in this form?

    id product airline hotel gold clas
  1 101   clas      23    22    3   45
–


@ankitagarwal Could you clarify your requirement? I don't understand what you are asking for in your comment. – bshelt141