2017-08-07 51 views
-1

I have the following data set in R. How can I group the data while keeping the identifier column unchanged?

date   jobcategory 
2016-01-01  SP  
2016-01-01  DP 
2016-01-01  SP 
2016-01-01  CP 
2016-01-01  DP 
2016-01-01  DP 
2016-01-01  DP 
2016-01-02  SP 
2016-01-02  CP 
2016-01-02  SP 
2016-01-02  CP 
2016-01-02  DP 
2016-01-02  TP 
2016-01-02  DP 
2016-01-02  DP 
2016-01-02  DP 
2016-01-03  SP 
2016-01-03  SP 
2016-01-03  DP 
2016-01-03  DP 
2016-01-03  SP 
2016-01-03  DP 
2016-01-04  CP 
2016-01-04  MP  

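I want to group this data so that each date appears only once, while also getting a count for one job category from the second column, like this: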

date  jobcategory Count 
2016-01-01  SP  2 
2016-01-02  SP  2 
2016-01-03  SP  3 
2016-01-04  SP  0 

Any help would be greatly appreciated.

Answers

1

A base R solution using table.

> dat <- as.data.frame(table(dat)) 
> dat <- dat[dat$jobcategory=='SP', ] 
> dat 


     date jobcategory Freq 
13 2016-01-01   SP 2 
14 2016-01-02   SP 2 
15 2016-01-03   SP 3 
16 2016-01-04   SP 0 
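
The zero row appears because table tabulates every combination of the values in date and jobcategory, including combinations that never occur in the data. A minimal sketch of that behaviour, assuming a small made-up data frame (toy is hypothetical and is not the data set from the question):

```r
# Hypothetical stand-in data: SP occurs on 2016-01-03 but not on 2016-01-04
toy <- data.frame(
  date = factor(c("2016-01-03", "2016-01-03", "2016-01-04")),
  jobcategory = factor(c("SP", "DP", "CP"))
)

# Cross-tabulate all date x jobcategory combinations, then go back to long form
counts <- as.data.frame(table(toy))

# Keep only the SP rows; the date without any SP is kept with Freq 0
sp <- counts[counts$jobcategory == "SP", ]
print(sp)
```

If the count column should be called Count rather than Freq, names(sp)[3] <- "Count" renames it.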

Data

dat <- 
structure(list(date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 
4L), .Label = c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04" 
), class = "factor"), jobcategory = structure(c(4L, 2L, 4L, 1L, 
2L, 2L, 2L, 4L, 1L, 4L, 1L, 2L, 5L, 2L, 2L, 2L, 4L, 4L, 2L, 2L, 
4L, 2L, 1L, 3L), .Label = c("CP", "DP", "MP", "SP", "TP"), class = "factor")), 
.Names = c("date", "jobcategory"), class = "data.frame", row.names = c(NA, -24L)) 
+1

Thank you, ADAMM! – Sree

0

We need to complete the missing combinations and then get the 'Count':

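library(tidyverse) 
# df holds the data from the question
res <- df %>% 
     # mark every observed row with 1
     mutate(ind = 1) %>% 
     # add the missing date/jobcategory combinations with ind = 0
     complete(., date, jobcategory, fill = list(ind = 0)) %>% 
     group_by(date, jobcategory) %>% 
     # summing the indicator gives the count, 0 for added combinations
     summarise(Count= sum(ind)) %>% 
     arrange(jobcategory, date) 
# show only the SP rows
res %>% 
    filter(jobcategory == "SP") 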
# A tibble: 4 x 3 
# Groups: date [4] 
#  date jobcategory Count 
#  <date>  <chr> <dbl> 
#1 2016-01-01   SP  2 
#2 2016-01-02   SP  2 
#3 2016-01-03   SP  3 
#4 2016-01-04   SP  0 
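
A possible variation on the same idea, offered only as a sketch: count the observed combinations first with dplyr's count and then fill in the missing ones with complete. The data frame toy below is a hypothetical stand-in, not the data from the question, and the sketch assumes dplyr and tidyr are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in data: SP occurs on 2016-01-03 but not on 2016-01-04
toy <- data.frame(
  date = c("2016-01-03", "2016-01-03", "2016-01-04"),
  jobcategory = c("SP", "DP", "CP"),
  stringsAsFactors = FALSE
)

res_sp <- toy %>%
  # one row per observed date/jobcategory pair, with its count
  count(date, jobcategory, name = "Count") %>%
  # add the unobserved pairs and give them a count of 0
  complete(date, jobcategory, fill = list(Count = 0L)) %>%
  filter(jobcategory == "SP")

print(res_sp)
```

This skips the helper column ind, since count already produces the per-group totals before the gaps are filled.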
0

A single line of base R is enough:

sapply(unique(df$date), function(i) 
         length(df$jobcategory[df$jobcategory == 'SP' & i == df$date])) 

#[1] 2 2 3 0
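
The one-liner returns only the counts. If the table layout from the question is wanted, one way to pair the counts with their dates again is sketched below. It uses vapply instead of sapply so the result type is fixed, and toy is a hypothetical stand-in rather than the data from the question.

```r
# Hypothetical stand-in data: SP occurs on 2016-01-03 but not on 2016-01-04
toy <- data.frame(
  date = c("2016-01-03", "2016-01-03", "2016-01-04"),
  jobcategory = c("SP", "DP", "CP"),
  stringsAsFactors = FALSE
)

dates <- unique(toy$date)

# For each date, count the rows whose job category is SP
counts <- vapply(
  dates,
  function(d) sum(toy$jobcategory == "SP" & toy$date == d),
  integer(1),
  USE.NAMES = FALSE
)

# Put dates and counts side by side, as in the desired output
result <- data.frame(date = dates, jobcategory = "SP", Count = counts)
print(result)
```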