2016-11-23 82 views
0

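R: convert from long to wide format, filling in missing days

I am reshaping my company's hour-registration data to fit a particular format. I have transformed the input so it looks like this: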

employee project month day hours 
1   A 16-001  9 9  5 
2   B 16-001  9 29  1 
3   A 16-001  9 3  5 
4   B 16-001  9 28  2 
5   A 16-002  9 8  6 
6   B 16-002  9 9  4 
7   A 16-002 10 25  6 
8   B 16-002 10 21  8 
9   A overig 10 6  6 
10  B overig 10 17  7 
11  A overig 10 9  1 
12  B overig 10 10  7 

# reproducible data:
df <- data.frame(employee = rep(c("A","B"),6),project=rep(c("16-001","16-002","overig"), each=4), month=rep(c(9,10),each=6),day=sample(1:30,12,replace=T), hours=sample(1:8,12,replace=T)) 

#Now, I need to move this to a cross table: 
res <- ftable(xtabs(hours~month+employee+project+day, aggregate(hours~month+employee+project+day, data=df, FUN=sum))) 

#And put this cross table in a data.frame (for export to csv) 
library(reshape2) 
df_res <- dcast(as.data.frame(res), as.formula(paste(paste(names(attr(res, "row.vars")), collapse="+"), "~", paste(names(attr(res, "col.vars")))))) 

df_res 

    month employee project 3 6 8 9 10 17 21 25 28 29 
1  9  A 16-001 5 0 0 5 0 0 0 0 0 0 
2  9  A 16-002 0 0 6 0 0 0 0 0 0 0 
3  9  A overig 0 0 0 0 0 0 0 0 0 0 
4  9  B 16-001 0 0 0 0 0 0 0 0 2 1 
5  9  B 16-002 0 0 0 4 0 0 0 0 0 0 
6  9  B overig 0 0 0 0 0 0 0 0 0 0 
7  10  A 16-001 0 0 0 0 0 0 0 0 0 0 
8  10  A 16-002 0 0 0 0 0 0 0 6 0 0 
9  10  A overig 0 6 0 1 0 0 0 0 0 0 
10 10  B 16-001 0 0 0 0 0 0 0 0 0 0 
11 10  B 16-002 0 0 0 0 0 0 8 0 0 0 
12 10  B overig 0 0 0 0 7 7 0 0 0 0 

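I'm not sure this is the best approach, but the format is now fine. However, I need every day of the month as a column, not just the days that occur in my data.frame (so 31 columns), preferably with NA for days that don't exist (e.g. the 31st of a 30-day month) and 0 for the rest. Any suggestions on how to get this?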
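
A minimal base-R sketch of the target layout described above, without reshape2: sum duplicate rows, fill a 0-initialised 31-column matrix, then set days past the end of each month to NA. The sample data, the single `year` value and the helper `month_length()` are assumptions for illustration only, not part of the original question.

```r
# Hypothetical sample data in the long format used in the question
df <- data.frame(
  employee = c("A", "B", "B", "A", "B"),
  project  = c("16-001", "16-001", "16-001", "overig", "overig"),
  month    = c(9, 9, 9, 10, 10),
  day      = c(3, 28, 28, 31, 6),
  hours    = c(5, 2, 1, 6, 7)
)
year <- 2016  # assumption: all rows belong to the same year

# Hypothetical helper: days in a month = gap between the 1st of this month
# and the 1st of the next one (so leap years are handled by the Date class)
month_length <- function(year, month) {
  next_month <- ifelse(month == 12, 1, month + 1)
  next_year  <- ifelse(month == 12, year + 1, year)
  start <- as.Date(sprintf("%d-%02d-01", year, month))
  end   <- as.Date(sprintf("%d-%02d-01", next_year, next_month))
  as.integer(end - start)
}

# Sum duplicate rows so every month/employee/project/day cell has one value
agg <- aggregate(hours ~ month + employee + project + day, data = df, FUN = sum)

# One output row per month/employee/project combination
keys <- unique(agg[c("month", "employee", "project")])
keys <- keys[order(keys$month, keys$employee, keys$project), ]
rownames(keys) <- NULL

# Start with 0 hours in all 31 day columns, then write the sums into place
m <- matrix(0, nrow = nrow(keys), ncol = 31, dimnames = list(NULL, 1:31))
row_id <- match(paste(agg$month, agg$employee, agg$project),
                paste(keys$month, keys$employee, keys$project))
m[cbind(row_id, agg$day)] <- agg$hours

# Days that do not exist in a row's month (e.g. 31 September) become NA
m[col(m) > month_length(year, keys$month)] <- NA

df_res <- cbind(keys, as.data.frame(m))
df_res
```

The resulting `df_res` can be passed to `write.csv()` just like the reshape2 result.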

Answer

1

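I think this is an acceptable solution, and it handles leap years too (bonus). It still relies on the nice factor-filling behavior of tidyr::spread() with drop = F, but now also uses lubridate::days_in_month(). Here we go: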

library(dplyr)     # %>%, mutate(), select(), bind_rows() 
library(tidyr)     # spread() 
library(lubridate) # ymd(), days_in_month() 
library(purrr)     # map() 

df$year <- 2016 
df$num_in_month <- ymd(paste(df$year, df$month, df$day)) %>% 
    days_in_month() 

df %>% split(.$month) %>% 
    map(~mutate(., day = factor(day, levels = 1:unique(num_in_month)))) %>% 
    map(~spread(., key = day, value = hours, fill = 0, drop = F)) %>% 
    bind_rows() %>% 
    select(-num_in_month) 

    employee project month year 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
1  A 16-001  9 2016 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 8 0 0 NA 
2  A 16-002  9 2016 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA 
3  B 16-001  9 2016 0 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 NA 
4  B 16-002  9 2016 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA 
5  A 16-002 10 2016 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
6  A overig 10 2016 0 4 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
7  B 16-002 10 2016 0 0 0 0 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
8  B overig 10 2016 0 0 0 0 6 0 0 0 0 0 0 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
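
The key trick in this answer is that a factor keeps all of its levels, even those that never occur in the data, so they still get a column. A tiny base-R illustration of the same idea using `xtabs()` from the question instead of `spread()`; the values are hypothetical.

```r
# Days and hours that actually occur (hypothetical; day 9 appears twice)
day   <- c(3, 9, 9, 28)
hours <- c(5, 2, 4, 1)

# An explicit levels vector gives every day of a 30-day month a slot
day_f <- factor(day, levels = 1:30)

# xtabs() sums hours per level; unused levels show up as 0
per_day <- xtabs(hours ~ day_f)
length(per_day)              # 30
per_day[c("3", "9", "10")]   # 5, 6, 0
```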

Cheers

+0

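Nice, I didn't know that (+1), but it doesn't fully answer the question. First, 'spread' throws the error "Duplicate identifiers for rows", and such duplicates can actually occur in the data. Second, all days are filled with NA, both days that exist (e.g. Sep 1) and days that don't (Sep 31). – RHA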
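
One way to avoid the duplicate-identifier problem raised in this comment is to sum rows that share the same key before reshaping, which is what the question's own `aggregate()` call already does. A base-R sketch with hypothetical data:

```r
# Hypothetical long data: employee A logged day 3 of project 16-001 twice
df <- data.frame(
  employee = c("A", "A", "B"),
  project  = c("16-001", "16-001", "16-001"),
  month    = c(9, 9, 9),
  day      = c(3, 3, 28),
  hours    = c(5, 2, 1)
)

id_cols <- c("employee", "project", "month", "day")

# TRUE: a wide reshape would see two values for one cell
anyDuplicated(df[id_cols]) > 0

# Summing per key leaves exactly one row per cell (A's day 3 becomes 7 hours)
df_sum <- aggregate(hours ~ employee + project + month + day, data = df, FUN = sum)
anyDuplicated(df_sum[id_cols]) > 0   # FALSE
```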

+0

Ah, I misunderstood your criteria for filling in the new columns. – Nate

+0

Do you want this to be leap-year aware? – Nate
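
Leap-year awareness only matters for February. A minimal base-R sketch of the Gregorian rule; the helper names `is_leap()` and `days_per_month()` are hypothetical, and lubridate::days_in_month() used in the answer already takes care of this.

```r
# Gregorian leap year: divisible by 4, except centuries not divisible by 400
is_leap <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0
}

# Days per month, adding one day to February in leap years
days_per_month <- function(year, month) {
  base <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
  base[month] + (month == 2 & is_leap(year))
}

days_per_month(2016, 2)  # 29
days_per_month(1900, 2)  # 28
days_per_month(2000, 2)  # 29
days_per_month(2016, 9)  # 30
```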