2017-08-12

Subset a data frame by date range in R

I want to organize a data frame by date range.

Suppose today is January 1, 2017. The table below shows:

  • three product types (Apple, Banana and Beer)

  • six expiration dates (1/15/2017, 2/27/2017, 3/15/2017, 9/1/2017, 12/20/2017 and 1/10/2018)

 
Product Type   1/15/2017   2/27/2017   3/15/2017   9/1/2017   12/20/2017   1/10/2018
Apple          3           10          -           2          8            -
Banana         5           50          100         10         10           2
Beer           1           1           1           1          1            1

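You can read the table above as: "the store manager has 3 apples that expire on January 15, 2017, another 10 apples that last longer and expire on February 27, 2017, and so on."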

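The store manager wants to know how many apples (and, likewise, how many units of the other products) will expire in less than 1 month, in 1 to 3 months, in 3 to 12 months, and in more than 12 months.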

How can I code this in R? The result table should look like this:

 
Product Type   Less than 1mth   1-3mths   3-12 mths   More than 12mths
Apple          3                10        10          -
Banana         5                150       20          2
Beer           1                2         2           1

Thank you very much for your help!

Have you tried coding it yourself? What ideas do you have?

Answers

A data.table answer:

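library(data.table) 

# Wide table: one row per product, one column per expiration date
dt <- data.table(type=c("apple", "banana", "beer"), 
       `2017-01-15`=c(3,5,1), 
       `2017-02-27`=c(10,50,1), 
       `2017-03-15`=c(NA, 100, 1), 
       `2017-09-01`=c(2,10,1), 
       `2017-12-20`=c(8,10,1), 
       `2018-01-10`=c(NA, 2, 1)) 

# Long format: one row per (product, expiration date); keep the dates as character
dt2 <- melt(dt, id.vars="type", variable.factor=FALSE) 
# Whole days from the reference date (2017-01-01); as.integer() so cut() gets a plain number
dt2[, days_until_expires := as.integer(as.IDate(variable) - as.IDate("2017-01-01"))] 
# Bucket the day counts into (0,30], (30,90], (90,360], (360,Inf]
dt2[, days_until_expires_f := cut(days_until_expires, c(0, 30, 90, 360, Inf))] 

# Sum quantities per product and bucket, then reshape back to wide format
out1 <- dt2[, list(N=sum(value, na.rm=TRUE)), by=list(type, days_until_expires_f)] 
out2 <- dcast(out1, type ~ days_until_expires_f, value.var="N") 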

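out2 is the output you are looking for.

In the future, you can make it easier for people to help you by providing a complete minimal working example (MWE). For guidance, see here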
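The answer above relies on cut() to assign day counts to buckets, so it matters which side of each interval is closed. The following is a minimal base R sketch, not part of the original answer, that shows the default right-closed behaviour next to right = FALSE. The sample day counts and the names bucket_labels, right_closed and left_closed are made up for illustration.

```r
# Hypothetical day counts chosen to sit on or next to the bucket boundaries
days <- c(14, 30, 31, 90, 360, 361)
breaks <- c(0, 30, 90, 360, Inf)
bucket_labels <- c("Less than 1mth", "1-3mths", "3-12 mths", "More than 12mths")

# Default: intervals are right-closed, i.e. (0,30], (30,90], (90,360], (360,Inf]
right_closed <- cut(days, breaks, labels = bucket_labels)

# right = FALSE: intervals are left-closed, i.e. [0,30), [30,90), [90,360), [360,Inf)
left_closed <- cut(days, breaks, labels = bucket_labels, right = FALSE)

print(data.frame(days, right_closed, left_closed))
```

With the default, an item that expires in exactly 30 days counts as "Less than 1mth"; with right = FALSE it moves to "1-3mths". None of the dates in the question falls exactly on a boundary, so both choices give the same result table there.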

A solution using functions from tidyverse and lubridate. dt2 is the final output.

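# Read the table; "-" marks a missing quantity. read.table() makes the headers syntactic,
# e.g. 'Product Type' becomes Product.Type and '1/15/2017' becomes X1.15.2017
dt <- read.table(text = "'Product Type' '1/15/2017' '2/27/2017' '3/15/2017' '9/1/2017' '12/20/2017' '1/10/2018' 
Apple   3   10   -   2   8   - 
       Banana   5   50   100   10   10   2 
       Beer   1   1   1   1   1   1", 
       header = TRUE, stringsAsFactors = FALSE, na.strings = "-") 


library(tidyverse) 
library(lubridate) 

dt2 <- dt %>% 
    # Long format: one row per (product, expiration date)
    gather(Date, Value, -Product.Type) %>% 
    # Drop the "X" prefix added by read.table(), then parse the date
    mutate(Date = sub("X", "", Date, fixed = TRUE)) %>% 
    mutate(Date = mdy(Date)) %>% 
    # Days from the reference date, as a plain number rather than a difftime
    mutate(Day_Diff = as.numeric(Date - mdy("1/1/2017"), units = "days")) %>% 
    # case_when() uses the first condition that matches
    mutate(Group = case_when(
    Day_Diff <= 30 ~ "Less than 1mth", 
    Day_Diff <= 90 ~ "1-3mths", 
    Day_Diff <= 361 ~ "3-12 mths", 
    TRUE   ~ "More than 12mths" 
)) %>% 
    group_by(Product.Type, Group) %>% 
    summarise(Value = sum(Value, na.rm = TRUE)) %>% 
    spread(Group, Value) %>% 
    select(`Product Type` = Product.Type, `Less than 1mth`, `1-3mths`, 
     `3-12 mths`, `More than 12mths`) 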
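The answer above reshapes with gather() and spread(), which tidyr has since superseded by pivot_longer() and pivot_wider(). As a rough sketch of the same idea with the newer functions, assuming tidyr 1.0.0 or later and dplyr 1.0.0 or later, something like the following should work. The names stock, today, bucket_labels, expiry, qty and result are illustrative choices, and the 30/90/360-day breaks are an assumption about what "1 month", "3 months" and "12 months" mean.

```r
library(dplyr)
library(tidyr)

# Same table as in the question; check.names = FALSE keeps the date headers as typed
stock <- read.table(text = "
Product 1/15/2017 2/27/2017 3/15/2017 9/1/2017 12/20/2017 1/10/2018
Apple   3  10  -   2   8  -
Banana  5  50  100 10  10 2
Beer    1  1   1   1   1  1",
  header = TRUE, check.names = FALSE, na.strings = "-", stringsAsFactors = FALSE)

today <- as.Date("2017-01-01")
bucket_labels <- c("Less than 1mth", "1-3mths", "3-12 mths", "More than 12mths")

result <- stock %>%
  # Long format: one row per (product, expiration date)
  pivot_longer(-Product, names_to = "expiry", values_to = "qty") %>%
  # Whole days until expiry, then a labelled bucket (assumed breaks: 30, 90, 360 days)
  mutate(days  = as.integer(as.Date(expiry, format = "%m/%d/%Y") - today),
         group = cut(days, c(0, 30, 90, 360, Inf), labels = bucket_labels)) %>%
  group_by(Product, group) %>%
  summarise(qty = sum(qty, na.rm = TRUE), .groups = "drop") %>%
  # Back to wide format: one column per bucket
  pivot_wider(names_from = group, values_from = qty)

print(result)
```

Because the sums use na.rm = TRUE, a product with no stock in a bucket shows 0 rather than the "-" used in the question's result table.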
Here is a solution using tidyverse:

library(tidyverse) 
library(lubridate) 

# Use a real NA (not the string "na") so the quantity columns stay numeric
df <- cbind(c("Apple","Banana","Beer"),data.frame(matrix(c(3,5,1, 
          10,50,1, 
          NA,100,1, 
          2,10,1, 
          8,10,1, 
          NA,2,1), nrow = 3, ncol = 6))) 
colnames(df) <- c("Product_Type","1/15/2017","2/27/2017","3/15/2017", 
        "9/1/2017", "12/20/2017", "1/10/2018") 
# Long format: one row per (product, expiration date)
df_long <- gather(df, key = date_range, value = fruit, 
        c("1/15/2017","2/27/2017","3/15/2017", 
        "9/1/2017", "12/20/2017", "1/10/2018")) 
df_final <- as_tibble(df_long) %>% 
    # Days from the reference date, as a plain number rather than a difftime
    mutate(days = as.numeric(mdy(date_range) - mdy("1/1/2017"), units = "days")) %>% 
    # Thresholds are tested from the largest down, so each row gets exactly one label
    mutate(months = ifelse(days >= 361, "More than 12mths", 
         ifelse(days >= 90, "3-12 mths", 
           ifelse(days >= 30, "1-3mths", 
             "Less than 1mth")))) %>% 
    group_by(Product_Type, months) %>% 
    summarise(fruit = sum(fruit, na.rm = TRUE)) %>% 
    spread(months, fruit)
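For comparison, the same table can probably be produced without any packages. The sketch below is one possible base R approach rather than anything proposed in the answers: it buckets the date columns themselves and then sums the columns in each bucket row by row. The names stock, date_cols, bucket_labels and totals are illustrative, and the 30/90/360-day breaks are an assumption.

```r
# Wide input; NA marks "no stock with that expiration date"
stock <- data.frame(
  product      = c("Apple", "Banana", "Beer"),
  `1/15/2017`  = c(3, 5, 1),
  `2/27/2017`  = c(10, 50, 1),
  `3/15/2017`  = c(NA, 100, 1),
  `9/1/2017`   = c(2, 10, 1),
  `12/20/2017` = c(8, 10, 1),
  `1/10/2018`  = c(NA, 2, 1),
  check.names = FALSE)

today <- as.Date("2017-01-01")
date_cols <- setdiff(names(stock), "product")
bucket_labels <- c("Less than 1mth", "1-3mths", "3-12 mths", "More than 12mths")

# Assign each date column to a bucket (assumed breaks: 30, 90, 360 days)
days <- as.integer(as.Date(date_cols, format = "%m/%d/%Y") - today)
bucket <- cut(days, c(0, 30, 90, 360, Inf), labels = bucket_labels)

# For each bucket, add up its columns row by row, ignoring missing quantities
totals <- sapply(split(date_cols, bucket), function(cols)
  rowSums(stock[, cols, drop = FALSE], na.rm = TRUE))
rownames(totals) <- stock$product

print(totals)
```

Since the bucketing happens once per column instead of once per row, no reshaping to long format is needed; the trade-off is that missing stock shows up as 0 instead of "-".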