2017-06-13 179 views
4

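R: counting sessions in 15-minute time intervals

I want to count, for a large dataset, the number of sessions that start in each 15-minute interval on business days.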

My data looks like this:

df <- 

Start_datetime  End_datetime  Duration Volume 
2016-04-01 06:20:55 2016-04-01 14:41:22 08:20:27 8.360 
2016-04-01 08:22:27 2016-04-01 08:22:40 00:00:13 0.000 
2016-04-01 08:38:53 2016-04-01 09:31:58 00:53:05 12.570 
2016-04-01 09:33:57 2016-04-01 12:37:43 03:03:46 7.320 
2016-04-01 10:05:03 2016-04-01 16:41:16 06:36:13 9.520 
2016-04-01 12:07:57 2016-04-02 22:22:32 34:14:35 7.230 
2016-04-01 16:56:55 2016-04-02 10:40:17 17:43:22 5.300 
2016-04-01 17:29:18 2016-04-01 19:50:29 02:21:11 7.020 
2016-04-01 17:42:39 2016-04-01 19:45:38 02:02:59 2.430 
2016-04-01 17:47:57 2016-04-01 20:26:35 02:38:38 8.090 
2016-04-01 22:00:15 2016-04-04 08:22:21 58:22:06 4.710 
2016-04-02 01:12:38 2016-04-02 09:49:00 08:36:22 3.150 
2016-04-02 01:32:00 2016-04-02 12:49:47 11:17:47 5.760 
2016-04-02 07:28:48 2016-04-04 06:58:56 47:30:08 0.000 
2016-04-02 07:55:18 2016-04-05 07:55:15 71:59:57 0.240 

I want to count all sessions that start within each 15-minute interval, separately for business days and for weekends:

For business days 
    Time    PTU Count 
    00:00:00 - 00:15:00 1  10  #(where count is the amount of sessions started between 00:00:00 and 00:15:00) 
    00:15:00 - 00:30:00 2  6 
    00:30:00 - 00:45:00 3  5 
    00:45:00 - 01:00:00 3  3 

and so on, plus the same table for weekends.

I have tried the cut function:

df$PTU <- table (cut(df$Start_datetime, breaks="15 minutes")) 
data.frame(PTU) 

Edit: when I run this, I get the following error:

Error in cut.default(df$Start_datetime, breaks = "15 minutes") :'x' must be numeric 

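I have also tried some lubridate functions, but I can't seem to get them to work. My end goal is a table like the one in the image, but with 15-minute intervals.
[image: example of the desired table]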

+1

Can you explain why the 'cut' approach doesn't work? – akrun

+1

Can you 'dput' a bit of the data? –

+1

If you are looking for business days, have a look [here](https://cran.r-project.org/web/packages/bizdays/bizdays.pdf) – akrun

Answers

1

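There are two things to keep in mind when using cut with date-times:

  1. Make sure your data really is of class POSIXt. I'm pretty sure yours isn't, otherwise R would dispatch to cut.POSIXt rather than cut.default.
  2. "15 minutes" should be "15 min". See ?cut.POSIXt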
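Both points can be checked directly. The following is a minimal sketch in base R, assuming the column was read in as character strings (a factor column would go through cut.default in the same way); the variable names are made up for illustration.

```r
# Two start times as plain strings, as they might come out of read.csv()
x_chr <- c("2016-04-01 06:20:55", "2016-04-01 06:22:12")

# Point 1: a character vector is not POSIXt, so cut() falls back to
# cut.default, which rejects non-numeric input (the error from the question)
err_chr <- tryCatch(
  cut(x_chr, breaks = "15 min"),
  error = function(e) conditionMessage(e)
)

# Convert to POSIXct so that cut() dispatches to cut.POSIXt
x_ct <- as.POSIXct(x_chr, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
inherits(x_ct, "POSIXt")

# Point 2: "15 minutes" is not an accepted interval specification
err_spec <- tryCatch(
  cut(x_ct, breaks = "15 minutes"),
  error = function(e) conditionMessage(e)
)

# With a POSIXct vector and "15 min" the call succeeds and returns a factor
ok <- cut(x_ct, breaks = "15 min")
```

Here err_chr and err_spec hold the two error messages, while ok is a factor of interval labels.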

So this works:

Start_datetime <- as.POSIXct(
    c("2016-04-01 06:20:55", 
    "2016-04-01 06:22:12", 
    "2016-04-01 05:30:12") 
) 

table(cut(Start_datetime, breaks = "15 min")) 
# 2016-04-01 05:30:00 2016-04-01 05:45:00 2016-04-01 06:00:00 2016-04-01 06:15:00 
#     1     0     0     2 

Note that the output uses the start of each 15-minute interval as the names of the table.
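cut bins along the absolute timeline, whereas the question asks for counts per time of day, pooled over many days and split into business days and weekends. One possible way to get there in base R is sketched below. It assumes that "business day" simply means Monday to Friday (public holidays are ignored) and that PTU is the index of the 15-minute slot within the day, from 1 to 96; both are assumptions, not something the question states.

```r
# Three example start times; tz = "UTC" keeps the result reproducible
start <- as.POSIXct(c(
  "2016-04-01 06:20:55",  # Friday
  "2016-04-01 06:22:12",  # Friday
  "2016-04-02 01:12:38"   # Saturday
), tz = "UTC")

lt <- as.POSIXlt(start)

# Index of the 15-minute slot within the day: 00:00-00:15 is 1, 23:45-24:00 is 96
ptu <- (lt$hour * 60 + lt$min) %/% 15 + 1

# POSIXlt codes weekdays as 0 (Sunday) to 6 (Saturday)
day_type <- ifelse(lt$wday %in% c(0, 6), "weekend", "business")

# Cross-tabulate, convert to a long data frame, and drop empty combinations
counts <- as.data.frame(table(day_type = day_type, PTU = ptu),
                        responseName = "Count")
counts <- counts[counts$Count > 0, ]
counts
```

Because only the hour and minute are used, sessions from different dates that start in the same quarter of an hour end up in the same row.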

1

Here is a complete walk-through from date-time "strings" to the desired format. We start with a character vector:

Start_time <- 
c("2016-04-01 06:20:55", "2016-04-01 08:22:27", "2016-04-01 08:38:53", 
    "2016-04-01 09:33:57", "2016-04-01 10:05:03", "2016-04-01 12:07:57", 
    "2016-04-01 16:56:55", "2016-04-01 17:29:18", "2016-04-01 17:42:39", 
    "2016-04-01 17:47:57", "2016-04-01 22:00:15", "2016-04-02 01:12:38", 
    "2016-04-02 01:32:00", "2016-04-02 07:28:48", "2016-04-02 07:55:18" 
) 
df <- data.frame(Start_time) 

And this is the actual processing:

## We will use two packages 
library(lubridate) 
library(data.table) 

# convert df to data.table, parse the datetime string 
setDT(df)[, Start_time := ymd_hms(Start_time)] 
# floor time by 15 min to assign the appropriate slot (new variable Start_time_slot) 
df[, Start_time_slot := floor_date(Start_time, "15 min")] 

# aggregate by wday and time in a date 
start_time_data_frame <- df[, .N, by = .(wday = wday(Start_time_slot), time = format(Start_time_slot, format = "%H:%M:%S"))]

# output looks like this 
start_time_data_frame 
##  wday  time N 
## 1: 6 06:15:00 1 
## 2: 6 08:15:00 1 
## 3: 6 08:30:00 1 
## 4: 6 09:30:00 1 
## 5: 6 10:00:00 1 
## 6: 6 12:00:00 1 
## 7: 6 16:45:00 1 
## 8: 6 17:15:00 1 
## 9: 6 17:30:00 1 
## 10: 6 17:45:00 1 
## 11: 6 22:00:00 1 
## 12: 7 01:00:00 1 
## 13: 7 01:30:00 1 
## 14: 7 07:15:00 1 
## 15: 7 07:45:00 1 
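The result above only lists slots in which at least one session started, and it groups by day of the week rather than by business day versus weekend. A possible extension is sketched below, using the same two packages. It assumes Monday to Friday counts as business days (holidays are ignored) and that PTU is the slot index from 1 to 96; the names dt, day_type, all_slots, grid and full are made up for this example.

```r
library(lubridate)
library(data.table)

dt <- data.table(Start_time = ymd_hms(c(
  "2016-04-01 06:20:55",  # Friday
  "2016-04-01 06:22:12",  # Friday
  "2016-04-02 07:28:48"   # Saturday
)))

# Time-of-day slot: floor to 15 minutes and keep only the clock time
dt[, slot := format(floor_date(Start_time, "15 min"), "%H:%M:%S")]

# week_start = 1 numbers the days Monday = 1 ... Sunday = 7
dt[, day_type := ifelse(lubridate::wday(Start_time, week_start = 1) >= 6,
                        "weekend", "business")]

counts <- dt[, .N, by = .(day_type, slot)]

# All 96 quarter-hour slots of a day (the date used here is arbitrary)
all_slots <- format(
  seq(as.POSIXct("2016-01-01 00:00:00", tz = "UTC"),
      by = "15 min", length.out = 96),
  "%H:%M:%S"
)

# Every day_type/slot combination, joined with the observed counts
grid <- CJ(day_type = c("business", "weekend"), slot = all_slots)
full <- counts[grid, on = .(day_type, slot)]
full[is.na(N), N := 0L]               # slots without any session get a zero
full[, PTU := match(slot, all_slots)] # running slot number within the day
```

Joining onto the full grid is what makes empty intervals show up with a count of 0 instead of being dropped.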