2017-04-10 41 views

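How can I have one bin per month in geom_histogram?

I want a histogram of counts for some data I have. The data are not evenly spaced in time (i.e., some days may be missing). I can create a histogram using: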

ym_plot <- ggplot(data = df %>% mutate(timestamp = as.POSIXct(timestamp)), aes(timestamp)) + 
      geom_histogram(aes(fill = ..count..)) 
print(ym_plot) 

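But the histogram has about 8 bins per year, so the bins do not map to months. Is there an easy way to set the bin width to one month? If the data started at the beginning of a year, I would set the number of bins to 12*number_of_months.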

Edit:

Here is a simple sample of the data:

[1] "2013-07-15 22:12:43 EST" 
[1] "2013-05-04 21:30:06 EST" 
[1] "2017-01-02 02:28:02 EST" 
[1] "2013-02-28 08:06:09 EST" 
[1] "2011-11-10 13:57:16 EST" 
[1] "2015-11-12 21:05:37 EST" 
[1] "2011-10-31 13:02:21 EST" 
[1] "2015-01-18 12:22:45 EST" 
[1] "2013-02-04 11:57:41 EST" 
[1] "2011-10-16 21:54:27 EST" 
[1] "2013-06-19 23:11:45 EST" 
[1] "2015-08-16 19:26:29 EST" 
[1] "2016-11-09 21:48:20 EST" 
[1] "2011-06-13 13:30:19 EST" 
[1] "2012-05-08 02:50:42 EST" 
[1] "2014-10-15 23:27:28 EST" 
[1] "2012-03-11 00:56:05 EST" 
[1] "2014-07-16 17:32:34 EST" 
[1] "2011-08-08 19:01:39 EST" 
[1] "2014-08-31 13:41:49 EST" 
[1] "2017-03-09 23:23:45 EST" 
[1] "2013-02-16 13:27:49 EST" 
[1] "2012-08-22 23:58:33 EST" 
[1] "2012-04-20 11:06:32 EST" 
[1] "2016-01-22 20:50:30 EST" 

@ulfelder Please see the edit.

Answers


Some of the ideas here are taken from this question.

require(ggplot2) 
require(scales) 

df <- data.frame(timestamp = c("2013-07-15 22:12:43 EST", 
"2013-05-04 21:30:06 EST", 
"2017-01-02 02:28:02 EST", 
"2013-02-28 08:06:09 EST", 
"2011-11-10 13:57:16 EST", 
"2015-11-12 21:05:37 EST", 
"2011-10-31 13:02:21 EST", 
"2015-01-18 12:22:45 EST", 
"2013-02-04 11:57:41 EST", 
"2011-10-16 21:54:27 EST", 
"2013-06-19 23:11:45 EST", 
"2015-08-16 19:26:29 EST", 
"2016-11-09 21:48:20 EST", 
"2011-06-13 13:30:19 EST", 
"2012-05-08 02:50:42 EST", 
"2014-10-15 23:27:28 EST", 
"2012-03-11 00:56:05 EST", 
"2014-07-16 17:32:34 EST", 
"2011-08-08 19:01:39 EST", 
"2014-08-31 13:41:49 EST", 
"2017-03-09 23:23:45 EST", 
"2013-02-16 13:27:49 EST", 
"2012-08-22 23:58:33 EST", 
"2012-04-20 11:06:32 EST", 
"2016-01-22 20:50:30 EST")) 

#Convert data to date 
df$timestamp <- as.Date(df$timestamp) 

#Count by year and month 
new <- data.frame(table(format(df$timestamp, "%Y-%m"))) 

#Append a day 
new$Var1 <- paste0(new$Var1, "-1") 

#Turn back into date 
new$Var1 <- as.Date(new$Var1, format = "%Y-%m-%d") 

#Plot using scale_x_date with 1 month breaks 
g <- ggplot(data = new , aes(x = Var1, y = Freq)) + 
    geom_bar(stat="identity") + 
    scale_x_date(labels = date_format("%Y-%m"), breaks = date_breaks("1 month")) + 
    theme_bw() + 
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) 
print(g) 
ggsave("g.png") 

Final Plot
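One caveat with counting via table(format(...)) is that months with no observations do not appear in the result at all. If you want them as explicit zero counts, a minimal sketch (assuming a small hypothetical subset of the timestamps from the question, and using base R's cut() with breaks = "month") could look like this:

```r
library(ggplot2)

# Hypothetical subset of the question's timestamps, already reduced to dates
dates <- as.Date(c("2013-07-15", "2013-05-04", "2013-02-28",
                   "2013-02-04", "2013-06-19", "2013-02-16"))

# cut() with breaks = "month" assigns each date to its calendar month.
# Months without observations are kept as empty factor levels.
month_bin <- cut(dates, breaks = "month")

# table() counts every level, including the empty ones
monthly <- as.data.frame(table(month_bin), stringsAsFactors = FALSE)
names(monthly) <- c("month", "n")

# The level labels are the first day of each month, so they parse as dates
monthly$month <- as.Date(monthly$month)

p <- ggplot(monthly, aes(x = month, y = n)) +
  geom_col() +
  scale_x_date(date_breaks = "1 month", date_labels = "%Y-%m")
print(p)
```

Here March and April 2013 show up with a count of 0 instead of being dropped, which may or may not matter for your plot.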


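It's not clear to me whether you want to convert your data into 12 bins, one for each calendar month regardless of how many years the series spans, or whether you want to summarise your series to a monthly frequency. I'm going to assume the latter. So: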

# make some toy data representing an irregular time series, i.e., you have observations 
# for some days but not others 
set.seed(1) 
dates <- sample(seq(from = as.Date("2015-01-01"), to = as.Date("2016-12-31"), by = "day"), 300) 
values <- rnorm(300, 10, 2) 
df <- data.frame(date = dates, value = values) 

# load the packages we'll use. we need 'zoo' for its yearmon function.  
library(dplyr) 
library(ggplot2) 
library(zoo) 


# now... 
df %>% 
    # use 'as.yearmon' to create a variable identifying the unique year-month 
    # combination in which each observation falls 
    mutate(yearmon = as.yearmon(date)) %>% 
    # use that variable to group the data 
    group_by(yearmon) %>% 
    # count the number of observations in each of those year-month bins. if you 
    # want to summarise the data some other way, use 'summarise' here instead. 
    tally() %>% 
    # plot the resulting series with yearmon on the x-axis and using 'geom_col' 
    # instead of 'geom_histogram' to preserve the temporal ordering and avoid 
    # having to specify stat = "identity" 
    ggplot(aes(x = yearmon, y = n)) + geom_col() 

Result: [figure: bar chart of observation counts per year-month]

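If you want only 12 bins regardless of how many years the data span, you can use the month function from the lubridate package to create your grouping variable instead of as.yearmon.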
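A rough sketch of that variant, assuming the same kind of toy data as above (the names df_month, cal_month and by_month are just illustrative):

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Toy irregular series: 300 randomly chosen days across two years
set.seed(1)
dates <- sample(seq(from = as.Date("2015-01-01"), to = as.Date("2016-12-31"),
                    by = "day"), 300)
df_month <- data.frame(date = dates)

by_month <- df_month %>%
    # month(label = TRUE) returns an ordered factor (Jan, Feb, ...), so
    # observations from the same calendar month in different years are pooled
    mutate(cal_month = month(date, label = TRUE)) %>%
    # count() is shorthand for group_by() followed by tally()
    count(cal_month)

p12 <- ggplot(by_month, aes(x = cal_month, y = n)) + geom_col()
print(p12)
```

Because cal_month is an ordered factor, the bars come out in calendar order rather than alphabetical order. The month labels depend on your locale.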