How to extract the month from a column

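I want to recreate a plot from the Text Mining with R web textbook, but using my own data. It essentially finds the top terms for each year and plots them (Figure 5.4: http://tidytextmining.com/dtm.html). My data is a bit cleaner than what they start with, but I'm new to R. My data has a date column in 2016-01-01 format (Date class). I only have data from 2016, so I'd like to do the same thing at a finer granularity (i.e. by month or by day). How do I extract the month from the column?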

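library(dplyr)
library(tidyr)
library(ggplot2)

# inaug_td is the tidied inaugural-address data built earlier in the book chapter.
# Pull the year out of each document name, fill in zero counts for
# year/term pairs that never occur, and total the word counts per year.
year_term_counts <- inaug_td %>%
    extract(document, "year", "(\\d+)", convert = TRUE) %>%
    complete(year, term, fill = list(count = 0)) %>%
    group_by(year) %>%
    mutate(year_total = sum(count))

# Plot the relative frequency of the selected words over time.
year_term_counts %>%
    filter(term %in% c("god", "america", "foreign", "union", "constitution",
                       "freedom")) %>%
    ggplot(aes(year, count / year_total)) +
    geom_point() +
    geom_smooth() +
    facet_wrap(~ term, scales = "free_y") +
    scale_y_continuous(labels = scales::percent_format()) +
    ylab("% frequency of word in inaugural address")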

The idea is that I would pick specific words from my own text and see how their usage changes over the months.

Thanks!


2017-06-13 Alex

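Welcome to SO: have you tried breaking up the `year_term_counts` pipeline to check the intermediate steps? Does each step produce the result you expect? It would help us to see some of your data. –

You should consider using the `month` function from the `lubridate` package to create a whole column containing the month. – ccapizzano

I'll look into the month function, thanks! – Alex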
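The `month()` suggestion in the comments can be sketched as follows. This is only an illustration: the `docs` data frame and its `Date` and `text` columns are made-up stand-ins for the asker's data, which was not posted.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for the asker's data: one row per document,
# with a Date-class column named `Date` (an assumed name).
docs <- tibble(
  Date = as.Date(c("2016-01-01", "2016-01-15", "2016-02-03", "2016-11-30")),
  text = c("first post", "second post", "third post", "fourth post")
)

docs_with_month <- docs %>%
  mutate(
    month_num  = month(Date),                # month as a number, 1 to 12
    month_name = month(Date, label = TRUE),  # ordered factor of month labels
    day_num    = day(Date)                   # day of the month
  )

print(docs_with_month)
```

Note that `month()` drops the year, so rows from January 2016 and January 2017 would land in the same group. With a single year of data that is harmless; otherwise `floor_date()` keeps the year attached.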

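If you want to look at smaller time units based on the date column you already have, I'd suggest looking at the floor_date() or round_date() functions from lubridate. The particular chapter of our book that you linked deals with how to take a document-term matrix, tidy it, and so on. Have you already gotten your data into a tidy text format? If so, then you could do something like this: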

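library(dplyr)
library(lubridate)
library(ggplot2)

# Bin each date, count words per bin, and total the words in each bin
date_counts <- tidy_text %>%
    mutate(date = floor_date(Date, unit = "7 days")) %>% # use whatever time unit you want here, e.g. "month"
    count(date, word) %>%
    group_by(date) %>%
    mutate(date_total = sum(n))

# Plot each chosen word's share of all words in its time bin
date_counts %>%
    filter(word %in% c("PUT YOUR LIST OF WORDS HERE")) %>%
    ggplot(aes(date, n / date_total)) +
    geom_point() +
    geom_smooth() +
    facet_wrap(~ word, scales = "free_y")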
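For the monthly view the question asks about, the same pipeline should work with `unit = "month"`, which maps every date to the first day of its month. Here is a minimal sketch on toy data; the contents of `tidy_text` below are invented for illustration, and the `share` column is an added convenience rather than part of the answer's code.

```r
library(dplyr)
library(lubridate)

# Hypothetical tidy text data: one row per word occurrence.
tidy_text <- tibble(
  Date = as.Date(c("2016-01-01", "2016-01-20", "2016-01-20",
                   "2016-02-03", "2016-02-28", "2016-02-28")),
  word = c("union", "freedom", "union", "freedom", "freedom", "union")
)

month_counts <- tidy_text %>%
  mutate(date = floor_date(Date, unit = "month")) %>%  # e.g. 2016-01-20 becomes 2016-01-01
  count(date, word) %>%                                # occurrences per month and word
  group_by(date) %>%
  mutate(date_total = sum(n),                          # all words in that month
         share = n / date_total) %>%                   # relative frequency
  ungroup()

print(month_counts)
```

Because `date` stays a Date column, it can go straight onto the x axis of the ggplot call above without further conversion.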


2017-06-14 04:02:51

Thanks, Julia! I've been reading your new book. I'm new to R, but it has been very helpful. – Alex
