2017-03-15 77 views
4

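Using the Prophet package to forecast by group in a data frame in R

I am using Prophet, a new package released by Facebook. It does time series forecasting, and I want to apply its functions by group.

The documentation is here (scroll down to the R section):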

https://facebookincubator.github.io/prophet/docs/quick_start.html

This is my attempt:

grouped_output = df %>% group_by(group) %>% 
    do(m = prophet(df[,c(1,3)])) %>% 
    do(future = make_future_dataframe(m, period = 7)) %>% 
    do(forecast = prophet:::predict.prophet(m, future)) 

grouped_output[[1]] 

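I then need to extract the results for each group from the resulting list, which I am having trouble doing.

Below is my original data frame, without groups: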

ds <- as.Date(c('2016-11-01','2016-11-02','2016-11-03','2016-11-04', 
        '2016-11-05','2016-11-06','2016-11-07','2016-11-08', 
        '2016-11-09','2016-11-10','2016-11-11','2016-11-12', 
        '2016-11-13','2016-11-14','2016-11-15','2016-11-16', 
        '2016-11-17','2016-11-18','2016-11-19','2016-11-20', 
        '2016-11-21','2016-11-22','2016-11-23','2016-11-24', 
        '2016-11-25','2016-11-26','2016-11-27','2016-11-28', 
        '2016-11-29','2016-11-30')) 
y <- c(15,17,18,19,20,54,67,23,12,34,12,78,34,12,3,45,67,89,12,111,123,112,14,566,345,123,567,56,87,90) 
y<-as.numeric(y) 
df <- data.frame(ds, y) 

df 

      ds y 
1 2016-11-01 15 
2 2016-11-02 17 
3 2016-11-03 18 
4 2016-11-04 19 
5 2016-11-05 20 
6 2016-11-06 54 
7 2016-11-07 67 
8 2016-11-08 23 
9 2016-11-09 12 
10 2016-11-10 34 
11 2016-11-11 12 
12 2016-11-12 78 
13 2016-11-13 34 
14 2016-11-14 12 
15 2016-11-15 3 
16 2016-11-16 45 
17 2016-11-17 67 
18 2016-11-18 89 
19 2016-11-19 12 
20 2016-11-20 111 
21 2016-11-21 123 
22 2016-11-22 112 
23 2016-11-23 14 
24 2016-11-24 566 
25 2016-11-25 345 
26 2016-11-26 123 
27 2016-11-27 567 
28 2016-11-28 56 
29 2016-11-29 87 
30 2016-11-30 90 

Currently the functions work when I run them for a single group, as follows:

#install.packages('prophet') 
library(prophet) 
m<-prophet(df) 
future <- make_future_dataframe(m, period = 7) 
forecast <- prophet:::predict.prophet(m, future) 

forecast$yhat 
[1] -2.649032 -29.762095 128.169781 59.573684 -11.623727 107.473617 -29.949730 -42.862455 -62.378408 104.797639 46.868610 
[12] -12.502864 119.282058 -4.914921 -4.402638 -10.643570 169.309505 123.321261 74.734746 215.856347 99.290218 105.508059 
[23] 102.882915 284.245984 237.401258 185.688202 321.466962 197.451536 194.280518 180.535663 349.304365 288.684031 222.337210 
[34] 342.968499 203.648851 185.377165 

I now want to change this so that the prophet:::predict function is applied to each group. The new data frame, with groups, looks like this:

ds <- as.Date(c('2016-11-01','2016-11-02','2016-11-03','2016-11-04', 
      '2016-11-05','2016-11-06','2016-11-07','2016-11-08', 
      '2016-11-09','2016-11-10','2016-11-11','2016-11-12', 
      '2016-11-13','2016-11-14','2016-11-15','2016-11-16', 
      '2016-11-17','2016-11-18','2016-11-19','2016-11-20', 
      '2016-11-21','2016-11-22','2016-11-23','2016-11-24', 
      '2016-11-25','2016-11-26','2016-11-27','2016-11-28', 
      '2016-11-29','2016-11-30', 


      '2016-11-01','2016-11-02','2016-11-03','2016-11-04', 
      '2016-11-05','2016-11-06','2016-11-07','2016-11-08', 
      '2016-11-09','2016-11-10','2016-11-11','2016-11-12', 
      '2016-11-13','2016-11-14','2016-11-15','2016-11-16', 
      '2016-11-17','2016-11-18','2016-11-19','2016-11-20', 
      '2016-11-21','2016-11-22','2016-11-23','2016-11-24', 
      '2016-11-25','2016-11-26','2016-11-27','2016-11-28', 
      '2016-11-29','2016-11-30')) 
y <- c(15,17,18,19,20,54,67,23,12,34,12,78,34,12,3,45,67,89,12,111,123,112,14,566,345,123,567,56,87,90, 
    45,23,12,10,21,34,12,45,12,44,87,45,32,67,1,57,87,99,33,234,456,123,89,333,411,232,455,55,90,21) 
y<-as.numeric(y) 

group<-c("A","A","A","A","A","A","A","A","A","A","A","A","A","A","A", 
    "A","A","A","A","A","A","A","A","A","A","A","A","A","A","A", 
    "B","B","B","B","B","B","B","B","B","B","B","B","B","B","B", 
    "B","B","B","B","B","B","B","B","B","B","B","B","B","B","B") 
df <- data.frame(ds,group, y) 

df 

      ds group y 
1 2016-11-01  A 15 
2 2016-11-02  A 17 
3 2016-11-03  A 18 
4 2016-11-04  A 19 
5 2016-11-05  A 20 
6 2016-11-06  A 54 
7 2016-11-07  A 67 
8 2016-11-08  A 23 
9 2016-11-09  A 12 
10 2016-11-10  A 34 
11 2016-11-11  A 12 
12 2016-11-12  A 78 
13 2016-11-13  A 34 
14 2016-11-14  A 12 
15 2016-11-15  A 3 
16 2016-11-16  A 45 
17 2016-11-17  A 67 
18 2016-11-18  A 89 
19 2016-11-19  A 12 
20 2016-11-20  A 111 
21 2016-11-21  A 123 
22 2016-11-22  A 112 
23 2016-11-23  A 14 
24 2016-11-24  A 566 
25 2016-11-25  A 345 
26 2016-11-26  A 123 
27 2016-11-27  A 567 
28 2016-11-28  A 56 
29 2016-11-29  A 87 
30 2016-11-30  A 90 
31 2016-11-01  B 45 
32 2016-11-02  B 23 
33 2016-11-03  B 12 
34 2016-11-04  B 10 
35 2016-11-05  B 21 
36 2016-11-06  B 34 
37 2016-11-07  B 12 
38 2016-11-08  B 45 
39 2016-11-09  B 12 
40 2016-11-10  B 44 
41 2016-11-11  B 87 
42 2016-11-12  B 45 
43 2016-11-13  B 32 
44 2016-11-14  B 67 
45 2016-11-15  B 1 
46 2016-11-16  B 57 
47 2016-11-17  B 87 
48 2016-11-18  B 99 
49 2016-11-19  B 33 
50 2016-11-20  B 234 
51 2016-11-21  B 456 
52 2016-11-22  B 123 
53 2016-11-23  B 89 
54 2016-11-24  B 333 
55 2016-11-25  B 411 
56 2016-11-26  B 232 
57 2016-11-27  B 455 
58 2016-11-28  B 55 
59 2016-11-29  B 90 
60 2016-11-30  B 21 

How do I use the prophet package to predict y-hat by group rather than in total?

Answers

4

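Here is a solution that uses tidyr::nest to nest the data by group, purrr::map to fit a model to each group, and then retrieves y-hat as requested. I took your code but incorporated it into mutate calls that compute new list-columns using purrr::map.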

library(prophet) 
library(dplyr) 
library(purrr) 
library(tidyr) 

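d1 <- df %>% 
    # one row per group; the remaining columns (ds, y) go into a list-column named data 
    nest(-group) %>% 
    # fit one prophet model per group 
    mutate(m = map(data, prophet)) %>% 
    # extend each group's dates by 7 days 
    mutate(future = map(m, make_future_dataframe, periods = 7)) %>% 
    # predict with each group's own model and future data frame 
    mutate(forecast = map2(m, future, predict)) 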

Here is the output at this point:

d1 
# A tibble: 2 × 5 
   group              data          m                future 
  <fctr>            <list>     <list>                <list> 
1      A <tibble [30 × 2]> <S3: list> <data.frame [36 × 1]> 
2      B <tibble [30 × 2]> <S3: list> <data.frame [36 × 1]> 
# ... with 1 more variables: forecast <list> 

Then I use unnest() to retrieve the data from the forecast column and select the requested y-hat values.

d <- d1 %>% 
    unnest(forecast) %>% 
    select(ds, group, yhat) 

Here is the output for the new forecast values:

d %>% group_by(group) %>% 
    top_n(7, ds) 
Source: local data frame [14 x 3] 
Groups: group [2] 

      ds group  yhat 
     <date> <fctr>  <dbl> 
1 2016-11-30  A 180.53422 
2 2016-12-01  A 349.30277 
3 2016-12-02  A 288.68215 
4 2016-12-03  A 222.33501 
5 2016-12-04  A 342.96654 
6 2016-12-05  A 203.64625 
7 2016-12-06  A 185.37395 
8 2016-11-30  B 131.07827 
9 2016-12-01  B 222.83703 
10 2016-12-02  B 236.33555 
11 2016-12-03  B 145.41001 
12 2016-12-04  B 228.59687 
13 2016-12-05  B 162.49244 
14 2016-12-06  B 68.44477 
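
The nest/map/unnest steps above can also be sketched with the interface used by newer tidyr releases (1.0.0 and later, as far as I know), where nest() and unnest() expect named arguments. This is only a sketch under that assumption, not the original author's code. The data frame is rebuilt from the values in the question so the block runs on its own, and `periods` is the full name of the make_future_dataframe argument. If `predict` does not dispatch in your session, `prophet:::predict.prophet` can be substituted.

```r
library(prophet)
library(dplyr)
library(purrr)
library(tidyr)

# Sample data from the question: two groups, 30 daily observations each.
df <- data.frame(
  ds = rep(seq(as.Date("2016-11-01"), by = "day", length.out = 30), times = 2),
  group = rep(c("A", "B"), each = 30),
  y = c(15, 17, 18, 19, 20, 54, 67, 23, 12, 34, 12, 78, 34, 12, 3,
        45, 67, 89, 12, 111, 123, 112, 14, 566, 345, 123, 567, 56, 87, 90,
        45, 23, 12, 10, 21, 34, 12, 45, 12, 44, 87, 45, 32, 67, 1,
        57, 87, 99, 33, 234, 456, 123, 89, 333, 411, 232, 455, 55, 90, 21)
)

d1 <- df %>%
  nest(data = -group) %>%                                  # list-column "data" holds ds and y per group
  mutate(
    m = map(data, prophet),                                # one fitted model per group
    future = map(m, make_future_dataframe, periods = 7),   # history plus 7 future days
    forecast = map2(m, future, predict)                    # pair each model with its own future frame
  )

d <- d1 %>%
  select(group, forecast) %>%   # drop the other list-columns before unnesting
  unnest(cols = forecast) %>%
  select(ds, group, yhat)
```

Dropping the other list-columns before unnest() is a design choice rather than a requirement; it keeps the fitted models from being repeated on every forecast row.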

I am not sure whether I should use 'map(m, ~ predict(.x, future))' or 'map2(m, future, ~ predict(.x, .y))'. They seem to give the same output here. – FlorianGD


I should use 'map2'. The same result was obtained only because I had a variable called future in my session. – FlorianGD


This is great, thanks. One thing I had to change, for anyone else reading this: 'predict' did not work for me, so I replaced 'predict' with 'prophet:::predict.prophet'. –

1

I was looking for a solution to the same problem. I came up with the following code, which is a bit simpler than the accepted answer.

library(tidyr) 
library(dplyr) 
library(prophet) 

data = df %>% 
     group_by(group) %>% 
     do(predict(prophet(.), make_future_dataframe(prophet(.), periods = 7))) %>% 
     select(ds, group, yhat) 

Here are the predicted values:

data %>% group_by(group) %>% 
     top_n(7, ds) 

# A tibble: 14 x 3 
# Groups: group [2] 
      ds group  yhat 
     <date> <fctr> <dbl> 
1 2016-12-01  A 316.9709 
2 2016-12-02  A 258.2153 
3 2016-12-03  A 196.6835 
4 2016-12-04  A 346.2338 
5 2016-12-05  A 208.9083 
6 2016-12-06  A 216.5847 
7 2016-12-07  A 206.3642 
8 2016-12-01  B 230.0424 
9 2016-12-02  B 268.5359 
10 2016-12-03  B 190.2903 
11 2016-12-04  B 312.9019 
12 2016-12-05  B 266.5584 
13 2016-12-06  B 189.3556 
14 2016-12-07  B 168.9791
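
The do() call above fits prophet twice per group, once for the prediction and once to build the future data frame. A possible variant that fits each model only once is sketched below. It assumes a dplyr version that provides group_modify(), and `forecast_group` is a hypothetical helper introduced here for illustration, not a function from prophet. The data frame is rebuilt from the values in the question so the block runs on its own.

```r
library(dplyr)
library(prophet)

# Sample data from the question: two groups, 30 daily observations each.
df <- data.frame(
  ds = rep(seq(as.Date("2016-11-01"), by = "day", length.out = 30), times = 2),
  group = rep(c("A", "B"), each = 30),
  y = c(15, 17, 18, 19, 20, 54, 67, 23, 12, 34, 12, 78, 34, 12, 3,
        45, 67, 89, 12, 111, 123, 112, 14, 566, 345, 123, 567, 56, 87, 90,
        45, 23, 12, 10, 21, 34, 12, 45, 12, 44, 87, 45, 32, 67, 1,
        57, 87, 99, 33, 234, 456, 123, 89, 333, 411, 232, 455, 55, 90, 21)
)

# Hypothetical helper: fit one model, then forecast `horizon` days past the history.
forecast_group <- function(data, horizon = 7) {
  m <- prophet(data)                                      # single fit per group
  future <- make_future_dataframe(m, periods = horizon)   # history plus future dates
  predict(m, future)                                      # data frame with ds, yhat, ...
}

forecasts <- df %>%
  group_by(group) %>%
  group_modify(~ forecast_group(.x)) %>%   # .x is one group's rows without the group column
  ungroup() %>%
  select(ds, group, yhat)
```

Calling `forecasts %>% group_by(group) %>% top_n(7, ds)` afterwards would show the seven most recent forecast dates per group, as in the answers above.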