2017-09-24

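Time series analysis with covariates

I have a dataset containing data from several thousand individuals, in which a parameter X was measured every year for the last 9 years.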

Basically, they are in a data frame DF:

id,year,x,feature 
A,2016,376,female 
A,2015,391,female 
A,2014,376,female 
A,2013,373,female 
A,2012,347,female 
A,2011,330,female 
B,2016,398,male 
B,2015,391,male 
B,2014,410,male 
B,2013,393,male 
B,2012,408,male 
B,2011,288,male 
C,2016,2464,male 
C,2015,2465,male 
C,2014,2500,male 
C,2013,2215,male 
C,2012,2228,male 
C,2011,1839,male 

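I would like to estimate different models on these time series,

something like predict(x(t)) = f(x(t-1), x(t-2), ..., x(t-n), feature, id (as a random factor))

I can see how to do autoregressive modelling with ts, but that gives one model per individual, whereas I want a global prediction based on the time history and the feature (with its inherent problems).

Because the data are highly autocorrelated, lm is not a good idea. Any good ideas?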

You could try an "autoregressive moving average model with exogenous inputs" (ARMAX). See, for example, the 'dse' package: https://cran.r-project.org/web/packages/dse/dse.pdf

I tried looking at the documentation, but I must admit it is abstruse for an MD like me. I don't know how to get my data frame into dse.

Answers

There are many possible models, but here is a mixed-effects model with an AR1 structure that you could try.

library(nlme) 

fm <- lme(x ~ year + feature, random = ~ year | id, DF, 
    correlation = corAR1(form = ~ year | id)) 
summary(fm) 
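To make the AR1 structure concrete: under corAR1, two observations on the same id that are h years apart are assumed to have correlation rho^h. The sketch below builds that implied within-id correlation matrix in base R. The value rho = 0.6 is an assumption chosen for illustration only; in practice it would be the Phi estimate reported by the fitted model.

```r
# Illustrative AR(1) correlation matrix for one individual.
# rho is an assumed value, not an estimate from the data.
rho <- 0.6
years <- 2011:2016

# Correlation between two years decays as rho^|difference in years|
R <- rho^abs(outer(years, years, "-"))
dimnames(R) <- list(years, years)

round(R, 3)
```

Observations from different ids are treated as uncorrelated by this structure; dependence between individuals enters only through the fixed effects, and dependence within an individual through the random effects and rho.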

And here is a plot of the data:

library(ggplot2) 

ggplot(DF, aes(year, x, group = id, col = feature)) + geom_line() + geom_point() 

Note: we assume this input data:

Lines <- " 
id,year,x,feature 
A,2016,376,female 
A,2015,391,female 
A,2014,376,female 
A,2013,373,female 
A,2012,347,female 
A,2011,330,female 
B,2016,398,male 
B,2015,391,male 
B,2014,410,male 
B,2013,393,male 
B,2012,408,male 
B,2011,288,male 
C,2016,2464,male 
C,2015,2465,male 
C,2014,2500,male 
C,2013,2215,male 
C,2012,2228,male 
C,2011,1839,male" 
library(zoo) 
DF <- read.csv(text = Lines, strip.white = TRUE) 

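There are many choices for the function f() stated in the question.

Within the linear class, however, you can use vector generalized linear models (via vglm()) to fit generalized linear models with an ARMA (or GARCH) structure, combined with covariates. For example, assuming that the random errors are normally distributed, you can use the family function ARff() from the package VGAMextra, as shown below.

A second option is the nonparametric version, i.e. VGAMs, which support smart prediction. The only drawback is that vglms/vgams do not handle random effects.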

library(VGAM) 
library(VGAMextra) 
# Fit a linear model to the mean of the normal distribution,
# allowing an AR(3) structure. Uses the modelling function vglm() and
# the family function ARff().
df.read <- DF # DF as given by G.G. 
fit.Lines <- vglm(x ~ feature , ARff(order = 3, 
             zero = c("Var", "ARcoeff")), 
       data = df.read, trace = TRUE) 
coef(fit.Lines, matrix = TRUE) 
summary(fit.Lines, HD = FALSE) 

with(df.read, plot(fitted.values(fit.Lines) ~ year, 
       ylim = c(0, 3000), 
pch = 19, col = as.factor(feature))) 


# Using VGAMs; here the family function uninormal() is used.

df.read2 <- data.frame(embed(df.read$x, 4)) 
names(df.read2) <- c("x", "xLag1", "xLag2", "xLag3") 
df.read2 <- transform(df.read2, year = df.read$year[-c(1:3)], 
         feature = df.read$feature[-c(1:3)]) 
fit.Lines.vgams <- vgam(x ~ sm.bs(xLag1) + sm.bs(xLag2) + 
         sm.bs(xLag3) + feature + year, 
        uninormal, data = df.read2, trace = TRUE) 

with(df.read2, plot(fitted.values(fit.Lines.vgams) ~ year, 
       ylim = c(0, 3000), 
       pch = 19, col = as.factor(feature)))
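
One caveat about the lagged columns above: embed() is applied to the stacked data frame, so lagged values can cross from one id to the next, and because the rows are in descending year order the "lags" are in fact later years. A minimal sketch of building the lags within each id, in ascending year order, using base R only. The helper lag_k and the name DF.lagged are hypothetical, introduced just for this illustration; the data are the example rows from the question.

```r
# Example data copied from the question
Lines <- "id,year,x,feature
A,2016,376,female
A,2015,391,female
A,2014,376,female
A,2013,373,female
A,2012,347,female
A,2011,330,female
B,2016,398,male
B,2015,391,male
B,2014,410,male
B,2013,393,male
B,2012,408,male
B,2011,288,male
C,2016,2464,male
C,2015,2465,male
C,2014,2500,male
C,2013,2215,male
C,2012,2228,male
C,2011,1839,male"
DF <- read.csv(text = Lines, strip.white = TRUE)

# Sort by id and ascending year so that "lag 1" means the previous year
DF <- DF[order(DF$id, DF$year), ]

# Hypothetical helper: shift a vector down by k positions, padding with NA
lag_k <- function(v, k) c(rep(NA, k), head(v, -k))

# ave() applies the helper separately within each id,
# so lagged values never cross from one individual to another
DF$xLag1 <- ave(DF$x, DF$id, FUN = function(v) lag_k(v, 1))
DF$xLag2 <- ave(DF$x, DF$id, FUN = function(v) lag_k(v, 2))

# Drop the rows that lack a full two-year history
DF.lagged <- na.omit(DF)
head(DF.lagged)
```

The resulting data frame can then be passed as the data argument to a model that uses xLag1 and xLag2 as covariates, under the assumption that two lags are enough; with only six years per id, each extra lag costs one usable row per individual.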