2014-04-17
1

Merging plm fitted values into a dataset

I am fitting fixed-effects regression models with plm.

The model looks like this:

FE.model <- plm(fml, data = data.reg2,
      index = c('Site.ID', 'date.hour'), # cross-section ID and time-series ID
      model = 'within', # within (fixed effects) estimator
      effect = 'individual')
summary(FE.model) 

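fml is a formula I defined earlier. I have many independent variables, so defining the formula separately is more efficient.

What I want to do is get my fitted values (my yhats) and join them to my base dataset, data.reg2.

I was able to get the fitted values with this code: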

Fe.model.fitted <- FE.model$model[[1]] - FE.model$residuals 

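However, this only gives me a vector of fitted values, and I have no way of linking it back to my base dataset.

Alternatively, I have tried something like this: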

Fe.model.fitted <- cbind(data.reg2, resid=resid(FE.model), fitted=fitted(FE.model)) 

However, I get this error:

Error in as.data.frame.default(x[[i]], optional = TRUE) : cannot coerce class "pseries" to a data.frame

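Is there any other way to get my fitted values into my base dataset? Or can someone explain the error I am getting and perhaps suggest a workaround?

I should note that I do not want to compute the yhats manually from my betas. I have too many independent variables for that, and the formula I defined (fml) may change, so that option would not be efficient.

Many thanks!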

Answers

0

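Residuals are the deviations of the model from the values on the LHS of the formula... which you have not shown us. There is a fitted.panelmodel function in the 'plm' package, but it seems to expect a fitted element to be present, which the plm function does not return by default, is not documented as returning, and which I see no way to make it cough up.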

library(plm) 
data("Produc", package = "plm") 
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, 
      data = Produc, index = c("state","year")) 
summary(zz) # the example on the plm page: 
> str(fitted(zz)) 
NULL 
> names(zz$model) 
[1] "log(gsp)" "log(pcap)" "log(pc)" "log(emp)" "unemp"  
> Produc[ , c("Yvar", "Fitted")] <- cbind(zz$model[ ,"log(gsp)", drop=FALSE], zz$residuals) 
> str(Produc) 
'data.frame': 816 obs. of 12 variables: 
$ state : Factor w/ 48 levels "ALABAMA","ARIZONA",..: 1 1 1 1 1 1 1 1 1 1 ... 
$ year : int 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 ... 
$ pcap : num 15033 15502 15972 16406 16763 ... 
$ hwy : num 7326 7526 7765 7908 8026 ... 
$ water : num 1656 1721 1765 1742 1735 ... 
$ util : num 6051 6255 6442 6756 7002 ... 
$ pc : num 35794 37300 38670 40084 42057 ... 
$ gsp : int 28418 29375 31303 33430 33749 33604 35764 37463 39964 40979 ... 
$ emp : num 1010 1022 1072 1136 1170 ... 
$ unemp : num 4.7 5.2 4.7 3.9 5.5 7.7 6.8 7.4 6.3 7.1 ... 
$ Yvar :Classes 'pseries', 'pseries', 'integer' atomic [1:816] 10.3 10.3 10.4 10.4 10.4 ... 
    .. ..- attr(*, "index")='data.frame': 816 obs. of 2 variables: 
    .. .. ..$ state: Factor w/ 48 levels "ALABAMA","ARIZONA",..: 1 1 1 1 1 1 1 1 1 1 ... 
    .. .. ..$ year : Factor w/ 17 levels "1970","1971",..: 1 2 3 4 5 6 7 8 9 10 ... 
$ Fitted: num -0.04656 -0.03064 -0.01645 -0.00873 -0.02708 ... 
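
For readers who want to see why subtracting the residuals from the response yields fitted values, here is a minimal base-R sketch that does not use plm at all. The toy data frame d, its column names, and the hand-rolled within transformation are assumptions made for illustration; they are not taken from the question's data or from plm internals.

```r
# Base-R sketch (no plm): within-model residuals and fitted values by hand.
set.seed(1)
d <- data.frame(id = rep(c("a", "b", "c"), each = 4), x = rnorm(12))
d$y <- 1 + 2 * d$x + rep(c(-1, 0, 1), each = 4) + rnorm(12) / 10

# Within transformation: subtract each individual's mean from y and x.
d$y_dm <- d$y - ave(d$y, d$id)
d$x_dm <- d$x - ave(d$x, d$id)
within_fit <- lm(y_dm ~ x_dm - 1, data = d)

# Fitted values in levels = observed response minus the residuals.
d$fitted <- d$y - unname(resid(within_fit))

# Cross-check against the equivalent dummy-variable (LSDV) regression.
lsdv_fit <- lm(y ~ x + factor(id), data = d)
all.equal(d$fitted, unname(fitted(lsdv_fit)))
```

The final comparison checks that the two routes agree, which suggests that a "response minus residuals" column includes the individual effects and is therefore on the same scale as the original response.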
+0

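Are you sure those are the "fitted" values? Looking closely at the syntax, they appear to be the residuals. Also, the results do not look like fitted values... Or maybe I am misreading your answer and fitted values are simply not possible with plm? – Luna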

3

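Merging plm fitted values back into the original dataset takes some intermediate steps: plm drops any rows with missing data and, as far as I can tell, a plm object does not contain the index information. The order of the data is not preserved (see Giovanni Millo's comment in this thread: "the input order is not always preserved").

In short, the steps are:

  1. Get the fitted values from the estimated plm object. This is a single vector, but its entries are named. The names correspond to positions in the index.
  2. Get the index with the index() function. It can return both the individual and the time index. Note that the index may contain more rows than the fitted values if rows with missing data were dropped. (The index could also be generated directly from the original data, but I have not seen any promise that plm preserves the original order of the data in what it returns.)
  3. Merge into the original data, looking up the id and time values from the index.

Example code is below. It is a bit long, but I have tried to comment it. The code is not optimized; my intention was to list the steps explicitly. Also, I am using data.tables rather than data.frames.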

library(data.table); library(plm) 

### Generate dummy data. This way we know the "true" coefficients 
set.seed(100) 
n <- 500 # Run with more data if you want to get closer to the "true" coefficients 
DT <- data.table(CJ(id = c("a","b","c","d","e"), time = c(1:(n/5)))) 
DT[, x1 := rnorm(n)] 
DT[, x2 := rnorm(n)] 
DT[, y := x1 + 2 * x2 + rnorm(n)/10] 

setkey(DT, id, time) 
# # Make it an unbalanced panel & put in some NAs 
DT <- DT[!(id == "a" & time == 4)] 
DT[.("a", 3), x2 := as.numeric(NA)] 
DT[.("d", 2), x2 := as.numeric(NA)] 

str(DT) 

### Run the model -- both individual and time effects; "within" model 
summary(PLM <- plm(data = DT, index = c("id", "time"), formula = y ~ x1 + x2, model = "within", effect = "twoways", na.action = "na.omit")) # plm's argument is index, not id

### Merge the fitted values back into the data.table DT 
# Note that PLM$model$y is shorter than the data, i.e. the row(s) with NA have been dropped 
cat("\nRows omitted (due to NA): ", nrow(DT) - length(PLM$model$y)) 

# Since the objects returned by plm() do not contain the index, need to generate it from the data 
# The object returned by plm(), i.e. PLM$model$y, has names that point to the place in the index 
# Note: The index can also be done as INDEX <- DT[, j = .(id, time)], but use the longer way with index() in case plm does not preserve the order 
INDEX <- data.table(index(x = pdata.frame(x = DT, index = c("id", "time")), which = NULL)) # which = NULL extracts both the individual and time indexes 
INDEX[, id := as.character(id)] 
INDEX[, time := as.integer(time)] # it is returned as a factor, convert back to integer to match the variable type in DT 

# Generate the fitted values as the difference between the y values and the residuals 
if (all(names(PLM$residuals) == names(PLM$model$y))) { # this should not be needed, but just in case... 
    FIT <- data.table(
     index = as.integer(names(PLM$model$y)), # this index corresponds to the position in the INDEX, from where we get the "id" and "time" below 
     fit.plm = as.numeric(PLM$model$y) - as.numeric(PLM$residuals) 
    ) 
} 

FIT[, id := INDEX[index]$id] 
FIT[, time := INDEX[index]$time] 
# Now FIT has both the id and time variables, can match it back into the original dataset (i.e. we have the missing data accounted for) 
DT <- merge(x = DT, y = FIT[, j = .(id, time, fit.plm)], by = c("id", "time"), all = TRUE) # Need all = TRUE, or some data from DT will be dropped! 
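
For readers who prefer plain data.frames, the merge step can be sketched in base R alone. This is only an illustration under stated assumptions: the vector fit below is a made-up stand-in for the named fitted values, its names are assumed to be row positions in data that is already sorted by id and time, and dat and fit_df are hypothetical names. as.numeric() is used because it drops attributes such as a pseries class.

```r
# Toy data, already sorted by id and time; row 3 has a missing regressor.
dat <- data.frame(id = rep(c("a", "b"), each = 3),
                  time = rep(1:3, 2),
                  x = c(0.5, 1.2, NA, 0.3, 0.9, 1.1),
                  stringsAsFactors = FALSE)

# Hypothetical stand-in for the fitted values: a named vector whose names
# are the positions of the rows that survived NA removal (row 3 is absent).
fit <- c("1" = 1.0, "2" = 2.4, "4" = 0.6, "5" = 1.8, "6" = 2.2)

# Look up id and time for each fitted value via its position.
pos <- as.integer(names(fit))
fit_df <- data.frame(id = dat$id[pos],
                     time = dat$time[pos],
                     fit.plm = as.numeric(fit)) # as.numeric() strips names and classes

# Left join: keep every row of dat; rows dropped by the model get NA.
merged <- merge(dat, fit_df, by = c("id", "time"), all.x = TRUE)
merged
```

Using all.x = TRUE keeps the row with the missing regressor in the result, with NA in the fit.plm column.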
0

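I have a simplified approach. The main problem here is twofold:

1) pdata.frame sorts the input alphabetically by name and then by year. This can be handled by sorting the data frame before running plm.

2) Rows containing NA in any variable included in the formula are dropped. I handle this by creating a second formula that also includes my id and time variables, and then using model.frame to extract the data used in the regression (which excludes the NAs but now also includes id and time).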

library(plm) 
set.seed(100) 
n <- 10 # Run with more data if you want to get closer to the "true" coefficients 
DT <- data.frame(id = c("a","c","b","d","e"), time = c(1:(n/5)),x1 = rnorm(n),x2= rnorm(n),x3=rnorm(n)) 
DT$Y = DT$x2 + 2 * DT$x3 + rnorm(n)/10 # make Y a function of x2 and x3
DT$x3[3]=NA # add an NA to show this works with missing data 
DT 

# now can add drop.index = F, but note that DT is now sorted by order(id,time) 
pdata.frame(DT,index=c('id','time'),drop.index = F) 

# order DT to match pdata.frame that will be used for plm 
DT=DT[order(DT$id,DT$time),] 

# formulas 
formulas =Y~x1+x2+x3 
formulas_dataframe = Y~x1+x2+x3 +id+time # add id and time for model.frame 

# estimate 
random <- plm(formulas, data=DT, index=c("id", "time"), model="random",na.action = 'na.omit') 
summary(random) 

# merge predictions and model.frame
fitted = data.frame(fitted = random$model[[1]] - random$residuals) 
model_data = cbind(as.data.frame(as.matrix(random$model)),fitted) # this isn't really needed but shows that input and model.frame are same 
model_data = cbind(model_data,na.omit(model.frame(formulas_dataframe,DT))) 
model_data 
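
As a small base-R sketch of the model.frame step on its own (the toy data frame dat and its values are assumptions for illustration, not the data above), the following shows which rows are dropped and which are kept when id and time are added to the formula:

```r
# Toy data with one missing regressor value in row 2.
dat <- data.frame(id = c("a", "a", "b", "b"),
                  time = c(1, 2, 1, 2),
                  Y = c(1.0, 2.0, 3.0, 4.0),
                  x = c(0.1, NA, 0.3, 0.4))

# Adding id and time to the formula keeps them in the model frame;
# the default na.action (na.omit) removes rows with any NA.
mf <- model.frame(Y ~ x + id + time, data = dat)

dropped <- attr(mf, "na.action") # positions of rows removed because of NA
kept <- rownames(mf)             # original row names of the rows that were kept
mf
```

Because the kept rows retain their original row names, they can also serve as a key when checking that the model data and the input data line up.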
0

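I wrote a function (predict.out.plm) for making out-of-sample predictions after estimating first-difference or fixed-effects models with plm.

The function also attaches the predicted values to the index of the original data. This is done using the rownames stored within the plm object, in attributes(plmobject)$index and within the model.matrix.

For more details, see the function posted here: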

https://stackoverflow.com/a/44185441/2409896