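Merging plm fitted values back into the original dataset requires some intermediate steps: plm drops any rows with missing data, and, as far as I can tell, a plm object does not contain the index information. The order of the data is not preserved either (see Millo Giovanni's comment in this thread: "input order is not always preserved").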
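In short, the steps are:
- Get the fitted values from the estimated plm object. They come as a single vector, but the entries are named, and the names correspond to positions in the index.
- Get the index with the index() function. It can return both the individual and the time index. Note that the index may contain more rows than there are fitted values if rows with missing data were dropped. (The index could also be built directly from the original data, but I have not seen any guarantee that plm preserves the original order of the data.)
- Merge into the original data, looking up the id and time values in the index.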
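To make the lookup idea concrete before the full example, here is a toy sketch in base R with made-up data and no plm involved. It assumes, as the answer does, that the names of the fitted-value vector are row positions in the index table; `index_tbl`, `fitted_vals`, `fit_tbl`, `original` and `merged` are hypothetical names used only for illustration.

```r
# Toy panel index (hypothetical data): two individuals, two periods each
index_tbl <- data.frame(
  id   = c("a", "a", "b", "b"),
  time = c(1L, 2L, 1L, 2L)
)

# Pretend the model dropped row 2 (missing data); the names of the
# fitted values are row positions in index_tbl (assumption mirrored from the answer)
fitted_vals <- c("1" = 0.5, "3" = 1.2, "4" = 0.9)

# Look up id and time for each fitted value via its name
pos <- as.integer(names(fitted_vals))
fit_tbl <- data.frame(
  id   = index_tbl$id[pos],
  time = index_tbl$time[pos],
  fit  = unname(fitted_vals)
)

# Merge back into the original data; all.x = TRUE keeps the dropped row, whose fit is NA
original <- data.frame(index_tbl, y = c(1, NA, 2, 3))
merged <- merge(original, fit_tbl, by = c("id", "time"), all.x = TRUE)
print(merged)
```

The merge on (id, time) rather than on row order is what makes this robust: it does not matter in which order the model returned its values.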
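Example code is below. It is a bit long, but I have tried to comment it. The code is not optimized; my intention was to spell out the steps explicitly. Also, I am using data.tables rather than data.frames.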
library(data.table); library(plm)
### Generate dummy data. This way we know the "true" coefficients
set.seed(100)
n <- 500 # Run with more data if you want to get closer to the "true" coefficients
DT <- data.table(CJ(id = c("a","b","c","d","e"), time = c(1:(n/5))))
DT[, x1 := rnorm(n)]
DT[, x2 := rnorm(n)]
DT[, y := x1 + 2 * x2 + rnorm(n)/10]
setkey(DT, id, time)
# Make it an unbalanced panel and put in some NAs
DT <- DT[!(id == "a" & time == 4)]
DT[.("a", 3), x2 := as.numeric(NA)]
DT[.("d", 2), x2 := as.numeric(NA)]
str(DT)
### Run the model -- both individual and time effects; "within" model
summary(PLM <- plm(data = DT, index = c("id", "time"), formula = y ~ x1 + x2, model = "within", effect = "twoways", na.action = "na.omit")) # the panel structure is passed via the 'index' argument
### Merge the fitted values back into the data.table DT
# Note that PLM$model$y is shorter than the data, i.e. the row(s) with NA have been dropped
cat("\nRows omitted (due to NA): ", nrow(DT) - length(PLM$model$y))
# Since the objects returned by plm() do not contain the index, need to generate it from the data
# The object returned by plm(), i.e. PLM$model$y, has names that point to the place in the index
# Note: The index could also be built as INDEX <- DT[, j = .(id, time)], but the longer way with index() is used in case plm does not preserve the order
INDEX <- data.table(index(x = pdata.frame(x = DT, index = c("id", "time")), which = NULL)) # which = NULL extracts both the individual and time indexes
INDEX[, id := as.character(id)]
INDEX[, time := as.integer(as.character(time))] # it is returned as a factor; go through as.character so the integer is the original time value (not the factor level code) and matches the variable type in DT
# Generate the fitted values as the difference between the y values and the residuals
stopifnot(all(names(PLM$residuals) == names(PLM$model$y))) # this should always hold, but check before relying on the names
FIT <- data.table(
index = as.integer(names(PLM$model$y)), # this index corresponds to the position in the INDEX, from where we get the "id" and "time" below
fit.plm = as.numeric(PLM$model$y) - as.numeric(PLM$residuals)
)
FIT[, id := INDEX[index]$id]
FIT[, time := INDEX[index]$time]
# Now FIT has both the id and time variables, can match it back into the original dataset (i.e. we have the missing data accounted for)
DT <- merge(x = DT, y = FIT[, j = .(id, time, fit.plm)], by = c("id", "time"), all = TRUE) # Need all = TRUE, or some data from DT will be dropped!
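The two ingredients used above (fitted values computed as y minus residuals, and names on the residual vector that point back to row positions after NA rows are dropped) can be seen on an ordinary `lm` fit in base R. This is a swapped-in illustration with `lm`, not `plm`, on made-up data; it does not by itself show what `y - residuals` gives for plm's within-transformed two-way model.

```r
# Base-R illustration with lm (not plm), on simulated data
set.seed(1)
df <- data.frame(x = rnorm(10), y = rnorm(10))
df$x[3] <- NA  # one missing value, so row 3 is dropped by na.omit

fit <- lm(y ~ x, data = df, na.action = na.omit)

res <- residuals(fit)
length(res)  # 9: the row with NA is gone
names(res)   # original row positions; "3" is missing

# Fitted values recovered as y minus residuals, matched by name
y_used   <- df$y[as.integer(names(res))]
fit_diff <- y_used - res
all.equal(unname(fit_diff), unname(fitted(fit)))  # TRUE

# Put them back into the full data; the dropped row gets NA
df$fit <- NA_real_
df$fit[as.integer(names(res))] <- fit_diff
```

For lm, `na.action = na.exclude` would pad residuals and fitted values with NA directly; the name-based lookup is shown here because it is the same mechanism the plm code relies on.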
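Are you sure those are "fitted" values? Looking closely at the syntax, they seem to be residuals. Also, the results don't look like fitted values... Maybe I am missing something in your answer, and fitted values are not possible with plm? – Luna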