2016-02-26

Loop through variables in R

I have an outcome variable, Y, and 10 predictor variables (X1-X10).

set.seed(1001) 
n <- 100 
Y < c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0) 
X1 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.1,0.4,0.5)) 
X2 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.5,0.25,0.25)) 
X3 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.3,0.4,0.4)) 
X4 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.35,0.35,0.3)) 
X5 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.1,0.2,0.7)) 
X6 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.8,0.1,0.1)) 
X7 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.1,0.1,0.8)) 
X8 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.35,0.35,0.3)) 
X9 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.35,0.35,0.3)) 
X10 <- c(0,2,2,2,2,2,2,2,0,2,0,2,2,0,0,0,0,0,2,0,0,2,2,0,0,2,2,2,0,2,0,2,0,2,1,2,1,1,1,1,1,1,1,1,1,1,1,0,1,2,2,2,2,2,2,2,2,2,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,0,0,0,0) 

datasim <- data.frame(Y,X1,X2,X3,X4,X5,X6,X7,X8,X9,X10) 

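My goal is to fit a logistic regression model for each predictor and compute the difference in deviance (dDeviance, the null deviance minus the residual deviance), then bootstrap dDeviance 1000 times (R = 1000). I tried the following function, which works for one variable at a time. Can you suggest how to enhance the code so that it loops through variables 1 to 10, computes dDeviance for each, and then bootstraps the values?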

library(boot)  # provides boot()

# Statistic for boot(): drop in deviance for Y ~ X1 on the resampled rows
glmfunction <- function(data, indices) 
{ 
  glm.snp1 <- glm(Y~X1, family="binomial", data=data[indices,]) 
  null <- glm.snp1$null.deviance 
  residual <- glm.snp1$deviance 
  dDeviance <- (null-residual) 
  return(dDeviance) 
} 

result <- boot(datasim, glmfunction, R=1000) 

Answer

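There are probably many ways to solve this, but here is how I would do it. I first create a vector of the independent variables I want to use in my models: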

#vector of independent variables 
iv <- grep("X", colnames(datasim), value=TRUE) 

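Then I loop over them, fitting a model for each and extracting dDeviance. This way the bootstrap function returns not a single value but a vector whose length equals the number of independent variables.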

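glmfunction <- function(data, indices, iv){ 
    # one dDeviance per predictor name in iv, computed on the resampled rows
    res <- sapply(iv, function(x){ 
        fit <- glm(formula=as.formula(sprintf("Y~%s",x)), family="binomial", data=data[indices,]) 
        # difference in deviance: null deviance minus residual deviance
        dDeviance <- with(fit, null.deviance - deviance) 
        return(dDeviance) 
    }) 
    res 
} 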
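
Before handing the function to boot(), it can help to call it once by hand with every row index, which reproduces the statistic on the original sample. The following is only a sketch on a small made-up data frame (`toy`, `iv_toy` and `dev_diff` are hypothetical names, not part of the question); it assumes nothing beyond base R.

```r
# Toy data (hypothetical, much smaller than datasim)
set.seed(1)
toy <- data.frame(
  Y  = rep(c(1, 0), each = 20),
  X1 = sample(0:2, 40, replace = TRUE),
  X2 = sample(0:2, 40, replace = TRUE)
)
iv_toy <- c("X1", "X2")

# Same logic as glmfunction above: one deviance difference per predictor
dev_diff <- function(data, indices, iv) {
  sapply(iv, function(x) {
    fit <- glm(as.formula(sprintf("Y ~ %s", x)), family = "binomial",
               data = data[indices, ])
    with(fit, null.deviance - deviance)
  })
}

# Passing all row indices evaluates the statistic on the unresampled data
out <- dev_diff(toy, seq_len(nrow(toy)), iv_toy)
print(out)  # a numeric vector named after the predictors
```

Because sapply() is applied to a character vector, the result is named by predictor, which makes it easy to see which value belongs to which variable.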

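I chose to make iv a formal argument of the bootstrap function, so that you have to pass it explicitly. This avoids accidental scoping problems and makes the function more flexible and easier to debug. Then you can run your bootstrap (R=10 here only to keep the run short; use R=1000 for the real analysis):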

result <- boot(datasim,glmfunction, iv = iv, R=10) 
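
Once boot() has run, the returned object holds the statistic on the original data in `t0` and the bootstrap replicates in `t`, a matrix with one row per replicate and one column per predictor. As a rough sketch of how one might summarise them, again on made-up data (`toy`, `iv_toy`, `dev_diff` and the choice of R = 50 are assumptions for illustration only):

```r
library(boot)

# Toy data (hypothetical) and the same statistic as glmfunction above
set.seed(1)
toy <- data.frame(
  Y  = rep(c(1, 0), each = 30),
  X1 = sample(0:2, 60, replace = TRUE),
  X2 = sample(0:2, 60, replace = TRUE)
)
iv_toy <- c("X1", "X2")

dev_diff <- function(data, indices, iv) {
  sapply(iv, function(x) {
    fit <- glm(as.formula(sprintf("Y ~ %s", x)), family = "binomial",
               data = data[indices, ])
    with(fit, null.deviance - deviance)
  })
}

# Extra arguments such as iv are passed through boot() to the statistic
res <- boot(toy, dev_diff, R = 50, iv = iv_toy)

# res$t is an R x length(iv) matrix; label its columns by predictor
boot_mat <- res$t
colnames(boot_mat) <- iv_toy

# Percentile summary of the bootstrap distribution, one row per predictor
summary_tab <- t(apply(boot_mat, 2, quantile, probs = c(0.025, 0.5, 0.975)))
print(summary_tab)
```

The percentile summary is just one simple option; boot.ci() from the same package can compute intervals for a single column at a time through its `index` argument.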

Thank you very much, @Heroka. It works very well. – Shima

You're welcome! I'm glad to see you have neatly solved your earlier problem (with the indices/strange results). – Heroka