2016-09-19

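Automating multiple regression with variable transformations

I am working on a regression problem and trying to make the process more automated. For each x variable I have a matrix X of transformations that I want to test (each column is one transformation of that x variable). So I need a loop that takes one vector from each X matrix, regresses y on them, and stores the t value for each variable.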

I got this working for 2 X variables, but need your help extending it to n variables. The code is below.

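testvars <- function(y, X1, X2) { 

    # rows index the columns of X1, columns index the columns of X2 
    Tvals_X1 <- data.frame(matrix(0, ncol = ncol(X2), nrow = ncol(X1))) 
    Tvals_X2 <- data.frame(matrix(0, ncol = ncol(X2), nrow = ncol(X1))) 

    for (i in seq_len(ncol(X1))) { 
        for (j in seq_len(ncol(X2))) { 
            temp <- lm(y ~ X1[,i] + X2[,j]) 
            Tvals_X1[i,j] <- summary(temp)$coefficients[2,3]   # t value of the X1 term 
            Tvals_X2[i,j] <- summary(temp)$coefficients[3,3]   # t value of the X2 term 
        } 
    } 

    # a for loop returns NULL, so return both tables explicitly 
    list(X1 = Tvals_X1, X2 = Tvals_X2) 
} 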

Answers

Here is my approach:

# example data 
set.seed(1); y <- matrix(runif(20), ncol=1) 
set.seed(2); x1 <- matrix(runif(60), ncol=3) 
set.seed(3); x2 <- matrix(runif(80), ncol=4) 
set.seed(4); x3 <- matrix(runif(40), ncol=2) 
set.seed(5); x4 <- matrix(runif(60), ncol=3) 
I build every combination of column numbers:
col.v <- sapply(list(x1,x2,x3,x4), ncol)   # ncols of each data 
col.comb <- expand.grid(lapply(col.v, seq_len)) # all combinations; lapply always returns a list, even when every ncol is equal 
# > head(col.comb, n=5) 
# Var1 Var2 Var3 Var4 
# 1 1 1 1 1 
# 2 2 1 1 1 
# 3 3 1 1 1 
# 4 1 2 1 1 
# 5 2 2 1 1 
I get the t values with apply(col.comb, 1, ...):
tval <- t(apply(col.comb, 1, function(a) { 
    temp <- lm(y ~ x1[,a[1]] + x2[,a[2]] + x3[,a[3]] + x4[,a[4]]) 
    summary(temp)$coefficients[2:5, 3] }))   # t() so that each row is one combination 

# > head(tval, n=2)    # tval is matrix 
#  x1[, a[1]] x2[, a[2]] x3[, a[3]] x4[, a[4]] 
# [1,] -0.05692452 -0.9047370 -0.3758997 1.968530 
# [2,] 0.03476527 -0.9260632 -0.3740936 1.965884 
I turn each column of the tval matrix into an array whose dimensions cover all combinations, and combine the arrays into a list:
results <- list()   # results[[1]] is x1's array 
for(i in seq_along(col.v)) results[[i]] <- array(tval[,i], dim=col.v) 
# names(results) <- c("x1", "x2", "x3", "x4") # if you want 

results2 <- array(t(tval), dim=c(length(col.v), col.v)) # all.array.version 
## results[[1]] is the same as results2[1,,,,] # both are x1's array 
    # dimnames(results2) <- list(c("x1", "x2", "x3", "x4"), NULL, NULL, NULL, NULL) # if you need 
Check:
c(results[[1]][2,3,2,3], results[[2]][2,3,2,3], results[[3]][2,3,2,3], results[[4]][2,3,2,3]) 
# [1] 0.54580342 -0.56418433 -0.02780492 -0.50140806 

c(results2[1,2,3,2,3], results2[2,2,3,2,3], results2[3,2,3,2,3], results2[4,2,3,2,3]) 
# [1] 0.54580342 -0.56418433 -0.02780492 -0.50140806 

summary(lm(y ~ x1[,2] + x2[,3] + x3[,2] + x4[,3]))$coefficients[2:5,3] 
# x1[, 2]  x2[, 3]  x3[, 2]  x4[, 3] 
# 0.54580342 -0.56418433 -0.02780492 -0.50140806 # no problem 
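The check works because expand.grid varies its first column fastest, which is the same order in which array() fills its first index. A minimal sketch of that correspondence, using hypothetical column counts (3, 4, 2) and a helper array `idx` that is not part of the answer:

```r
col.v <- c(3, 4, 2)                                  # hypothetical column counts
col.comb <- expand.grid(lapply(col.v, seq_len))      # first column varies fastest

# Store each row number of col.comb in an array with the same dimensions
idx <- array(seq_len(nrow(col.comb)), dim = col.v)

r <- idx[2, 3, 2]          # row number stored at array position [2, 3, 2]
unlist(col.comb[r, ])      # the column combination held in that row
```

If the two orders agree, the combination printed for row `r` is the same triple that was used as the array index.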
Function version (n = 4):
testvars2 <- function(y, x1, x2, x3, x4){ 

    col.v <- sapply(list(x1,x2,x3,x4), ncol) 
    col.comb <- expand.grid(lapply(col.v, seq_len)) 

    tval <- t(apply(col.comb, 1, function(a) { 
        temp <- lm(y ~ x1[,a[1]] + x2[,a[2]] + x3[,a[3]] + x4[,a[4]]) 
        summary(temp)$coefficients[2:5, 3] })) 

    results <- list() 
    for(i in seq_along(col.v)) results[[i]] <- array(tval[,i], dim=col.v) 
    #results2 <- array(t(tval), dim=c(length(col.v), col.v)) 

    return(results) 
} 
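The function above is fixed at four matrices. One possible way to extend it to any number of variables is to pass the matrices as a list. This is only a sketch under that assumption: the name `testvars_n` and the argument `xs` are made up for illustration and are not part of the original answer.

```r
# Hypothetical generalisation: `xs` is a list of transformation matrices,
# one matrix per x variable.
testvars_n <- function(y, xs) {
  y <- as.numeric(y)
  col.v <- vapply(xs, ncol, integer(1))                       # transformations per variable
  col.comb <- as.matrix(expand.grid(lapply(col.v, seq_len)))  # every column combination

  # One column of t values per combination. vapply() stops with an error if a
  # fit returns fewer t values than expected (e.g. a collinear term was dropped).
  tval <- vapply(seq_len(nrow(col.comb)), function(r) {
    a <- col.comb[r, ]
    X <- mapply(function(m, j) m[, j], xs, a)                 # chosen column of each matrix
    unname(summary(lm(y ~ X))$coefficients[-1, 3])            # drop the intercept row
  }, numeric(length(xs)))
  tval <- matrix(tval, nrow = length(xs))                     # keep matrix shape when n = 1

  # One array per x variable, indexed by the chosen column of every matrix
  lapply(seq_along(xs), function(i) array(tval[i, ], dim = col.v))
}

# Usage with three small random matrices
set.seed(1); y  <- runif(20)
set.seed(2); x1 <- matrix(runif(60), ncol = 3)
set.seed(3); x2 <- matrix(runif(80), ncol = 4)
set.seed(4); x3 <- matrix(runif(40), ncol = 2)
res <- testvars_n(y, list(x1, x2, x3))
res[[1]][2, 3, 2]   # t value of x1[, 2] when fitted with x2[, 3] and x3[, 2]
```

The number of fits is still the product of all column counts, so this only removes the hard-coded n, not the combinatorial cost.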
My X matrices have more than 600 columns, which makes expand.grid fail. Do you have any suggestion for how to solve this?

@SevaGumeniuk; if you want the result as an `array`, its `dim` becomes `c(ncol(X1), ncol(X2), ..., ncol(Xn))`. Please try `test_array <- array(1, dim = c(ncol(X1), ncol(X2), ..., ncol(Xn)))`. If R returns a size-related error, the result cannot be stored as an `array`. – cuttlefish44
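Before allocating anything, the size of the full grid can be estimated from the column counts alone. A rough sketch, assuming four matrices of 600 columns each (the comment does not say how many matrices there are, so these counts are hypothetical):

```r
# Hypothetical column counts: four transformation matrices, 600 columns each
col.v <- c(600, 600, 600, 600)

n.comb   <- prod(col.v)               # rows expand.grid would create = number of lm fits
n.values <- n.comb * length(col.v)    # one t value per variable per combination
gib      <- n.values * 8 / 2^30       # a double takes 8 bytes

cat(sprintf("%.3g combinations, roughly %.0f GiB of t values\n", n.comb, gib))
```

If the estimate is far beyond available memory, neither the expand.grid table nor the result array can be built in one piece, whichever function is used.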


Since this is StackOverflow and not CrossValidated, I'll skip the warnings about the problems with this approach to variable selection. Caveat emptor.

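Computationally, calling lm or glm repeatedly makes R do quite a bit of bookkeeping; instead, I would suggest the add1 and drop1 functions. Below is sample output from the example in the help page, which adds each two-way interaction to the model. In your case, since each predictor uses 1 degree of freedom, the F statistic is the square of the t statistic.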

> lm1 <- lm(Fertility ~ ., data = swiss) 
> add1(lm1, ~ I(Education^2) + .^2, test='F') 
Single term additions 

Model: 
Fertility ~ Agriculture + Examination + Education + Catholic + 
    Infant.Mortality 
                             Df  Sum of Sq       RSS       AIC F value  Pr(>F)
<none>                                     2105.0429 190.69135
I(Education^2)                1  11.818686 2093.2242 192.42672 0.22585 0.63721
Agriculture:Examination       1  10.667353 2094.3756 192.45257 0.20373 0.65416
Agriculture:Education         1   1.826563 2103.2164 192.65055 0.03474 0.85309
Agriculture:Catholic          1  75.047836 2029.9951 190.98513 1.47878 0.23109
Agriculture:Infant.Mortality  1   4.438027 2100.6049 192.59215 0.08451 0.77278
Examination:Education         1  48.693777 2056.3492 191.59137 0.94719 0.33628
Examination:Catholic          1  40.757983 2064.2850 191.77240 0.78977 0.37948
Examination:Infant.Mortality  1  65.856710 2039.1862 191.19745 1.29182 0.26248
Education:Catholic            1 278.189298 1826.8536 186.02953 6.09111 0.01796 *
Education:Infant.Mortality    1  92.950398 2012.0925 190.56880 1.84784 0.18165
Catholic:Infant.Mortality     1   2.358769 2102.6842 192.63865 0.04487 0.83332
--- 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
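Applied to the question, add1 could try every transformation of one variable against a fixed base model in a single call. The sketch below uses made-up data; the column names `x1_a`, `x1_b`, `x1_c` and the base predictor `x2` are assumptions for illustration. It also checks that the F value for a one-degree-of-freedom term equals the squared t value from refitting by hand.

```r
set.seed(1)
y  <- runif(20)
x2 <- runif(20)   # hypothetical predictor already in the base model
x1 <- matrix(runif(60), ncol = 3,
             dimnames = list(NULL, c("x1_a", "x1_b", "x1_c")))  # three transformations of x1

dat  <- data.frame(y = y, x2 = x2, x1)
base <- lm(y ~ x2, data = dat)

# Each transformation of x1 is tried as a single-term addition to the base model
tab <- add1(base, scope = ~ . + x1_a + x1_b + x1_c, test = "F")
print(tab)

# Refit one candidate by hand and compare its squared t value with add1's F value
t_a <- summary(lm(y ~ x2 + x1_a, data = dat))$coefficients["x1_a", 3]
all.equal(tab["x1_a", "F value"], t_a^2)
```

Note that add1 evaluates the candidates one at a time against the base model; it does not enumerate the full grid of combinations across several variables.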