2010-10-02
6

I want to run a two-stage probit least squares regression in R. Does anyone know how to do this? Is there a package for it? I know it can be done in Stata, so I assume it can be done in R as well.

Answers

8

When you say "two-stage probit least squares", you might want to be more specific. Since you mention a Stata program that implements it, I guess you are talking about the CDSIMEQ package, which implements the Amemiya (1978) procedure for the Heckit model (a.k.a. generalized Tobit, a.k.a. Tobit type II, etc.). As Grant said, systemfit will do a Tobit for you, but not with two equations. The micEcon package did have a Heckit (but that package has been split up several times, and I don't know where it lives now).
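If the two-step Heckit itself is what you are after, my understanding is that the heckit code from micEcon ended up in the sampleSelection package; a minimal sketch, assuming that package is installed and using hypothetical variable names:

# A minimal sketch, assuming the sampleSelection package; 'participation',
# 'wage', 'educ' and 'age' are hypothetical variables in a data frame 'mydata'.
library(sampleSelection)
heck <- heckit(selection = participation ~ educ + age,  # probit selection equation
               outcome   = wage ~ educ,                 # outcome equation, observed only if selected
               data      = mydata)
summary(heck)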

If you want what CDSIMEQ does, it can easily be implemented in R. I wrote a function that replicates CDSIMEQ:

tspls <- function(formula1, formula2, data) { 
    # The continuous model
    mf1 <- model.frame(formula1, data) 
    y1 <- model.response(mf1) 
    x1 <- model.matrix(attr(mf1, "terms"), mf1) 

    # The dichotomous model
    mf2 <- model.frame(formula2, data) 
    y2 <- model.response(mf2) 
    x2 <- model.matrix(attr(mf2, "terms"), mf2) 

    # The matrix of all the exogenous variables 
    X <- cbind(x1, x2) 
    X <- X[, unique(colnames(X))] 

    J1 <- matrix(0, nrow = ncol(X), ncol = ncol(x1)) 
    J2 <- matrix(0, nrow = ncol(X), ncol = ncol(x2)) 
    for (i in 1:ncol(x1)) J1[match(colnames(x1)[i], colnames(X)), i] <- 1 
    for (i in 1:ncol(x2)) J2[match(colnames(x2)[i], colnames(X)), i] <- 1 

    # Step 1: reduced-form regressions of each endogenous variable on all exogenous variables
    cat("\n\tNOW THE FIRST STAGE REGRESSION") 
    m1 <- lm(y1 ~ X - 1) 
    m2 <- glm(y2 ~ X - 1, family = binomial(link = "probit")) 
    print(summary(m1)) 
    print(summary(m2)) 

    yhat1 <- m1$fitted.values 
    yhat2 <- X %*% coef(m2) 

    PI1 <- m1$coefficients 
    PI2 <- m2$coefficients 
    V0 <- vcov(m2) 
    sigma1sq <- sum(m1$residuals^2)/m1$df.residual 
    sigma12 <- 1/length(y2) * sum(y2 * m1$residuals/dnorm(yhat2)) 

    # Step 2: structural equations, with first-stage fitted values in place of the endogenous regressors
    cat("\n\tNOW THE SECOND STAGE REGRESSION WITH INSTRUMENTS") 

    m1 <- lm(y1 ~ yhat2 + x1 - 1) 
    m2 <- glm(y2 ~ yhat1 + x2 - 1, family = binomial(link = "probit")) 
    sm1 <- summary(m1) 
    sm2 <- summary(m2) 
    print(sm1) 
    print(sm2) 

    # Step 3: correct the second-stage standard errors for the generated regressors
    cat("\tNOW THE SECOND STAGE REGRESSION WITH CORRECTED STANDARD ERRORS\n\n") 
    gamma1 <- m1$coefficients[1] 
    gamma2 <- m2$coefficients[1] 

    cc <- sigma1sq - 2 * gamma1 * sigma12 
    dd <- gamma2^2 * sigma1sq - 2 * gamma2 * sigma12 
    H <- cbind(PI2, J1) 
    G <- cbind(PI1, J2) 

    XX <- crossprod(X)       # X'X 
    HXXH <- solve(t(H) %*% XX %*% H)   # (H'X'XH)^(-1) 
    HXXVXXH <- t(H) %*% XX %*% V0 %*% XX %*% H # H'(X'X)V0(X'X)H 
    Valpha1 <- cc * HXXH + gamma1^2 * HXXH %*% HXXVXXH %*% HXXH 

    GV <- t(G) %*% solve(V0) # G'V0^(-1) 
    GVG <- solve(GV %*% G)  # (G'V0^(-1)G)^(-1) 
    Valpha2 <- GVG + dd * GVG %*% GV %*% solve(XX) %*% solve(V0) %*% G %*% GVG 

    ans1 <- coef(sm1) 
    ans2 <- coef(sm2) 

    ans1[,2] <- sqrt(diag(Valpha1)) 
    ans2[,2] <- sqrt(diag(Valpha2)) 
    ans1[,3] <- ans1[,1]/ans1[,2] 
    ans2[,3] <- ans2[,1]/ans2[,2] 
    ans1[,4] <- 2 * pt(abs(ans1[,3]), m1$df.residual, lower.tail = FALSE) 
    ans2[,4] <- 2 * pnorm(abs(ans2[,3]), lower.tail = FALSE) 

    cat("Continuous:\n") 
    print(ans1) 
    cat("Dichotomous:\n") 
    print(ans2) 
} 

For comparison, we can replicate the example from the CDSIMEQ authors' article about the package:

> library(foreign) 
> cdsimeq <- read.dta("http://www.stata-journal.com/software/sj3-2/st0038/cdsimeq.dta") 
> tspls(continuous ~ exog3 + exog2 + exog1 + exog4, 
+  dichotomous ~ exog1 + exog2 + exog5 + exog6 + exog7, 
+  data = cdsimeq) 

     NOW THE FIRST STAGE REGRESSION 
Call: 
lm(formula = y1 ~ X - 1) 

Residuals: 
     Min  1Q Median  3Q  Max 
-1.885921 -0.438579 -0.006262 0.432156 2.133738 

Coefficients: 
       Estimate Std. Error t value Pr(>|t|)  
X(Intercept) 0.010752 0.020620 0.521 0.602187  
Xexog3  0.158469 0.021862 7.249 8.46e-13 *** 
Xexog2  -0.009669 0.021666 -0.446 0.655488  
Xexog1  0.159955 0.021260 7.524 1.19e-13 *** 
Xexog4  0.316575 0.022456 14.097 < 2e-16 *** 
Xexog5  0.497207 0.021356 23.282 < 2e-16 *** 
Xexog6  -0.078017 0.021755 -3.586 0.000352 *** 
Xexog7  0.161177 0.022103 7.292 6.23e-13 *** 
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.6488 on 992 degrees of freedom 
Multiple R-squared: 0.5972,  Adjusted R-squared: 0.594 
F-statistic: 183.9 on 8 and 992 DF, p-value: < 2.2e-16 


Call: 
glm(formula = y2 ~ X - 1, family = binomial(link = "probit")) 

Deviance Residuals: 
    Min  1Q Median  3Q  Max 
-2.49531 -0.59244 0.01983 0.59708 2.41810 

Coefficients: 
      Estimate Std. Error z value Pr(>|z|)  
X(Intercept) 0.08352 0.05280 1.582 0.113692  
Xexog3  0.21345 0.05678 3.759 0.000170 *** 
Xexog2  0.21131 0.05471 3.862 0.000112 *** 
Xexog1  0.45591 0.06023 7.570 3.75e-14 *** 
Xexog4  0.39031 0.06173 6.322 2.57e-10 *** 
Xexog5  0.75955 0.06427 11.818 < 2e-16 *** 
Xexog6  0.85461 0.06831 12.510 < 2e-16 *** 
Xexog7  -0.16691 0.05653 -2.953 0.003152 ** 
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1) 

    Null deviance: 1386.29 on 1000 degrees of freedom 
Residual deviance: 754.14 on 992 degrees of freedom 
AIC: 770.14 

Number of Fisher Scoring iterations: 6 


     NOW THE SECOND STAGE REGRESSION WITH INSTRUMENTS 
Call: 
lm(formula = y1 ~ yhat2 + x1 - 1) 

Residuals: 
    Min  1Q Median  3Q  Max 
-2.32152 -0.53160 0.04886 0.53502 2.44818 

Coefficients: 
       Estimate Std. Error t value Pr(>|t|)  
yhat2   0.257592 0.021451 12.009 <2e-16 *** 
x1(Intercept) 0.012185 0.024809 0.491 0.623  
x1exog3  0.042520 0.026735 1.590 0.112  
x1exog2  0.011854 0.026723 0.444 0.657  
x1exog1  0.007773 0.028217 0.275 0.783  
x1exog4  0.318636 0.028311 11.255 <2e-16 *** 
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.7803 on 994 degrees of freedom 
Multiple R-squared: 0.4163,  Adjusted R-squared: 0.4128 
F-statistic: 118.2 on 6 and 994 DF, p-value: < 2.2e-16 


Call: 
glm(formula = y2 ~ yhat1 + x2 - 1, family = binomial(link = "probit")) 

Deviance Residuals: 
    Min  1Q Median  3Q  Max 
-2.49610 -0.58595 0.01969 0.59857 2.41281 

Coefficients: 
       Estimate Std. Error z value Pr(>|z|)  
yhat1   1.26287 0.16061 7.863 3.75e-15 *** 
x2(Intercept) 0.07080 0.05276 1.342 0.179654  
x2exog1  0.25093 0.06466 3.880 0.000104 *** 
x2exog2  0.22604 0.05389 4.194 2.74e-05 *** 
x2exog5  0.12912 0.09510 1.358 0.174544  
x2exog6  0.95609 0.07172 13.331 < 2e-16 *** 
x2exog7  -0.37128 0.06759 -5.493 3.94e-08 *** 
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1) 

    Null deviance: 1386.29 on 1000 degrees of freedom 
Residual deviance: 754.21 on 993 degrees of freedom 
AIC: 768.21 

Number of Fisher Scoring iterations: 6 

     NOW THE SECOND STAGE REGRESSION WITH CORRECTED STANDARD ERRORS 

Continuous: 
       Estimate Std. Error t value Pr(>|t|) 
yhat2   0.25759209 0.1043073 2.46955009 0.01369540 
x1(Intercept) 0.01218500 0.1198713 0.10165068 0.91905445 
x1exog3  0.04252006 0.1291588 0.32920764 0.74206810 
x1exog2  0.01185438 0.1290754 0.09184073 0.92684309 
x1exog1  0.00777347 0.1363643 0.05700519 0.95455252 
x1exog4  0.31863627 0.1367881 2.32941597 0.02003661 
Dichotomous: 
       Estimate Std. Error z value  Pr(>|z|) 
yhat1   1.26286574 0.7395166 1.7076909 0.0876937093 
x2(Intercept) 0.07079775 0.2666447 0.2655134 0.7906139867 
x2exog1  0.25092561 0.3126763 0.8025092 0.4222584495 
x2exog2  0.22603717 0.2739307 0.8251618 0.4092797527 
x2exog5  0.12911922 0.4822986 0.2677163 0.7889176766 
x2exog6  0.95609385 0.2823662 3.3860070 0.0007091758 
x2exog7  -0.37128221 0.3265478 -1.1369920 0.2555416141 
2

There are several packages available in R that will do two-stage least squares. Here are a couple:

  1. sem: Two-Stage Least Squares (see the sketch after this list)
  2. Zelig: link removed, no longer functional (Jul 28 '11)
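
For the sem route, a minimal sketch of ordinary (linear) two-stage least squares with sem::tsls, reusing the cdsimeq data loaded in the answer above; note that this is plain 2SLS, not the probit-corrected procedure:

# A minimal sketch, assuming the sem package is installed and the cdsimeq
# data frame from the answer above; instruments are all exogenous variables.
library(sem)
fit <- tsls(continuous ~ dichotomous + exog1 + exog2 + exog3 + exog4,
            instruments = ~ exog1 + exog2 + exog3 + exog4 + exog5 + exog6 + exog7,
            data = cdsimeq)
summary(fit)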

Let me know if these serve your purpose.

+0

Thanks, I'll give it a try. This seems to fit my needs. – 2010-10-02 01:59:46

2

systemfit will also do the trick.
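
A minimal sketch of what that could look like, assuming the systemfit package and the cdsimeq data from the first answer; again, this is linear 2SLS on the two-equation system, not the probit correction:

# A minimal sketch, assuming the systemfit package is installed and the
# cdsimeq data frame from the first answer; plain 2SLS on the system.
library(systemfit)
eq1 <- continuous ~ dichotomous + exog1 + exog2 + exog3 + exog4
eq2 <- dichotomous ~ continuous + exog1 + exog2 + exog5 + exog6 + exog7
inst <- ~ exog1 + exog2 + exog3 + exog4 + exog5 + exog6 + exog7
fit <- systemfit(list(cont = eq1, dich = eq2), method = "2SLS",
                 inst = inst, data = cdsimeq)
summary(fit)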