2017-10-17 67 views
0

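Regression with factors and dummies

OK, I have two questions, perhaps related, about dummies and factors. I will use an example that is very similar to my database. I have a column with 20 names, say presidents of a country (e.g. "George W", "Bill C", etc.). I also have a column with 25 strategies (e.g. "str_1", "str2", etc.). They are all in the same database, say "dat", along with other variables such as y and x.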

For example:

=====================================
 y   x   presidents   strategies
=====================================
20   2   Bill.C       3_A
10   1   George.W     2_B
10   1   Tom_C        3_C
 3   2   Tom_C        2_D
 4   4   John.C       3_A
 4   3   Bill.C       2_A

I want to regress y ~ x + dummies for presidents + dummies for strategies + interactions between presidents and strategies.

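I have already created dummies for each of the 20 presidents and each of the 25 strategies, but I don't know how to create the interaction between every president and every strategy (this is the first part of my question). Assuming I could do that easily, is there any other way to specify my regression without having to write out the 20 * 25 interactions one by one (I know Stata has some commands for this)?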

Maybe these are separate questions, but I am not sure.

Thanks in advance.

+0

What are the rows of this database? It would help if you could provide a (small) example data frame. –

+1

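"Is there any other way to specify my regression without having to write out the 20 * 25 interactions one by one" Yes. 'lm' automatically converts factor variables into their corresponding dummy variables (leaving one level out as the reference category). So it is enough to write 'lm(y ~ x + presidents + strategies + presidents:strategies, data = dat)'. You can even write 'lm(y ~ x + presidents*strategies, data = dat)', which is the same specification. – useR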

+0

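You need to provide a larger dataset, because OLS cannot handle a dataset with more variables (counting the dummies and interactions) than observations. – useR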
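
To illustrate the point about too few observations, here is a minimal sketch that fits the interaction model to the six example rows from the question. The object names 'small' and 'fit_small' are made up for this illustration, and 'stringsAsFactors = TRUE' is added so the text columns are factors regardless of R version. With only six rows, 'lm' cannot estimate most of the coefficients and reports them as NA.

```r
# Six example rows from the question (object names are illustrative)
small <- read.table(text = "y x presidents strategies
20 2 Bill.C 3_A
10 1 George.W 2_B
10 1 Tom_C 3_C
3 2 Tom_C 2_D
4 4 John.C 3_A
4 3 Bill.C 2_A", header = TRUE, stringsAsFactors = TRUE)

# Full model: main effects plus every president-by-strategy interaction
fit_small <- lm(y ~ x + presidents * strategies, data = small)

n_obs  <- nrow(small)                  # number of observations
n_coef <- length(coef(fit_small))      # coefficients the formula asks for
n_na   <- sum(is.na(coef(fit_small)))  # coefficients lm() could not estimate

c(observations = n_obs, coefficients = n_coef, not_estimable = n_na)
```

The count of NA coefficients shows how far the example is from being estimable; with enough rows per president-strategy combination the NAs go away.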

Answer

0

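'lm' and 'glm' automatically convert factor variables into their corresponding dummy variables (leaving one level out as the reference category). So it is enough to do the following: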

mod1 = lm(y ~ x + presidents + strategies + presidents:strategies, data = df1) 
mod2 = lm(y ~ x + presidents*strategies, data = df1) 
mod3 = glm(y ~ x + presidents + strategies + presidents:strategies, data = df1) 
mod4 = glm(y ~ x + presidents*strategies, data = df1) 

summary(mod1) 
summary(mod2) 
summary(mod3) 
summary(mod4) 

Result:

> summary(mod1) 

Call: 
lm(formula = y ~ x + presidents + strategies + presidents:strategies, 
    data = df1) 

Residuals: 
    Min  1Q Median  3Q  Max 
-17.3690 -6.1273 -0.1699 6.4295 17.4156 

Coefficients: 
           Estimate Std. Error t value Pr(>|t|)  
(Intercept)      14.4782  3.0799 4.701 5.15e-06 *** 
x         -0.1692  0.2141 -0.790 0.431  
presidentsGeorge.W    11.1984  8.8283 1.268 0.206  
presidentsJohn.C     4.1281  4.2305 0.976 0.330  
presidentsTom_C     4.9604  3.6271 1.368 0.173  
strategies2_B      1.6203  3.5736 0.453 0.651  
strategies2_D      -1.7246  3.6550 -0.472 0.638  
strategies3_A      1.7663  3.2966 0.536 0.593  
strategies3_C      -0.5787  3.8440 -0.151 0.881  
presidentsGeorge.W:strategies2_B -9.9934 10.0125 -0.998 0.320  
presidentsJohn.C:strategies2_B -1.5192  5.8696 -0.259 0.796  
presidentsTom_C:strategies2_B  -0.8962  5.0202 -0.179 0.859  
presidentsGeorge.W:strategies2_D -7.5266  9.7414 -0.773 0.441  
presidentsJohn.C:strategies2_D  1.7179  6.4375 0.267 0.790  
presidentsTom_C:strategies2_D  -1.1020  5.0551 -0.218 0.828  
presidentsGeorge.W:strategies3_A -11.9783  9.3115 -1.286 0.200  
presidentsJohn.C:strategies3_A -2.8849  5.0866 -0.567 0.571  
presidentsTom_C:strategies3_A  -5.0305  4.4068 -1.142 0.255  
presidentsGeorge.W:strategies3_C -6.5116  9.7387 -0.669 0.505  
presidentsJohn.C:strategies3_C -4.3792  6.0389 -0.725 0.469  
presidentsTom_C:strategies3_C  -1.3257  5.3821 -0.246 0.806  
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 8.364 on 179 degrees of freedom 
Multiple R-squared: 0.064, Adjusted R-squared: -0.04058 
F-statistic: 0.612 on 20 and 179 DF, p-value: 0.9007 

> summary(mod2) 

Call: 
lm(formula = y ~ x + presidents * strategies, data = df1) 

Residuals: 
    Min  1Q Median  3Q  Max 
-17.3690 -6.1273 -0.1699 6.4295 17.4156 

Coefficients: 
           Estimate Std. Error t value Pr(>|t|)  
(Intercept)      14.4782  3.0799 4.701 5.15e-06 *** 
x         -0.1692  0.2141 -0.790 0.431  
presidentsGeorge.W    11.1984  8.8283 1.268 0.206  
presidentsJohn.C     4.1281  4.2305 0.976 0.330  
presidentsTom_C     4.9604  3.6271 1.368 0.173  
strategies2_B      1.6203  3.5736 0.453 0.651  
strategies2_D      -1.7246  3.6550 -0.472 0.638  
strategies3_A      1.7663  3.2966 0.536 0.593  
strategies3_C      -0.5787  3.8440 -0.151 0.881  
presidentsGeorge.W:strategies2_B -9.9934 10.0125 -0.998 0.320  
presidentsJohn.C:strategies2_B -1.5192  5.8696 -0.259 0.796  
presidentsTom_C:strategies2_B  -0.8962  5.0202 -0.179 0.859  
presidentsGeorge.W:strategies2_D -7.5266  9.7414 -0.773 0.441  
presidentsJohn.C:strategies2_D  1.7179  6.4375 0.267 0.790  
presidentsTom_C:strategies2_D  -1.1020  5.0551 -0.218 0.828  
presidentsGeorge.W:strategies3_A -11.9783  9.3115 -1.286 0.200  
presidentsJohn.C:strategies3_A -2.8849  5.0866 -0.567 0.571  
presidentsTom_C:strategies3_A  -5.0305  4.4068 -1.142 0.255  
presidentsGeorge.W:strategies3_C -6.5116  9.7387 -0.669 0.505  
presidentsJohn.C:strategies3_C -4.3792  6.0389 -0.725 0.469  
presidentsTom_C:strategies3_C  -1.3257  5.3821 -0.246 0.806  
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 8.364 on 179 degrees of freedom 
Multiple R-squared: 0.064, Adjusted R-squared: -0.04058 
F-statistic: 0.612 on 20 and 179 DF, p-value: 0.9007 

> summary(mod3) 

Call: 
glm(formula = y ~ x + presidents + strategies + presidents:strategies, 
    data = df1) 

Deviance Residuals: 
    Min  1Q Median  3Q  Max 
-17.3690 -6.1273 -0.1699 6.4295 17.4156 

Coefficients: 
           Estimate Std. Error t value Pr(>|t|)  
(Intercept)      14.4782  3.0799 4.701 5.15e-06 *** 
x         -0.1692  0.2141 -0.790 0.431  
presidentsGeorge.W    11.1984  8.8283 1.268 0.206  
presidentsJohn.C     4.1281  4.2305 0.976 0.330  
presidentsTom_C     4.9604  3.6271 1.368 0.173  
strategies2_B      1.6203  3.5736 0.453 0.651  
strategies2_D      -1.7246  3.6550 -0.472 0.638  
strategies3_A      1.7663  3.2966 0.536 0.593  
strategies3_C      -0.5787  3.8440 -0.151 0.881  
presidentsGeorge.W:strategies2_B -9.9934 10.0125 -0.998 0.320  
presidentsJohn.C:strategies2_B -1.5192  5.8696 -0.259 0.796  
presidentsTom_C:strategies2_B  -0.8962  5.0202 -0.179 0.859  
presidentsGeorge.W:strategies2_D -7.5266  9.7414 -0.773 0.441  
presidentsJohn.C:strategies2_D  1.7179  6.4375 0.267 0.790  
presidentsTom_C:strategies2_D  -1.1020  5.0551 -0.218 0.828  
presidentsGeorge.W:strategies3_A -11.9783  9.3115 -1.286 0.200  
presidentsJohn.C:strategies3_A -2.8849  5.0866 -0.567 0.571  
presidentsTom_C:strategies3_A  -5.0305  4.4068 -1.142 0.255  
presidentsGeorge.W:strategies3_C -6.5116  9.7387 -0.669 0.505  
presidentsJohn.C:strategies3_C -4.3792  6.0389 -0.725 0.469  
presidentsTom_C:strategies3_C  -1.3257  5.3821 -0.246 0.806  
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for gaussian family taken to be 69.96038) 

    Null deviance: 13379 on 199 degrees of freedom 
Residual deviance: 12523 on 179 degrees of freedom 
AIC: 1439 

Number of Fisher Scoring iterations: 2 

> summary(mod4) 

Call: 
glm(formula = y ~ x + presidents * strategies, data = df1) 

Deviance Residuals: 
    Min  1Q Median  3Q  Max 
-17.3690 -6.1273 -0.1699 6.4295 17.4156 

Coefficients: 
           Estimate Std. Error t value Pr(>|t|)  
(Intercept)      14.4782  3.0799 4.701 5.15e-06 *** 
x         -0.1692  0.2141 -0.790 0.431  
presidentsGeorge.W    11.1984  8.8283 1.268 0.206  
presidentsJohn.C     4.1281  4.2305 0.976 0.330  
presidentsTom_C     4.9604  3.6271 1.368 0.173  
strategies2_B      1.6203  3.5736 0.453 0.651  
strategies2_D      -1.7246  3.6550 -0.472 0.638  
strategies3_A      1.7663  3.2966 0.536 0.593  
strategies3_C      -0.5787  3.8440 -0.151 0.881  
presidentsGeorge.W:strategies2_B -9.9934 10.0125 -0.998 0.320  
presidentsJohn.C:strategies2_B -1.5192  5.8696 -0.259 0.796  
presidentsTom_C:strategies2_B  -0.8962  5.0202 -0.179 0.859  
presidentsGeorge.W:strategies2_D -7.5266  9.7414 -0.773 0.441  
presidentsJohn.C:strategies2_D  1.7179  6.4375 0.267 0.790  
presidentsTom_C:strategies2_D  -1.1020  5.0551 -0.218 0.828  
presidentsGeorge.W:strategies3_A -11.9783  9.3115 -1.286 0.200  
presidentsJohn.C:strategies3_A -2.8849  5.0866 -0.567 0.571  
presidentsTom_C:strategies3_A  -5.0305  4.4068 -1.142 0.255  
presidentsGeorge.W:strategies3_C -6.5116  9.7387 -0.669 0.505  
presidentsJohn.C:strategies3_C -4.3792  6.0389 -0.725 0.469  
presidentsTom_C:strategies3_C  -1.3257  5.3821 -0.246 0.806  
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for gaussian family taken to be 69.96038) 

    Null deviance: 13379 on 199 degrees of freedom 
Residual deviance: 12523 on 179 degrees of freedom 
AIC: 1439 

Number of Fisher Scoring iterations: 2 

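As you can see, the estimates are exactly the same: 'presidents*strategies' is shorthand for 'presidents + strategies + presidents:strategies', and 'glm' with its default gaussian family fits the same least-squares model as 'lm'.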
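
If you would rather check the equivalence programmatically than compare the printed tables by eye, something along these lines should work. This is only a sketch on made-up data: the data frame 'toy', the seed, and the object names are assumptions for illustration, not part of the original example.

```r
set.seed(1)  # arbitrary seed; the data below are purely illustrative
pres_levels  <- c("Bill.C", "George.W", "John.C", "Tom_C")
strat_levels <- c("2_A", "2_B", "3_A")
toy <- data.frame(
  y = rnorm(120),
  x = rnorm(120),
  presidents = factor(sample(pres_levels, 120, replace = TRUE),
                      levels = pres_levels),
  strategies = factor(sample(strat_levels, 120, replace = TRUE),
                      levels = strat_levels)
)

# The two ways of writing the formula expand to the same model terms
long_form  <- y ~ x + presidents + strategies + presidents:strategies
short_form <- y ~ x + presidents * strategies
same_terms <- identical(attr(terms(long_form), "term.labels"),
                        attr(terms(short_form), "term.labels"))

# lm() and glm() (default gaussian family) give the same coefficients
fit_lm  <- lm(short_form, data = toy)
fit_glm <- glm(short_form, data = toy)
same_coefs <- isTRUE(all.equal(coef(fit_lm), coef(fit_glm)))

# Dummy and interaction columns that R builds from the factors
dummy_cols <- colnames(model.matrix(short_form, data = toy))
n_cols <- length(dummy_cols)
```

Printing 'dummy_cols' lists the dummy and interaction columns generated from the two factors, which is exactly what you would otherwise have to create by hand.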

Data:

# Six example rows from the question; read the text columns as factors
df = read.table(text = "y x presidents strategies 
       20 2 Bill.C  3_A 
       10 1 George.W 2_B 
       10 1 Tom_C  3_C 
       3 2 Tom_C  2_D 
       4 4 John.C  3_A 
       4 3 Bill.C  2_A", header = TRUE, stringsAsFactors = TRUE) 

# Simulate 200 rows by resampling the observed president and strategy labels
set.seed(123) 
df1 = data.frame(y = sample(1:30, 200, replace = TRUE), 
       x = sample(1:10, 200, replace = TRUE), 
       presidents = sample(df$presidents, 200, replace = TRUE), 
       strategies = sample(df$strategies, 200, replace = TRUE)) 
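
On the reference category: by default R uses the first factor level (alphabetical order for text data) as the baseline, which is why there is no 'presidentsBill.C' row in the output. If a different baseline is more natural, 'relevel' can change it. A minimal sketch, with a made-up data frame 'toy' and illustrative y values:

```r
# Illustrative data only; the y values are made up for this sketch
toy <- data.frame(
  y = c(20, 10, 10, 3, 4, 4, 7, 12),
  presidents = factor(c("Bill.C", "George.W", "Tom_C", "Tom_C",
                        "John.C", "Bill.C", "George.W", "John.C"))
)

# Default: the first level is the reference, so it gets no dummy
default_ref <- levels(toy$presidents)[1]
fit_default <- lm(y ~ presidents, data = toy)

# Make Tom_C the reference category instead
toy$presidents <- relevel(toy$presidents, ref = "Tom_C")
fit_releveled <- lm(y ~ presidents, data = toy)

names(coef(fit_default))
names(coef(fit_releveled))
```

Changing the reference level only changes what each coefficient is compared against; the fitted values of the model stay the same.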
+0

Thank you very much. – RandomWalker

+0

@RandomWalker If this answer helped, consider accepting it by clicking the grey check mark below the downvote button. – useR

+0

Sure. If I'm not mistaken, I have voted. :) – RandomWalker