2017-02-15

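VIF by coef in OLS regression results (Python)

I am trying to print the VIF (variance inflation factor) for each coefficient, but I can't find any statsmodels documentation that shows how. I have a model with n variables to process, and a single multicollinearity value for all of the variables together doesn't help me remove the ones with the highest collinearity.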

This looks like an answer:

https://stats.stackexchange.com/questions/155028/how-to-systematically-remove-collinear-variables-in-python

but how would I run it against this data set?

http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv

Below is my code with its summary output, which is as far as I have got.

import pandas as pd 
import matplotlib.pyplot as plt 
import statsmodels.formula.api as smf 

# read data into a DataFrame 
data = pd.read_csv('somepath', index_col=0) 
print(data.head()) 

#multiregression 
lm = smf.ols(formula='Sales ~ TV + Radio + Newspaper', data=data).fit() 
print(lm.summary()) 

                            OLS Regression Results
==============================================================================
Dep. Variable:                  Sales   R-squared:                       0.897
Model:                            OLS   Adj. R-squared:                  0.896
Method:                 Least Squares   F-statistic:                     570.3
Date:                Wed, 15 Feb 2017   Prob (F-statistic):           1.58e-96
Time:                        13:28:29   Log-Likelihood:                -386.18
No. Observations:                 200   AIC:                             780.4
Df Residuals:                     196   BIC:                             793.6
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept      2.9389      0.312      9.422      0.000         2.324     3.554
TV             0.0458      0.001     32.809      0.000         0.043     0.049
Radio          0.1885      0.009     21.893      0.000         0.172     0.206
Newspaper     -0.0010      0.006     -0.177      0.860        -0.013     0.011
==============================================================================
Omnibus:                       60.414   Durbin-Watson:                   2.084
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              151.241
Skew:                          -1.327   Prob(JB):                     1.44e-33
Kurtosis:                       6.332   Cond. No.                         454.
==============================================================================

Answer

To get a list of the VIFs:

from statsmodels.stats.outliers_influence import variance_inflation_factor 

variables = lm.model.exog 
vif = [variance_inflation_factor(variables, i) for i in range(variables.shape[1])] 
vif 
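The list above is ordered like the columns of the design matrix, so it can be paired with the coefficient names to read one VIF per coef. The following is a minimal sketch under stated assumptions: the data frame is synthetic (the real CSV path is not given in the question), and only the column names Sales, TV, Radio and Newspaper are taken from the formula.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic stand-in for the Advertising data (assumption: the real CSV
# has the columns Sales, TV, Radio and Newspaper used in the formula).
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
})
# Make Newspaper depend on Radio so that some collinearity exists.
data["Newspaper"] = data["Radio"] + rng.uniform(0, 30, n)
data["Sales"] = 3 + 0.045 * data["TV"] + 0.19 * data["Radio"] + rng.normal(0, 1.5, n)

lm = smf.ols(formula="Sales ~ TV + Radio + Newspaper", data=data).fit()

exog = lm.model.exog          # design matrix, including the Intercept column
names = lm.model.exog_names   # ['Intercept', 'TV', 'Radio', 'Newspaper']

# One VIF per column of the design matrix, labelled with the coefficient name.
vif_table = pd.Series(
    [variance_inflation_factor(exog, i) for i in range(exog.shape[1])],
    index=names,
    name="VIF",
)
print(vif_table)
```

The Intercept entry is usually not interpreted as a collinearity measure, so it may be worth leaving it out with `vif_table.drop("Intercept")` before summarising the values.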

To get their mean:

import numpy as np

np.array(vif).mean()
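Since the goal in the question is to remove the variables with the highest collinearity, the per-variable VIFs can drive a simple elimination loop. This is only a sketch, not the method from the linked answer: the helper name `drop_high_vif`, the threshold of 5 (a common rule of thumb, not a statsmodels default) and the demo columns x1, x2, x3 are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor


def drop_high_vif(X, threshold=5.0):
    """Drop the predictor with the largest VIF until all VIFs are <= threshold.

    X is a DataFrame of predictors without a constant column.
    Hypothetical helper; the threshold is an illustrative rule of thumb.
    """
    X = X.copy()
    while X.shape[1] > 1:
        # Prepend a column of ones so each VIF is computed with an intercept.
        design = np.column_stack([np.ones(len(X)), X.to_numpy(dtype=float)])
        # Skip column 0 (the constant) when collecting the VIFs.
        vifs = pd.Series(
            [variance_inflation_factor(design, i) for i in range(1, design.shape[1])],
            index=X.columns,
        )
        if vifs.max() <= threshold:
            break
        X = X.drop(columns=vifs.idxmax())
    return X


# Demo on synthetic predictors: x3 is almost exactly x1 + x2.
rng = np.random.default_rng(1)
demo = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
demo["x3"] = demo["x1"] + demo["x2"] + rng.normal(scale=0.1, size=200)

kept = drop_high_vif(demo)
print(list(kept.columns))
```

For the model in the question, the same helper could be called on `data[['TV', 'Radio', 'Newspaper']]` and the formula rebuilt from the columns that remain. Dropping variables purely by VIF is a heuristic, so the result should be checked against what the model is meant to explain.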