2017-08-03 423 views

Python: chi-square test produces wrong results (chi2_contingency)

I am trying to compute a chi-square value in Python from a contingency table. Here is an example:

+--------+------+------+
|        | Cat1 | Cat2 |
+--------+------+------+
| Group1 |   80 |  120 |
| Group2 |  420 |  380 |
+--------+------+------+

The expected values are:

+--------+------+------+
|        | Cat1 | Cat2 |
+--------+------+------+
| Group1 |  100 |  100 |
| Group2 |  400 |  400 |
+--------+------+------+
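
For reference, the expected counts and the uncorrected Pearson statistic for this table can be reproduced by hand. The following is a minimal sketch in plain Python; the variable names are illustrative and it is not how SciPy computes the values internally.

```python
# Observed counts from the table above (rows: Group1, Group2; columns: Cat1, Cat2).
observed = [[80, 120], [420, 380]]

row_totals = [sum(row) for row in observed]        # 200 and 800
col_totals = [sum(col) for col in zip(*observed)]  # 500 and 500
grand_total = sum(row_totals)                      # 1000

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic, with no continuity correction applied.
chi2_stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(expected)   # the 100/100/400/400 table shown above
print(chi2_stat)  # 10.0, the hand-calculated value
```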

If I calculate the chi-square value by hand I get 10, but with Python I get 9.506. I am using the following code:

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Some fake data. 
n = 5 # Number of samples. 
d = 3 # Dimensionality. 
c = 2 # Number of categories. 
data = np.random.randint(c, size=(n, d)) 
data = pd.DataFrame(data, columns=['CAT1', 'CAT2', 'CAT3']) 

# Contingency table. 
contingency = pd.crosstab(data['CAT1'], data['CAT2']) 

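# Overwrite the cells with the real counts (assumes the crosstab came out 2x2).
# A single .iloc[row, col] call avoids chained assignment on a temporary copy.
contingency.iloc[0, 0] = 80
contingency.iloc[0, 1] = 120
contingency.iloc[1, 0] = 420
contingency.iloc[1, 1] = 380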

# Chi-square test of independence. 
chi, p, dof, expected = chi2_contingency(contingency) 

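Oddly, the function gives me the correct expected values, but the chi-square statistic and the p-value are off. What am I doing wrong here?

Thanks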

p.s.

I know the way I created the initial table in pandas is pretty clumsy, but I'm not an expert at building these nested tables in pandas.

Answer

From the documentation:

correction : bool, optional 
If True, and the degrees of freedom is 1, apply Yates’ correction for continuity. 
The effect of the correction is to adjust each observed value by 0.5 towards 
the corresponding expected value. 

Here the degrees of freedom is 1, so the correction is applied by default. If you set correction to False, you get 10:

>>> chi2_contingency(contingency, correction=False)
(10.0, 0.001565402258002549, 1, array([[ 100., 100.],
       [ 400., 400.]]))
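
To see where the 9.506 comes from, both statistics can be reproduced by hand. This is a minimal sketch in plain Python; `pearson_chi2` and `p_value_dof1` are hypothetical helper names rather than SciPy functions, and the p-value shortcut assumes exactly one degree of freedom, as in a 2x2 table.

```python
import math

observed = [[80, 120], [420, 380]]
expected = [[100.0, 100.0], [400.0, 400.0]]


def pearson_chi2(observed, expected, yates=False):
    """Pearson chi-square; with yates=True, shrink each |O - E| by 0.5 first."""
    shift = 0.5 if yates else 0.0
    return sum(
        max(abs(o - e) - shift, 0.0) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def p_value_dof1(stat):
    """Upper-tail p-value of a chi-square statistic, valid only for dof = 1."""
    return math.erfc(math.sqrt(stat / 2.0))


plain = pearson_chi2(observed, expected)                  # 10.0
corrected = pearson_chi2(observed, expected, yates=True)  # approximately 9.506

print(plain, p_value_dof1(plain))
print(corrected, p_value_dof1(corrected))
```

Every cell differs from its expected count by 20, so the correction replaces 20 with 19.5 in each term: 19.5² × (1/100 + 1/100 + 1/400 + 1/400) = 9.50625, the value reported in the question.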

Thanks for the quick help. Will mark it as correct in 6 minutes! – valenzio
