2012-12-26

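Python SciPy chisquare test returns different p-values from Excel and LibreOffice

After reading a recent blog post about an application of the Poisson distribution, I tried reproducing its findings using Python's 'scipy.stats' module as well as the Excel/LibreOffice 'POISSON' and 'CHITEST' functions.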

For the expected values shown in the article, I simply used:

import scipy.stats 
for i in range(8): 
    print(scipy.stats.poisson.pmf(i, 2)*31) 

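This reproduces the table shown in the blog. I also recreated it in LibreOffice, with the values 0 to 7 in cells A1, A2, ..., A8 of column A and the simple formula '=POISSON(A1,2,0)*31' repeated in the first 8 rows of column B.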
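The spreadsheet formula and the scipy.stats call compute the same quantity: with its last (cumulative) argument set to 0, POISSON returns the non-cumulative probability e^(-λ) λ^k / k!, which is then scaled by the 31 years of data. A minimal stdlib-only sketch of column B, where poisson_pmf, YEARS and RATE are illustrative names that do not appear in the original post:

```python
import math


def poisson_pmf(k, mean):
    """Non-cumulative Poisson probability, like a POISSON(k, mean, 0) cell."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


YEARS = 31  # observation period in the blog post
RATE = 2    # estimated rate: 62 killings / 31 years

# Column B of the spreadsheet: expected number of years with k killings
expected = [poisson_pmf(k, RATE) * YEARS for k in range(8)]

for k, value in enumerate(expected):
    print(k, round(value, 4))
```

The printed values should agree with the exp list shown in the answer's IPython session (4.1954, 8.3908, 8.3908, 5.5939, ...).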

So far, so good. Now for the chi-squared p-values:

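Under LibreOffice, I simply entered the observed values in cells C1-C8 and used '=CHITEST(C1:C8,B1:B8)' to reproduce the p-value of 0.18 reported in the article. With scipy.stats, however, I can't seem to reproduce this value: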

import numpy as np 
import scipy.stats 

obs = [4, 10, 7, 5, 4, 0, 0, 1] 
exp = [scipy.stats.poisson.pmf(i, 2)*31 for i in range(8)] 

# we only estimated one variable (the rate of 2 killings per year via 62/31) 
# so dof will be N-1-estimates 
estimates = 1 
print(scipy.stats.chisquare(np.array(obs), np.array(exp), ddof=len(obs)-1-estimates)) 
# (10.112318133864241, 0.0014728159441179519) 
# the p-test value reported is 0.00147, not 0.18... 
# 
# Maybe I need to aggregate categories with observations less than 5 
# (as suggested in many textbooks of statistics for chi-squared tests)? 
observedAggregateLessThan5 = [14, 7, 5, 5] 
expectedAggregateLessThan5 = [exp[0]+exp[1], exp[2], exp[3], sum(exp[4:])] 
print(scipy.stats.chisquare(np.array(observedAggregateLessThan5), np.array(expectedAggregateLessThan5), ddof=len(observedAggregateLessThan5)-1-estimates)) 
# (0.53561749342466913, 0.46425467595930309) 
# Again the p-test value computed is not 0.18, it is 0.46... 

What am I doing wrong?

Answer


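You are not using the ddof argument correctly. ddof is an adjustment to the default degrees of freedom: the test uses k - 1 - ddof, where k is the number of categories, so the default (ddof=0) is one less than the length. With ddof=len(obs)-1-estimates = 6, your first call used only 8 - 1 - 6 = 1 degree of freedom instead of 7. To match CHITEST, you don't need to specify ddof at all: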

In [21]: obs 
Out[21]: [4, 10, 7, 5, 4, 0, 0, 1] 

In [22]: exp 
Out[22]: 
[4.1953937803349941, 
8.3907875606699882, 
8.3907875606699882, 
5.5938583737799901, 
2.796929186889995, 
1.1187716747559984, 
0.37292389158533251, 
0.10654968331009501] 

In [23]: chisquare(obs, f_exp=array(exp)) 
Out[23]: (10.112318133864241, 0.1822973566091409) 
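To see where each p-value comes from, the sketch below recomputes Pearson's statistic by hand and evaluates scipy.stats.chi2.sf at different degrees of freedom. It assumes CHITEST on a single column uses k - 1 degrees of freedom. It also computes the statistic directly instead of calling chisquare, because recent SciPy releases may reject expected counts whose total (about 30.97 here, since the bins above 7 are left out) differs from the observed total of 31. The ddof=1 row is shown only for comparison: it subtracts one degree of freedom for the estimated rate, which CHITEST does not do.

```python
import math

from scipy.stats import chi2

obs = [4, 10, 7, 5, 4, 0, 0, 1]
# Expected counts: Poisson(rate=2) pmf times 31 years, as in the question
exp = [math.exp(-2) * 2 ** k / math.factorial(k) * 31 for k in range(8)]

# Pearson's chi-squared statistic: sum of (observed - expected)^2 / expected
stat = sum((o - e) ** 2 / e for o, e in zip(obs, exp))

n_bins = len(obs)
# scipy's chisquare uses n_bins - 1 - ddof degrees of freedom
cases = [
    ("default, like CHITEST", 0),
    ("one estimated parameter", 1),
    ("ddof from the question", n_bins - 1 - 1),
]
for label, ddof in cases:
    dof = n_bins - 1 - ddof
    print(f"{label}: ddof={ddof}, dof={dof}, p={chi2.sf(stat, dof):.5f}")
```

The default row corresponds to the 0.18 from LibreOffice, and the last row corresponds to the 0.00147 from the question; only the degrees of freedom differ, not the statistic.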