2011-11-29 95 views
6

我正在执行概率计算。我有很多非常小的数字,我想从1中减去所有这些数字,并且准确地做到这一点。我可以精确计算这些小数的对数。我的策略至今一直像这样(使用numpy的):如何计算1的对数减去python中给定小数的指数

由于日志数量较少,x的数组,计算:

y = numpy.logaddexp.reduce(x) 

现在我想计算像1-exp(y)甚至更​​好log(1-exp(y)) ,但我不知道如何做,而不会失去我的所有精度。

实际上,即使是logaddexp函数也会遇到精度问题。矢量x中的值范围可以从-2到-800,或者甚至更负。来自上面的向量y基本上会有一个数字类型的eps大约1e-16的整段数字。因此,举例来说,在精确计算的数据看起来是这样的:

In [358]: x 
Out[358]: 
[-5.2194676211172837, 
-3.9050377656308362, 
-3.1619783292449615, 
-2.71289594096134, 
-2.4488395891021639, 
-2.3129210706827568, 
-2.2709987626652346, 
-2.3007776073511259, 
-2.3868404149802434, 
-2.5180718876609163, 
-2.68619816583087, 
-2.8849022632856958, 
-3.1092603032627686, 
-3.3553673369747834, 
-3.6200806272462351, 
-3.9008385919463073, 
-4.1955300857178379, 
-4.5023981074719899, 
-4.8199676154248081, 
-5.1469905756384904, 
-5.4824035553480428, 
-5.8252945959126876, 
-6.174877049340779, 
-6.5304687083067563, 
-6.8914750074202473, 
-7.25737538919104, 
-7.6277121540338797, 
-8.0020812775389558, 
-8.3801247986220773, 
-8.7615244716292437, 
-9.1459964426584435, 
-9.5332867613176404, 
-9.9231675781398394, 
-10.315433907978701, 
-10.709900863130784, 
-11.106401278287066, 
-11.50478366390567, 
-11.904910436107656, 
-12.30665638039909, 
-12.709907313918777, 
-13.114558916892051, 
-13.52051570882999, 
-13.927690148982549, 
-14.336001843810081, 
-14.745376846921289, 
-15.155747039147968, 
-15.567049578271309, 
-15.979226409456359, 
-16.39222382873956, 
-16.805992092998878, 
-17.22048507074976, 
-17.63565992888303, 
-18.051476851117201, 
-18.467898784496384, 
-18.884891210740903, 
-19.302421939667397, 
-19.720460922243518, 
-20.138980081145718, 
-20.557953156947775, 
-20.977355568292495, 
-21.397164284594595, 
-21.817357709992422, 
-22.237915577412224, 
-22.658818851739369, 
-23.080049641202237, 
-23.501591116172762, 
-23.923427434676114, 
-24.345543673975158, 
-24.767925767665417, 
-25.190560447772668, 
-25.61343519140047, 
-26.036538171518259, 
-26.459858211524278, 
-26.883384743252066, 
-27.307107768123842, 
-27.731017821180984, 
-28.155105937748402, 
-28.579363622513654, 
-29.003782820820732, 
-29.428355891997484, 
-29.853075584553352, 
-30.27793501309668, 
-30.702927636836705, 
-31.128047239545907, 
-31.553287910869187, 
-31.978644028878307, 
-32.404110243774596, 
-32.82968146265631, 
-33.255352835270173, 
-33.681119740674262, 
-34.106977774747804, 
-34.532922738484046, 
-34.958950627012712, 
-35.385057619298891, 
-35.811240068471022, 
-36.237494492735493, 
-36.663817566835519, 
-37.090206114019054, 
-37.516657098479527, 
-37.943167618239784, 
-38.369734898447348, 
-38.796356285056333, 
-39.223029238868548, 
-39.64975132991276, 
-40.076520232137909, 
-40.5033337184027, 
-40.930189655741344, 
-41.357086000888444, 
-41.784020796047173, 
-42.210992164885965, 
-42.637998308748706, 
-43.065037503066776, 
-43.492108093959985, 
-43.919208495015312, 
-44.346337184233221, 
-44.773492701130749, 
-45.200673643993753, 
-45.627878667267964, 
-46.055106479082156, 
-46.482355838895614, 
-46.909625555262096, 
-47.336914483704675, 
-47.764221524695017, 
-48.191545621730768, 
-48.618885759506213, 
-49.04624096217151, 
-49.473610291673936, 
-49.900992846179292, 
-50.328387758566748, 
-50.755794194994508, 
-51.183211353532613, 
-51.610638462858901, 
-52.0380747810147, 
-52.46551959421754, 
-52.892972215728378, 
-53.320431984769073, 
-53.747898265489198, 
-54.175370445978274, 
-54.602847937323247, 
-55.030330172705362, 
-55.457816606538813, 
-55.885306713645889, 
-56.312799988467418, 
-56.740295944308855, 
-57.167794112617116, 
-57.59529404228897, 
-58.02279529900909, 
-58.450297464615232, 
-58.877800136490578, 
-59.305302926981085, 
-59.732805462838542, 
-60.160307384683506, 
-60.587808346493375, 
-61.015308015110463, 
-61.442806069768608, 
-61.87030220164138, 
-62.297796113406662, 
-62.725287518829532, 
-63.15277614236129, 
-63.580261718755196, 
-64.007743992695964, 
-64.435222718445743, 
-64.862697659501919, 
-65.290168588270035, 
-65.717635285748088, 
-66.14509754122389, 
-66.572555151982783, 
-67.000007923029216, 
-67.427455666815376, 
-67.854898202982099, 
-68.282335358110231, 
-68.709766965479957, 
-69.137192864839108, 
-69.564612902180784, 
-69.992026929530198, 
-70.419434804735829, 
-70.8468363912732, 
-71.274231558051156, 
-71.701620179229167, 
-72.129002134037705, 
-72.556377306608397, 
-72.983745585807242, 
-73.411106865077045, 
-73.838461042282461, 
-74.265808019561746, 
-74.693147703185559, 
-75.120480003416901, 
-75.547804834380145, 
-75.97512211393132, 
-76.402431763534764, 
-76.829733708143749, 
-77.257027876085431, 
-77.684314198948414, 
-78.111592611476681, 
-78.538863051464546, 
-78.966125459656723, 
-79.393379779652037, 
-79.820625957809625, 
-80.24786394315754, 
-80.675093687306912, 
-81.102315144366912] 

然后我试着计算指数的对数之和:

In [359]: np.logaddexp.accumulate(x) 
Out[359]: 
array([ -5.21946762e+00, -3.66710221e+00, -2.68983273e+00, 
     -2.00815067e+00, -1.51126604e+00, -1.14067818e+00, 
     -8.60829425e-01, -6.48188808e-01, -4.86276416e-01, 
     -3.63085873e-01, -2.69624488e-01, -1.99028599e-01, 
     -1.45996863e-01, -1.06408884e-01, -7.70565672e-02, 
     -5.54467248e-02, -3.96506186e-02, -2.81859503e-02, 
     -1.99225261e-02, -1.40061296e-02, -9.79701394e-03, 
     -6.82045164e-03, -4.72733966e-03, -3.26317960e-03, 
     -2.24396350e-03, -1.53767347e-03, -1.05026994e-03, 
     -7.15209142e-04, -4.85690052e-04, -3.28980607e-04, 
     -2.22305294e-04, -1.49890553e-04, -1.00858788e-04, 
     -6.77380054e-05, -4.54139175e-05, -3.03974537e-05, 
     -2.03154477e-05, -1.35581905e-05, -9.03659252e-06, 
     -6.01552344e-06, -3.99984336e-06, -2.65671945e-06, 
     -1.76283376e-06, -1.16860435e-06, -7.73997496e-07, 
     -5.12213574e-07, -3.38706792e-07, -2.23809375e-07, 
     -1.47785898e-07, -9.75226648e-08, -6.43149957e-08, 
     -4.23904687e-08, -2.79246430e-08, -1.83858489e-08, 
     -1.20995365e-08, -7.95892319e-09, -5.23300609e-09, 
     -3.43929670e-09, -2.25953475e-09, -1.48391255e-09, 
     -9.74194956e-10, -6.39351406e-10, -4.19466218e-10, 
     -2.75121795e-10, -1.80397409e-10, -1.18254918e-10, 
     -7.74993004e-11, -5.07775611e-11, -3.32619009e-11, 
     -2.17835737e-11, -1.42634249e-11, -9.33764336e-12, 
     -6.11190167e-12, -3.99989955e-12, -2.61737204e-12, 
     -1.71253165e-12, -1.12043465e-12, -7.33052079e-13, 
     -4.79645919e-13, -3.13905885e-13, -2.05519681e-13, 
     -1.34650094e-13, -8.83173582e-14, -5.80300378e-14, 
     -3.82338678e-14, -2.52963381e-14, -1.68421145e-14, 
     -1.13181549e-14, -7.70918073e-15, -5.35155125e-15, 
     -3.81152630e-15, -2.80565548e-15, -2.14872312e-15, 
     -1.71971577e-15, -1.43957518e-15, -1.25665732e-15, 
     -1.13722927e-15, -1.05925916e-15, -1.00835857e-15, 
     -9.75131524e-16, -9.53442707e-16, -9.39286186e-16, 
     -9.30046550e-16, -9.24016349e-16, -9.20080954e-16, 
     -9.17512772e-16, -9.15836886e-16, -9.14743318e-16, 
     -9.14029759e-16, -9.13564174e-16, -9.13260398e-16, 
     -9.13062204e-16, -9.12932898e-16, -9.12848539e-16, 
     -9.12793505e-16, -9.12757603e-16, -9.12734183e-16, 
     -9.12718905e-16, -9.12708939e-16, -9.12702438e-16, 
     -9.12698198e-16, -9.12695432e-16, -9.12693627e-16, 
     -9.12692451e-16, -9.12691683e-16, -9.12691183e-16, 
     -9.12690856e-16, -9.12690643e-16, -9.12690504e-16, 
     -9.12690414e-16, -9.12690355e-16, -9.12690316e-16, 
     -9.12690291e-16, -9.12690275e-16, -9.12690264e-16, 
     -9.12690257e-16, -9.12690252e-16, -9.12690249e-16, 
     -9.12690248e-16, -9.12690246e-16, -9.12690245e-16, 
     -9.12690245e-16, -9.12690245e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16]) 

最终导致:

In [360]: np.logaddexp.reduce(x) 
Out[360]: -9.1269024387687033e-16 

所以我的精确度已经消失了。任何想法如何解决这个问题?

+0

我只是好奇,你的号码是什么(如果它不是秘密)? – Binus

+0

@UriLaserson如果其中一个答案有帮助,请将其标记为已接受。 –

回答

7

在Python 2.7,我们增加了math.expm1()这个用例:

>>> from math import exp, expm1 
>>> exp(1e-5) - 1 # gives result accurate to 11 places 
1.0000050000069649e-05 
>>> expm1(1e-5) # result accurate to full precision 
1.0000050000166668e-05 

此外,还有math.fsum()对于没有精度损失的累计工序:

>>> sum([.1, .1, .1, .1, .1, .1, .1, .1, .1, .1]) 
0.9999999999999999 
>>> fsum([.1, .1, .1, .1, .1, .1, .1, .1, .1, .1]) 
1.0 

最后,如果没有别的帮助,您可以使用支持超高精度算术的decimal module

>>> from decimal import * 
>>> getcontext().prec = 200 
>>> (1 - 1/Decimal(7000000)).ln() 
Decimal('-1.4285715306122546161332083855139723669559469615692284955124609122046580004888309867906750714454869716398919778588515625689415322789136206397998627088895481989036005482451668027002380442299229191323673E-7') 
+0

对不起,请检查编辑,一个问题是,我甚至到达指数时精度已经超过了。 –

0

完全像Raymond Hettinger说的,但是你当然需要乘以-1,因为你想要1 - exp而不是exp - 1

+0

对不起,请检查编辑,一个问题是,我甚至达到指数的时候精度已经超过了。 –

3

我建议用它们的exp()和log()替换它们Taylor series相应地在0和1附近。这样,你就不会因为使用大数字而失去精确性(我的名字叫1)。使用Lagrange remainder公式或只是成员的表达与一些预留来确定,因为当差异将超出您的精度。

更新:

的Python 2.7的math.expm1exp(x)-1)和math.log1plog(1+x))为你做这个,如果平台的C库的精度(通常为双)就足够了。 (如果不是,你将不得不求助于特殊的数学软件(x86的FPU可以以更高的精度计算))。

2

我不知道这是否是你想要

numpy.expm1(x[, out]) 
Calculate exp(x) - 1 for all elements in the array. 



>>> import numpy as np 
>>> np.expm1(x).sum() 
-200.0 
>>> (-np.expm1(x)).sum() 
200.0 
>>> from scipy import special 
>>> (-special.expm1(x)).sum() 
200.0 
>>> np.log((-special.expm1(x)).sum()) 
5.2983173665480363 

编辑什么:

对不起,我不知道,这只是雷蒙德的Hettinger的回答的numpy的版本。

(不是答案的数值问题)

我仍然不知道确切的问题是什么,但是,而不是在它扔十进制或mpmath,也许问题进行改写会有所帮助。例如,如果将泊松中的概率相加,最终总是会“接近”到1.但是对于一些问题,我们可以避免使用生存函数而不是cdf处理一些问题。

+0

不要忘记numpy.log1p – matt

+0

我玩了log1p这个问题一段时间。但是因为我不确定这个问题实际上要求什么,所以我放弃了。 – user333700

1

我用mpmath来解决类似的问题。这是一个很好的和有据可查的100%python库。如果速度对你来说是一个问题;考虑与gmpy一起使用mpmath。看到这个link这样做。

3

不知道很多关于蟒蛇,我做了我的大部分在Java中的工作,但在我看来,你会过得更好执行上的所有值对数和-EXP招同时第一宁可做两份两个使用numpy.logaddexp.accumulate

在google中快速搜索会显示python库中的候选人:scipy.misc.logsumexp

在那会不会是很难给自己设定任何情况下:

logsumexp(probs) := max(probs) + log(sum[i](exp(probes[i]-max(probs)))) 

所以像:

maxValue = -Inf; 
    for x in probs 
    if x > maxValue then maxValue = x 
    expSum = 0 
    for x in probs 
    expSum += exp(x - maxValue) 
    return log(expSum) 

返回一个值,说p,为的只是日志所有概率的总和在probs。请注意,如果有大规模的输入概率最大,较小的值之间的不同,小的会被忽略,但如果只是他们的贡献是与大量在大多数应用中应该是精细比较非常小。

您可以使用更复杂的策略来使这些小数值可以在有大量小数字的情况下进行计数,比如probe = 0.5 + 1E-7 + 1E-7 ...百万次,所以加起来为0.1 。你可以做的是把几个单独的总和分在一起,在这些总和中大致相同的比例值先被加在一起,然后再合并。或者你可以使用一些中间枢纽值,而不是最大的,但你必须确保最大的价值不是太大,因为那时EXP(probs [I] - 透视)会导致在这种情况下溢出。

一旦做到这一点,你仍然需要做计算日志(1-EXP(P))

对于我发现这个文档描述的方式,以避免尽可能多的精度损失尽可能使用逻辑函数你可以在大多数语言数学公共库中找到。

Maechler M, Accurately Computing log(1 − exp(− |a|)) Assessed by the Rmpfr package, 2012

的关键是使用的两种可能的方法中的一种根据输入值一个

定义:

log1mexp(a) := log(1-exp(a)) ### function that we seek to implement. 
log1p(a) := log(1+a) # function provided by common math libraries. 
exp1m(a) := exp(a) - 1 # function provided by common math libraries. 

还有就是要实现log1mexp使用log1p一个明显的方式:

log1mexp(a) := log1p(-exp(a)) 

随着exp1m你可以这样做:

log1mexp(a) := log(-expm1(a)) 

你应该再使用log1p a < log(.5)和expm1时a> = log(.5)

log1mexp(a) := (a < log(.5)) ? log1p(-exp(a)) : log(-expm1(a)). 

请参考外部链接了解更多信息。