
scipy optimize minimize: hess_inv strongly depends on initial guess

I'm using scipy.optimize.minimize to minimize a simple log-likelihood function. The Hessian matrix doesn't seem to behave well.

import numpy as np
import scipy.optimize as op

def lnlike(theta, n, bhat, fhat, sigb, sigf):
    S, b, f = theta
    mu = f*S + b
    scb2 = ((b-bhat)/sigb)**2
    scf2 = ((f-fhat)/sigf)**2
    return n*np.log(mu) - mu - 0.5*(scb2+scf2)

nll = lambda *args: -lnlike(*args)

myargs = (21.0, 20.0, 0.5, 6.0, 0.1)  # n, bhat, fhat, sigb, sigf

If the initial guess is at the minimum, the iteration doesn't go anywhere. That's fine as far as the parameter values are concerned, but it also never touches the Hessian (it stays the identity), so I can't use it to estimate uncertainties.

x0 = [2.0, 20.0, 0.5] # initial guess is at the minimum 
result = op.minimize(nll, x0, args=myargs)
print result 

  status: 0
 success: True
    njev: 1
    nfev: 5
hess_inv: array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])
     fun: -42.934971192191881
       x: array([  2. ,  20. ,   0.5])
 message: 'Optimization terminated successfully.'
     jac: array([  0.00000000e+00,   0.00000000e+00,   9.53674316e-07])

If I change the initial guess slightly, it seems to return a sensible hess_inv.

x0 = [2.01, 20.0, 0.5] 
result = op.minimize(nll, x0, args=myargs)
print result 
print np.sqrt(result.hess_inv[0,0]) 

  status: 0
 success: True
    njev: 15
    nfev: 75
hess_inv: array([[  2.16004477e+02,  -7.60588367e+01,  -2.94846112e-02],
       [ -7.60588367e+01,   3.55748024e+01,   2.74064505e-03],
       [ -2.94846112e-02,   2.74064505e-03,   9.98030944e-03]])
     fun: -42.934971191969964
       x: array([  1.99984604,  19.9999814 ,   0.5000001 ])
 message: 'Optimization terminated successfully.'
     jac: array([ -2.38418579e-06,  -5.24520874e-06,   1.90734863e-06])
14.697090757

However, hess_inv is very sensitive to the initial guess.

x0 = [2.02, 20.0, 0.5] 
result = op.minimize(nll, x0, args=myargs)
print result 
print np.sqrt(result.hess_inv[0,0]) 

  status: 0
 success: True
    njev: 16
    nfev: 80
hess_inv: array([[  1.82153214e+02,  -6.03482772e+01,  -2.97458789e-02],
       [ -6.03482772e+01,   3.30771459e+01,  -2.53811809e-03],
       [ -2.97458789e-02,  -2.53811809e-03,   9.99052952e-03]])
     fun: -42.934971192188634
       x: array([  1.9999702 ,  20.00000354,   0.50000001])
 message: 'Optimization terminated successfully.'
     jac: array([ -9.53674316e-07,  -4.76837158e-07,  -4.76837158e-07])
13.4964148462

Changing the initial guess by another small amount:

x0 = [2.03, 20.0, 0.5] 
result = op.minimize(nll, x0, args=myargs)
print result 
print np.sqrt(result.hess_inv[0,0]) 

  status: 0
 success: True
    njev: 14
    nfev: 70
hess_inv: array([[  2.30479371e+02,  -7.36087027e+01,  -3.79639119e-02],
       [ -7.36087027e+01,   3.55785937e+01,   3.54182478e-03],
       [ -3.79639119e-02,   3.54182478e-03,   9.97664441e-03]])
     fun: -42.93497119204827
       x: array([  1.99975148,  20.00006366,   0.50000009])
 message: 'Optimization terminated successfully.'
     jac: array([ -9.53674316e-07,  -9.53674316e-07,   4.29153442e-06])
15.1815470484

What am I missing? Is this a bug or a feature?

Answers


The way I understand the optimizer, the Hessian is approximated by finite differences when you use a quasi-Newton method, which from the documentation it appears you are. In your case that does not seem like the best idea. Perhaps utilizing Sympy (in IPython) will produce a more usable result:

import sympy as sy 
import numpy as np 
import scipy.optimize as sopt 

from IPython.display import display # nice printing 

sy.init_printing() # LaTeX like printing for IPython 

def lnlike(theta, n, bhat, fhat, sigb, sigf): 
    S, b, f = theta 
    mu = f*S + b 
    scb2 = ((b-bhat)/sigb)**2 
    scf2 = ((f-fhat)/sigf)**2 
    return n*sy.log(mu) - mu - (scb2+scf2)/2 

# declare symbols: 
th_S, th_b, th_f = sy.symbols("theta_S, theta_b, theta_f", real=True) 
theta = (th_S, th_b, th_f) 
n, bhat, fhat = sy.symbols(r"n, \hat{b}, \hat{f}", real=True) 
sigb, sigf = sy.symbols("sigma_b, sigma_f", real=True) 

# symbolic optimization function: 
lf = -lnlike(theta, n, bhat, fhat, sigb, sigf) 
# Gradient: 
dlf = sy.Matrix([lf.diff(th) for th in theta]) 
# Hessian: 
Hlf = sy.Matrix([dlf.T.diff(th) for th in theta]) 

print("Symbolic Hessian:") 
display(Hlf) 

# Make numpy functions: 
margs = {n:21, bhat:20, fhat:.5, sigb:6, sigf:.1} # parameters 
lf_a, dlf_a, Hlf_a = lf.subs(margs), dlf.subs(margs), Hlf.subs(margs) 
lf_lam = sy.lambdify(theta, lf_a, modules="numpy") 
dlf_lam = sy.lambdify(theta, dlf_a, modules="numpy") 
Hlf_lam = sy.lambdify(theta, Hlf_a, modules="numpy") 

nlf = lambda xx: np.array(lf_lam(xx[0], xx[1], xx[2])) # function 
ndlf = lambda xx: np.array(dlf_lam(xx[0], xx[1], xx[2])).flatten() # gradient 
nHlf = lambda xx: np.array(Hlf_lam(xx[0], xx[1], xx[2])) # Hessian 

x0 = [2.02, 20.0, 0.5] 
rs = sopt.minimize(nlf, x0, jac=ndlf, hess=nHlf, method='Newton-CG') 
print(rs) 
print("Hessian:") 
print(nHlf(rs.x)) 
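
With the exact Hessian available, one way to get the uncertainty the question was after is to invert it at the optimum (a short sketch using only the names defined above):

# Invert the exact symbolic Hessian at the optimum; this is the matrix
# that BFGS's hess_inv was trying to approximate.
cov = np.linalg.inv(nHlf(rs.x))
print(np.sqrt(cov[0, 0]))    # uncertainty on theta_S, cf. sqrt(hess_inv[0,0])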

Thanks. This is nice. However, it looks like this only works for simple cases where analytic derivatives are available. Is there a way to produce a more accurate Hessian numerically? – physcheng 2015-04-04 23:25:43


As long as you can express your likelihood as a formula, I'm fairly sure Sympy can compute a Hessian, if it exists. If you want robust numerical methods, you need some knowledge about the smoothness of your function to choose a suitable differentiator. Another standard technique is called "automatic differentiation" (https://en.wikipedia.org/wiki/Automatic_differentiation). For an interpreted language like Python, I don't see an advantage of automatic differentiation over Sympy (when using sy.simplify). – Dietrich 2015-04-05 00:24:13
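
One concrete numerical route is an adaptive finite-difference Hessian; a minimal sketch, assuming the third-party numdifftools package is installed and nll and myargs are defined as in the question:

import numpy as np
import numdifftools as nd

# Adaptive finite-difference Hessian of the negative log-likelihood:
Hfun = nd.Hessian(lambda theta: nll(theta, *myargs))
H = Hfun([2.0, 20.0, 0.5])    # evaluate at the minimum
cov = np.linalg.inv(H)        # inverse Hessian, i.e. the covariance estimate
print(np.sqrt(cov[0, 0]))     # uncertainty on S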


Quasi-Newton methods build up a guess at the inverse Hessian by applying a sequence of low-rank updates to a completely naive initial guess (typically a multiple of the identity). The low-rank updates used are, in some sense, the "least-change" updates that make a given equation hold, and the meaning of "least-change" varies with the quasi-Newton method chosen. If you start at, or very near, the minimum, the optimizer figures that out quite quickly, and it never accumulates much information in its approximation to the inverse Hessian.
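
To make this concrete, here is a small sketch of the standard BFGS "least-change" update of the inverse-Hessian approximation, on an illustrative toy quadratic rather than the question's problem. Each step folds one rank-two correction into the naive identity guess; if the optimizer starts at the minimum it takes no step, so no update is ever formed and hess_inv stays the identity, as in the first run above.

import numpy as np

def bfgs_update(H, s, y):
    # One BFGS update of the inverse-Hessian approximation H, given the
    # step s = x_new - x_old and the gradient change y = g_new - g_old.
    rho = 1.0/np.dot(y, s)
    I = np.eye(len(s))
    V = I - rho*np.outer(s, y)
    return V.dot(H).dot(V.T) + rho*np.outer(s, s)

# Toy quadratic with known curvature A, so that grad f(x) = A.dot(x):
A = np.diag([1.0, 10.0])
x_old = np.array([1.0, 1.0])
x_new = np.array([0.5, 0.2])
s = x_new - x_old
y = A.dot(x_new) - A.dot(x_old)
H = bfgs_update(np.eye(2), s, y)  # identity plus one step of curvature info
print(H)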


Thanks for the explanation. Is there a way to produce a more accurate Hessian matrix? – physcheng 2015-04-04 23:27:02


@physcheng: You can usually compute things symbolically. Bear in mind that storing the Hessian or the inverse Hessian takes space quadratic in the number of variables; if you're training models with huge numbers of parameters, that quickly becomes useless. Otherwise, computing the Hessian by hand generally isn't too difficult. – tmyklebu 2015-04-05 00:10:45
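
For a rough sense of that quadratic cost: a dense float64 Hessian over d parameters takes 8*d**2 bytes, so d = 100 is a trivial 80 kB while d = 10**5 already needs 80 GB (a one-line check):

d = 10**5
print(8.0*d**2/1e9, "GB")    # 80.0 GB for a dense float64 Hessian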