
Gradient descent in Python: I want to run gradient descent on a logarithmic decline curve represented by:

y = y0 - a * ln(b + x)

In my example, y0 = 800.

I tried to do this using the partial derivatives with respect to a and b, but while this should clearly reduce the squared error, it does not converge. I know this isn't vectorized, and I may be taking the wrong approach entirely. Am I making a simple mistake, or am I off on this problem completely?

import numpy as np 

# constants my gradient descent model should find: 
a = 4 
b = 4 

# function to fit on! 
def function(x, a, b): 
    y0 = 800 
    return y0 - a * np.log(b + x) 

# Generates data 
def gen_data(numpoints): 
    a = 4 
    b = 4 
    x = np.array(range(0, numpoints)) 
    y = function(x, a, b) 
    return x, y 
x, y = gen_data(600) 

def grad_model(x, y, iterations): 
    converged = False 

    # length of dataset 
    m = len(x) 

    # guess a , b 
    theta = [0.1, 0.1] 
    alpha = 0.001 

    # initial error 
    e = np.sum((np.square(function(x, theta[0], theta[1])) - y)) 

    for iteration in range(iterations): 
     hypothesis = function(x, theta[0], theta[1]) 
     loss = hypothesis - y 

     # compute partial deritaves to find slope to "fall" into 
     theta0_grad = (np.mean(np.sum(-np.log(x + y))))/(m) 
     theta1_grad = (np.mean((((np.log(theta[1] + x))/theta[0]) - (x*(np.log(theta[1] + x))/theta[0]))))/(2*m) 

     theta0 = theta[0] - (alpha * theta0_grad) 
     theta1 = theta[1] - (alpha * theta1_grad) 

     theta[1] = theta1 
     theta[0] = theta0 

     new_e = np.sum(np.square((function(x, theta[0], theta[1])) - y)) 
     if new_e > e: 
      print "AHHHH!" 
      print "Iteration: "+ str(iteration) 
      break 
     print theta 
    return theta[0], theta[1] 

Yes, I run into trouble whenever I go beyond standard linear gradient descent, and I'm not quite sure how to approach this problem.


I haven't really read the code yet, but what do you mean by "it doesn't converge"? Is the error getting bigger and bigger, so it diverges? Or does it just take too long to converge? Assuming you did code the derivatives correctly, it may be that you picked the wrong `alpha`, or that the sign of the gradient direction is flipped ('+' instead of '-').


I put a break in the code in case the error diverges. I believe the partial derivative for my theta[0] (a) variable is correct, but not the one for my theta[1] (b) variable. It seems to converge correctly, but only for theta[0].

Answer


I found a few bugs in your code. The line

e = np.sum((np.square(function(x, theta[0], theta[1])) - y)) 

is incorrect and should be replaced with

e = np.sum((np.square(function(x, theta[0], theta[1]) - y))) 

The formula for new_e contains the same bug.

Also, the gradient formulas are wrong. Your loss function is $L(a,b) = \sum_{i=1}^N (y_0 - a \log(b + x_i) - y_i)^2$, so you have to compute the partial derivatives of $L$ with respect to $a$ and $b$. (Does LaTeX really not work on stackoverflow?) A final remark is that gradient descent has a step-size limitation, so the step size cannot be too large. Here is a version of the code that works better:
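
For reference, the partial derivatives of $L$ that the theta0_grad and theta1_grad lines below implement work out to:

$\frac{\partial L}{\partial a} = \sum_{i=1}^N 2\,(y_0 - a \log(b + x_i) - y_i)\,\bigl(-\log(b + x_i)\bigr)$

$\frac{\partial L}{\partial b} = \sum_{i=1}^N 2\,(y_0 - a \log(b + x_i) - y_i)\,\Bigl(-\frac{a}{b + x_i}\Bigr)$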

import numpy as np 
import matplotlib.pyplot as plt 

# constants my gradient descent model should find: 
a = 4.0 
b = 4.0 
y0 = 800.0 

# function to fit on! 
def function(x, a, b): 
    # y0 = 800 
    return y0 - a * np.log(b + x) 

# Generates data 
def gen_data(numpoints): 
    # a = 4 
    # b = 4 
    x = np.array(range(0, numpoints)) 
    y = function(x, a, b) 
    return x, y 
x, y = gen_data(600) 

def grad_model(x, y, iterations): 
    converged = False 

    # length of dataset 
    m = len(x) 

    # guess a , b 
    theta = [0.1, 0.1] 
    alpha = 0.00001 

    # initial error 
    # e = np.sum((np.square(function(x, theta[0], theta[1])) - y)) # This was a bug 
    e = np.sum((np.square(function(x, theta[0], theta[1]) - y))) 

    costs = np.zeros(iterations) 

    for iteration in range(iterations): 
     hypothesis = function(x, theta[0], theta[1]) 
     loss = hypothesis - y 

     # compute partial deritaves to find slope to "fall" into 
     # theta0_grad = (np.mean(np.sum(-np.log(x + y))))/(m) 
     # theta1_grad = (np.mean((((np.log(theta[1] + x))/theta[0]) - (x*(np.log(theta[1] + x))/theta[0]))))/(2*m) 
     theta0_grad = 2*np.sum((y0 - theta[0]*np.log(theta[1] + x) - y)*(-np.log(theta[1] + x))) 
     theta1_grad = 2*np.sum((y0 - theta[0]*np.log(theta[1] + x) - y)*(-theta[0]/(theta[1] + x)))  # use the current estimate theta[1], not the true b

     theta0 = theta[0] - (alpha * theta0_grad) 
     theta1 = theta[1] - (alpha * theta1_grad) 

     theta[1] = theta1 
     theta[0] = theta0 

     # new_e = np.sum(np.square((function(x, theta[0], theta[1])) - y)) # This was a bug 
     new_e = np.sum(np.square((function(x, theta[0], theta[1]) - y))) 
     costs[iteration] = new_e 
     if new_e > e: 
      print "AHHHH!" 
      print "Iteration: "+ str(iteration) 
      # break 
     print theta 
    return theta[0], theta[1], costs 

(theta0,theta1,costs) = grad_model(x,y,100000) 
plt.semilogy(costs) 
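
As a quick sanity check (an added sketch, not part of the original answer), you can print the recovered parameters and compare them with the true values a = 4.0 and b = 4.0 used to generate the data, and label and show the cost plot:

print(theta0, theta1)  # compare with the true a = 4.0, b = 4.0
plt.xlabel("iteration")
plt.ylabel("sum of squared errors")
plt.show()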

Thanks! Works like a charm! Is there any standard procedure to follow for finding the right step size?
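
One standard recipe (a general sketch, not something from the answer above) is a backtracking (Armijo) line search: start from a fairly large step size and keep shrinking it until the step produces a sufficient decrease in the cost. A minimal sketch, assuming hypothetical helpers cost(theta), which returns the sum of squared errors for a parameter vector theta, and grad(theta), which returns its gradient as a NumPy array:

import numpy as np

def backtracking_alpha(cost, grad, theta, alpha0=1.0, beta=0.5, c=1e-4):
    # Armijo rule: accept the step once it decreases the cost by at least
    # c * alpha * ||grad||^2; otherwise shrink alpha by the factor beta.
    g = grad(theta)
    alpha = alpha0
    while cost(theta - alpha * g) > cost(theta) - c * alpha * np.dot(g, g):
        alpha *= beta
    return alpha

# e.g. alpha = backtracking_alpha(cost, grad, np.array([0.1, 0.1]))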