Logistic regression gradient descent is not converging with fmin_tnc in Python

I have been following a tutorial to implement logistic regression with gradient descent in Python.
Here is the link: http://www.johnwittenauer.net/machine-learning-exercises-in-python-part-3/

The author's IPython notebook for this specific exercise is on GitHub:
https://github.com/jdwittenauer/ipython-notebooks/blob/master/notebooks/ml/ML-Exercise2.ipynb

Here is my code for this problem:

import pandas as pd 
import matplotlib.pylab as plt 
import numpy as np 
import scipy.optimize as opt 


def sigmoid(Z): 
    '''Compute the sigmoid function ''' 
    return 1.0/(1.0 + np.exp(-1.0 * Z)) 

########################################### 


def compute_cost(theta,X,y, learningRate): 
    '''Compute the regularized cost for the given theta, X and y '''

    theta = np.matrix(theta) 
    X = np.matrix(X) 
    y = np.matrix(y) 
    m = y.size 
    theta0 = np.zeros((1,X.shape[1])) 
    theta0[0,1:] = theta[0,1:]  

    reg = np.dot((learningRate/2*m),(theta0.T.dot(theta0))) 

    Z = X.dot(theta.T) 

    hypothesis = sigmoid(Z) 
    exp1 = (-y.T.dot(np.log(hypothesis))) 
    exp2 = ((1.0 - y).T.dot(np.log(1.0 - hypothesis)))  
    J = (exp1 - exp2).dot(1/m) 

    return J.sum() + reg.sum() 



def grad(theta,X,y,learningRate):  

    theta = theta.T   
    X = np.matrix(X) 
    y = np.matrix(y) 
    m = y.shape[0] 
    theta0 = np.zeros(X.shape[1])  
    theta0[1:] = theta[1:]  
    theta = np.matrix(theta)  
    theta0 = np.matrix(theta0) 

    reg = np.dot(learningRate/m, theta) 

    Z = X.dot(theta.T)  
    hypothesis = sigmoid(Z)  
    error = hypothesis - y   
    grad = np.dot((X.T.dot(error).flatten()),1/m) + reg 
    grad= grad.flatten() 
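    # fmin_tnc expects the gradient as a flat 1-D array, not a (1, n) matrix
    return np.asarray(grad).ravel()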

## 
def predict(theta, X):  
    probability = sigmoid(X * theta.T) 
    return [1 if (x >= 0.5) else 0 for x in probability]
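
For comparison, here is a minimal sketch of the regularized cost and gradient from the exercise, written with plain NumPy arrays instead of np.matrix. It assumes y is a 1-D array of 0/1 labels and that the first column of X is the intercept column, which is left out of the penalty. The names cost_reg, grad_reg and the synthetic data are made up for illustration; they are not taken from the tutorial. scipy.optimize.check_grad compares the analytic gradient with a finite-difference estimate, which is one way to see whether a cost/gradient pair is consistent before handing it to fmin_tnc.

```python
import numpy as np
from scipy.optimize import check_grad


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def cost_reg(theta, X, y, lam):
    """Regularized logistic cost; theta[0] (the intercept) is not penalized."""
    m = y.size
    h = sigmoid(X @ theta)
    data_term = -(y @ np.log(h) + (1.0 - y) @ np.log(1.0 - h)) / m
    penalty = lam / (2.0 * m) * np.sum(theta[1:] ** 2)
    return data_term + penalty


def grad_reg(theta, X, y, lam):
    """Gradient of cost_reg with respect to theta, returned as a 1-D array."""
    m = y.size
    h = sigmoid(X @ theta)
    g = X.T @ (h - y) / m
    g[1:] += lam / m * theta[1:]  # the intercept term gets no penalty
    return g


# Synthetic data, only used to exercise the two functions (an assumption).
rng = np.random.default_rng(0)
X_demo = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 3))])
y_demo = (rng.random(50) < 0.5).astype(float)
theta_demo = rng.normal(scale=0.1, size=4)

# Norm of the difference between grad_reg and a finite-difference gradient.
gradient_error = check_grad(cost_reg, grad_reg, theta_demo, X_demo, y_demo, 1.0)
print(gradient_error)
```

A value of gradient_error close to zero suggests the two functions agree; a large value usually points at a mismatch between the cost and the gradient, for example in how the penalty term is scaled or which parameters it covers.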

Here is how the code is called:
data2 = pd.read_csv('ex2data2.txt', header=None, names=['Test 1', 'Test 2', 'Accepted'])

y = data2[data2.columns[-1]].values  # .as_matrix() was removed in pandas 1.0
m = len(y)
y = y.reshape(m, 1)
X = data2[data2.columns[:-1]]
X = X.values
_lambda = 1 

from sklearn.preprocessing import PolynomialFeatures 

#Get all high order parameters 
feature_mapper = PolynomialFeatures(degree=6) 
X = feature_mapper.fit_transform(X) 

# convert to numpy arrays and initialize the parameter array theta 

theta = np.zeros(X.shape[1]) 

learningRate = 1 

compute_cost(theta, X, y, learningRate)   

result = opt.fmin_tnc(func=compute_cost,x0=theta,fprime=grad,args= (X,y,learningRate))
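
To read the outcome of such a call, it may help to know that fmin_tnc returns a tuple (x, nfeval, rc): the solution, the number of function evaluations and a return code. The sketch below runs on synthetic data rather than ex2data2.txt, and cost_and_grad is a hypothetical helper that returns the cost and the gradient together, which fmin_tnc accepts when fprime is not given. It assumes a 1-D label array and an unpenalized intercept in the first column.

```python
import numpy as np
import scipy.optimize as opt


def cost_and_grad(theta, X, y, lam):
    """Return (cost, gradient) of regularized logistic regression."""
    m = y.size
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    cost = -(y @ np.log(h) + (1.0 - y) @ np.log(1.0 - h)) / m
    cost += lam / (2.0 * m) * np.sum(theta[1:] ** 2)
    grad = X.T @ (h - y) / m
    grad[1:] += lam / m * theta[1:]
    return cost, grad


# Synthetic, noisy two-feature problem (an assumption, not the course data).
rng = np.random.default_rng(1)
features = rng.normal(size=(200, 2))
noise = rng.normal(size=200)
X_demo = np.hstack([np.ones((200, 1)), features])
y_demo = (features[:, 0] + features[:, 1] + noise > 0).astype(float)

# func returns both value and gradient, so fprime is omitted.
theta_opt, n_evals, return_code = opt.fmin_tnc(
    func=cost_and_grad, x0=np.zeros(3), args=(X_demo, y_demo, 1.0), disp=0
)

final_cost, final_grad = cost_and_grad(theta_opt, X_demo, y_demo, 1.0)
predictions = (X_demo @ theta_opt >= 0).astype(float)  # sigmoid(z) >= 0.5 iff z >= 0
accuracy = np.mean(predictions == y_demo)
print(return_code, n_evals, final_cost, accuracy)
```

Comparing final_cost with the cost at the starting point, and looking at the size of final_grad, gives a rough idea of whether the optimizer actually reached a minimum or stopped early.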

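With one variable everything works fine, but with more features (exercise 2) it does not work. Everything up to the call to the optimization function (fmin_tnc) is very similar to his code.
Somehow, even his code does not converge to the expected value. His blog post shows what the result of fmin_tnc should be.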

But if you follow his code step by step, you get a result that differs from the one in his post.

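The difference is small, but I did find one odd thing in his code. He drops the two columns 'Test 1' and 'Test 2' and keeps only the higher-order terms. That feels strange, because Andrew Ng's solution does not drop any column of the table and uses 28 features, while this code uses only 11. I found other code as well, and I expected my cost_function and gradient function to work. I believe they get stuck in some local minimum and do not converge.
I eventually tried all 28 features, just like Andrew's dataFrame. Sadly, I still got a different result.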
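
As a quick check of where the 28 comes from: with two input features there are C(8, 2) = 28 monomials of total degree 6 or less, counting the constant term. The sketch below confirms that PolynomialFeatures(degree=6) produces that many columns, with the constant (bias) column first. The zero input row is only a placeholder to get the output shape.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One placeholder row with two features is enough to inspect the mapping.
mapper = PolynomialFeatures(degree=6)
mapped = mapper.fit_transform(np.zeros((1, 2)))

n_columns = mapped.shape[1]
bias_value = mapped[0, 0]  # the constant column is 1 for every row

print(n_columns)        # 28
print(comb(2 + 6, 6))   # 28
```

So theta = np.zeros(X.shape[1]) has 28 entries after this mapping, and the first one plays the role of the intercept.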

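I now get a higher accuracy, but my cost is still higher than the expected value, which is 0.52900.
My intention is not to disparage the quality of the blog's code. I am still following his other tutorials, and the blog seems to be a good source.
Below is the link to my code, which uses fmin_tnc just as he does. I only wrote a more vectorized gradient function. The file is named Logistic Regression Regularized.py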