
I'm new to MATLAB and machine learning, and I'm trying to write a gradient descent function without using matrix operations (related: gradient descent with matrices for multiple variables).

  • m is the number of training examples in my set
  • n is the number of features of each example

The function gradientDescentMulti takes 5 parameters:

  • X: an m×n matrix
  • y: an m-dimensional vector
  • theta: an n-dimensional vector
  • alpha: a real number (the learning rate)
  • num_iters: an integer (the number of iterations)
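
For concreteness, a call might look like this (the data and hyperparameter values below are invented for illustration; the small, roughly scaled features keep alpha = 0.01 stable):

X = [1 0.5 1.2; 1 0.9 0.3; 1 1.4 0.8]; % m = 3 examples, n = 3 features; column 1 is the intercept term
y = [2.0; 1.5; 2.7];                   % m-dimensional vector of targets
theta = zeros(3, 1);                   % initial n-dimensional parameter vector
theta = gradientDescentMulti(X, y, theta, 0.01, 400);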

I already have a solution that uses matrix multiplication:

function theta = gradientDescentMulti(X, y, theta, alpha, num_iters)
    m = length(y); % number of training examples
    for iter = 1:num_iters
        gradJ = 1/m * (X'*X*theta - X'*y); % vectorized gradient of the cost
        theta = theta - alpha * gradJ;     % simultaneous update of all parameters
    end
end
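
This works because, assuming the usual squared-error cost J(theta) = 1/(2m) * (X*theta - y)' * (X*theta - y), the gradient is

grad J = (1/m) * X' * (X*theta - y) = (1/m) * (X'*X*theta - X'*y)

which is exactly what the gradJ line computes.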

The result after the iterations:

theta = 
    1.0e+05 * 

    3.3430 
    1.0009 
    0.0367 

But now I'm trying to do the same thing without matrix multiplication, using the per-feature update rule (applied simultaneously for every feature j):

theta_j := theta_j - alpha * (1/m) * sum_{i=1}^{m} (h(x^(i)) - y^(i)) * x_j^(i)

Here is the function:

function theta = gradientDescentMulti(X, y, theta, alpha, num_iters)
    m = length(y);  % number of training examples
    n = size(X, 2); % number of features

    for iter = 1:num_iters
        new_theta = zeros(1, n);
        %// for each feature, find the new theta
        for t = 1:n
            S = 0;
            for example = 1:m
                h = 0;
                for example_feature = 1:n
                    h = h + (theta(example_feature) * X(example, example_feature));
                end
                S = S + ((h - y(example)) * X(example, n)); %// Sum each feature for this example
            end
            new_theta(t) = theta(t) - alpha * (1/m) * S; %// Calculate new theta for this feature
        end
        %// only at the end of the iteration, update all theta simultaneously
        theta = new_theta'; %// Transpose new_theta (horizontal vector) to theta (vertical vector)
    end
end

But as a result, all the thetas come out the same :/

theta = 
    1.0e+04 * 

    3.5374 
    3.5374 
    3.5374 

Answer

If you look at the gradient update rule, it is more efficient to first compute the hypothesis for all of the training examples, then subtract each example's ground-truth value and store the differences in an array or vector. Once you do that, the update rule is very easy to compute. To me, you aren't doing that in your code. In addition, the bug that makes every entry of theta identical is in your inner sum: you accumulate (h - y(example)) * X(example, n), which always picks the last feature column instead of the current one, X(example, t), so every feature accumulates exactly the same value S. I therefore rewrote the code so that a separate array stores the difference between the hypothesis and the ground truth for each training example, and once I have that, I compute the update rule for each feature separately:

for iter = 1 : num_iters

    %// Compute hypothesis differences with ground truth first
    h = zeros(1, m);
    for t = 1 : m
        %// Compute hypothesis
        for tt = 1 : n
            h(t) = h(t) + theta(tt)*X(t,tt);
        end
        %// Compute difference between hypothesis and ground truth
        h(t) = h(t) - y(t);
    end

    %// Now update parameters
    new_theta = zeros(1, n);
    %// for each feature, find the new theta
    for tt = 1 : n
        S = 0;
        %// For each sample, compute products of hypothesis difference
        %// and the right feature of the sample and accumulate
        for t = 1 : m
            S = S + h(t)*X(t,tt);
        end

        %// Compute gradient descent step
        new_theta(tt) = theta(tt) - (alpha/m)*S;
    end

    theta = new_theta'; %// Transpose new_theta (horizontal vector) to theta (vertical vector)

end

When I do this, I get the same answers as with the matrix formulation.
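
A quick way to check that the two implementations agree (a sketch only; the file names gradientDescentMatrix.m and gradientDescentLoops.m are hypothetical, one per version above, each with the original 5-parameter signature):

rng(0);                                             % fixed seed so the check is reproducible
m = 50; n = 3;
X = [ones(m, 1), rand(m, n - 1)];                   % design matrix with an intercept column
y = rand(m, 1);                                     % arbitrary targets
theta0 = zeros(n, 1);                               % common starting point
t1 = gradientDescentMatrix(X, y, theta0, 0.1, 400); % vectorized version
t2 = gradientDescentLoops(X, y, theta0, 0.1, 400);  % loop version
disp(max(abs(t1 - t2)))                             % should be on the order of floating-point noise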