2013-03-18 83 views
5

I am trying to use stochastic gradient descent in R to build my own logistic regression function, but what I have right now makes the weights grow without bound, so it never stops:

# Logistic regression 
# Takes training example vector, output vector, learn rate scalar, and convergence delta limit scalar 
my_logr <- function(training_examples,training_outputs,learn_rate,conv_lim) { 
    # Initialize gradient vector 
    gradient <- as.vector(rep(0,NCOL(training_examples))) 
    # Difference between weights 
    del_weights <- as.matrix(1) 
    # Weights 
    weights <- as.matrix(runif(NCOL(training_examples))) 
    weights_old <- as.matrix(rep(0,NCOL(training_examples))) 

    # Compute gradient 
    while(norm(del_weights) > conv_lim) { 

    for (k in 1:NROW(training_examples)) { 
     gradient <- gradient + 1/NROW(training_examples)* 
     ((t(training_outputs[k]*training_examples[k,] 
      /(1+exp(training_outputs[k]*t(weights)%*%as.numeric(training_examples[k,])))))) 
    } 

    # Update weights 
    weights <- weights_old - learn_rate*gradient 
    del_weights <- as.matrix(weights_old - weights) 
    weights_old <- weights 

    print(weights) 
    } 
    return(weights) 
} 

The function can be tested with the following code:

data(iris) # Iris data already present in R  
# Dataset for part a (first 50 vs. last 100) 
iris_a <- iris 
iris_a$Species <- as.integer(iris_a$Species) 
# Convert list to binary class 
for (i in 1:NROW(iris_a$Species)) {if (iris_a$Species[i] != "1") {iris_a$Species[i] <- -1}}  
random_sample <- sample(1:NROW(iris),50) 

weights_a <- my_logr(iris_a[random_sample,1:4],iris_a$Species[random_sample],1,.1) 

I double-checked my implementation against Abu-Mostafa's algorithm, which goes as follows:

  1. Initialize the weight vector
  2. For each epoch, compute the gradient:
    gradient <- -1/N * sum_{1 to N} (training_answer_n * training_vector_n/(1 + exp(training_answer_n * dot(weight,training_vector_n))))
  3. weight_new <- weight - learn_rate*gradient
  4. Repeat until the weight delta is sufficiently small
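The gradient in step 2 can be written without a loop in vectorized R. This is a minimal sketch; `logistic_gradient`, `X`, `y`, and `w` are illustrative names, with `X` an N×d matrix of examples and `y` a vector of ±1 labels:

```r
# Vectorized form of step 2; names are illustrative, not from the question.
# X: N x d matrix of training examples, y: vector of +/-1 labels, w: weights.
logistic_gradient <- function(X, y, w) {
  margins <- y * as.vector(X %*% w)                  # y_n * dot(w, x_n) for every n
  -1 / nrow(X) * as.vector(t(X) %*% (y / (1 + exp(margins))))
}

X <- matrix(c(1, 2, 3, 4), nrow = 2)                 # two examples, two features
y <- c(1, -1)
w <- c(0, 0)
logistic_gradient(X, y, w)                           # c(0.25, 0.25)
```

With `w = 0` every margin is zero, so each example contributes `y_n * x_n / 2`, which is easy to check by hand against the loop version.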

Am I missing something here?

+0

Am I missing a normalization term on the weights? Maybe this is a question for Cross Validated? – 2013-03-18 13:50:48

+1

From a mathematical perspective, an unconstrained magnitude of the weight vector does not produce a unique solution. When I added these two lines to the classifier function, it converged in two steps: `weights <- weights/norm(weights)` ... `weights <- weights_old - learn_rate*gradient`; `weights <- weights/norm(weights)` – 2013-03-18 13:58:24

+0

Did the answer below help? – 2013-03-18 16:43:21

Answers

2

From a mathematical perspective, an unconstrained magnitude of the weight vector does not produce a unique solution. When I added these two lines to the classifier function, it converged in two steps:

# Normalize 
weights <- weights/norm(weights) 

...

# Update weights 
weights <- weights_old - learn_rate*gradient 
weights <- weights/norm(weights) 
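One caveat worth flagging: base R's `norm()` is a *matrix* norm (the one-norm by default), not the Euclidean vector length, so for a column vector `weights/norm(weights)` divides by the sum of absolute values rather than by `sqrt(sum(weights^2))`. A minimal sketch of the difference, with made-up values:

```r
# norm() on a column vector returns the one-norm (max absolute column sum),
# which for a single column is just the sum of absolute values.
w <- as.matrix(c(3, -4))
norm(w)            # one-norm: |3| + |-4| = 7
sqrt(sum(w^2))     # Euclidean length: 5
```

Either choice constrains the magnitude, so the fix above still works; just be aware the two norms scale the weights differently.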

I couldn't get @SimonO101's approach to work, and since I'm not using this code for real work (there are built-ins like glm for that), it's enough to do loops I understand. The whole function looks like this:

# Logistic regression 
# Takes training example vector, output vector, learn rate scalar, and convergence delta limit scalar 
my_logr <- function(training_examples,training_outputs,learn_rate,conv_lim) { 
    # Initialize gradient vector 
    gradient <- as.vector(rep(0,NCOL(training_examples))) 
    # Difference between weights 
    del_weights <- as.matrix(1) 
    # Weights 
    weights <- as.matrix(runif(NCOL(training_examples))) 
    weights_old <- as.matrix(rep(0,NCOL(training_examples))) 

    # Normalize 
    weights <- weights/norm(weights) 

    # Compute gradient 
    while(norm(del_weights) > conv_lim) { 
    # Reset the accumulated gradient each pass 
    gradient <- as.vector(rep(0,NCOL(training_examples))) 

    for (k in 1:NROW(training_examples)) { 
     gradient <- gradient - 1/NROW(training_examples)* 
     ((t(training_outputs[k]*training_examples[k,] 
      /(1+exp(training_outputs[k]*t(weights)%*%as.numeric(training_examples[k,])))))) 
    } 
#  gradient <- -1/NROW(training_examples) * sum(training_outputs * training_examples/(1 + exp(training_outputs * weights%*%training_outputs))) 

    # Update weights 
    weights <- weights_old - learn_rate*gradient 
    weights <- weights/norm(weights) 
    del_weights <- as.matrix(weights_old - weights) 
    weights_old <- weights 

    print(weights) 
    } 
    return(weights) 
} 
+0

+1. Looks good. I hadn't fully grasped the algorithm's procedure. I'll start going over it tonight. – 2013-03-19 13:30:22

+0

What do you mean by "weights"? How do you compute standard errors and p-values? Could you give some references for this algorithm? Thanks! – qed 2014-03-30 19:00:15

+1

I have also been trying to implement logistic regression, in C++, but using the IRLS algorithm, which involves matrix inversion and is sometimes a real headache. – qed 2014-03-30 19:01:52
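For reference, the IRLS algorithm the comment mentions is Newton's method applied to the logistic log-likelihood, rewritten as reweighted least squares. A hypothetical sketch in R (`irls_logreg` and its arguments are illustrative; it assumes `y` in {0, 1}, a design matrix with an intercept column, and a non-separable, well-conditioned problem):

```r
# Hypothetical IRLS sketch for logistic regression; not from the thread.
# X: n x d design matrix (include an intercept column), y: labels in {0, 1}.
irls_logreg <- function(X, y, iters = 25) {
  w <- rep(0, ncol(X))
  for (i in seq_len(iters)) {
    p <- 1 / (1 + exp(-as.vector(X %*% w)))  # current fitted probabilities
    W <- diag(p * (1 - p))                   # IRLS weight matrix
    # Newton step: solve (X' W X) delta = X' (y - p) -- this is the matrix
    # inversion the comment complains about
    w <- as.vector(w + solve(t(X) %*% W %*% X, t(X) %*% (y - p)))
  }
  w
}
```

On a small non-separable problem this should agree with glm's coefficients, e.g. `irls_logreg(cbind(1, mtcars$wt), mtcars$am)` versus `coef(glm(am ~ wt, data = mtcars, family = binomial))`, since glm fits the binomial family with the same iteratively reweighted least squares.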

1

There are a couple of problems here. First, you can make much better use of R's vectorization. Second, I'm no expert in stochastic gradient descent, but the algorithm you give below your question does not match how you compute the gradient in your function. Look through this code carefully, but it seems to converge, and I think it follows Abu-Mostafa's. I gather you want to compute this gradient:

gradient <- -1/N * sum(training_outputs * training_examples/(1 + exp(training_outputs * dot(weights ,training_outputs)))) 

Then that part of your algorithm should read...

while(norm(del_weights) > conv_lim) { 
gradient <- -1/NROW(iris_a) * sum(training_outputs * training_examples/(1 + exp(training_outputs * as.matrix(training_examples) %*% weights))) 

# Update weights 
weights <- weights_old - learn_rate*gradient 
del_weights <- as.matrix(weights_old - weights) 
weights_old <- weights 
print(weights) 

}
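Note that `sum()` in the loop above reduces the whole expression to a single scalar, so every component of `weights` receives the identical update (the comment thread below picks this up); a per-component gradient needs a matrix product instead. A minimal illustration with made-up values:

```r
# Made-up data to show the shapes; 3 examples, 2 features.
X <- matrix(1:6, nrow = 3)
y <- c(1, -1, 1)
w <- c(0.5, -0.5)

# sum() collapses everything to one number shared by all weights
scalar_grad <- -1/nrow(X) * sum(y * X / (1 + exp(y * as.vector(X %*% w))))

# t(X) %*% ... keeps one gradient component per weight
vector_grad <- -1/nrow(X) * as.vector(t(X) %*% (y / (1 + exp(y * as.vector(X %*% w)))))
```

Here `scalar_grad` has length 1 while `vector_grad` has length 2, one entry per feature.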

You can create the binary classification of the Species variable more easily using:

iris_a$Species <- as.numeric(iris_a$Species) 
iris_a$Species[ iris_a$Species != 1 ] <- -1  
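For instance, after this conversion the labels are strictly ±1, using the iris data already loaded above:

```r
# iris ships with base R; first species becomes 1, the other two become -1.
data(iris)
iris_a <- iris
iris_a$Species <- as.numeric(iris_a$Species)
iris_a$Species[iris_a$Species != 1] <- -1
table(iris_a$Species)   # 100 rows labelled -1, 50 labelled 1
```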

I can't tell you whether the results it returns are sensible, but that code should follow step 2. Check each step carefully, and remember that R is vectorized, so you can do element-wise operations on vectors without loops. E.g.:

x <- 1:5 
y <- 1:5 
x*y 
#[1] 1 4 9 16 25 
+0

Hmm, this helps a lot (I had also set up the other binary iris datasets incorrectly, which isn't shown here), but now I get another odd result, which is that every classifier trains exactly the same way: (http://i.imgur.com/qvamVQW.png) – 2013-03-19 00:23:41

+0

It looks like all the weights are identical when the regression function returns. Are you sure those multiplications are right? – 2013-03-19 00:32:12

+1

@TrevorAlexander Hi, no, those multiplications are not correct, good catch. I see you posted a working solution, but I want to go back and comb through the vectorized code and make it work, because that's a good habit! – 2013-03-19 13:29:45