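Calculating distance with the "bag of words" approach

My code runs, but my function always returns 0.0. The code reads .txt files and builds a matrix in which each .txt file is one row, and each word of that file has its own column in the corresponding row.

I compare two rows with each other. I want to count how often each word in the union of the two rows occurs. However, although the code runs, I get the wrong result (0.0).

I thought the error might be in the matrix I pass to the function, but the matrix looks fine.

The strange thing is, if I create two lists manually: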

a = ["a", "b", "c", "d"], 
b = ["b", "c", "d", "e"]

it works, but when I change to:

a = ["word 1", "word 2", "word 3", "word 4"], 
b = ["word 2","word 3","word 4","word 5",]

the result is 0.0 again. I'm confused!

My code:

import numpy

def bow_distance(a, b):

    p = 0

    # The longer of the two word lists is used to normalise the result
    if len(a) > len(b):
        max_words = len(a)
    else:
        max_words = len(b)

    # Every distinct word that occurs in a or b
    list_words_ab = list(set(a) | set(b))

    # Row 0: the words, row 1: counts in a, row 2: counts in b
    len_bow_matrix = len(list_words_ab)
    bow_matrix = numpy.zeros(shape = (3, len_bow_matrix), dtype = str)

    while p < len_bow_matrix:
        bow_matrix[0, p] = str(list_words_ab[p])
        p = p+1

    p = 0

    while p < len_bow_matrix:
        bow_matrix[1, p] = a.count(bow_matrix[0, p])
        bow_matrix[2, p] = b.count(bow_matrix[0, p])
        p = p+1

    p = 0
    overlap = 0

    # Sum the absolute count differences over all words
    while p < len_bow_matrix:
        abs_difference = abs(float(bow_matrix[1, p]) - float(bow_matrix[2, p]))
        overlap = overlap + abs_difference
        p = p+1

    return (overlap/2)/max_words


# Calculate the distances
# (num_of_txt and text_word_matrix are built earlier from the .txt files)

i = 1
j = 1

while i < num_of_txt + 1:

    print(i)
    newfile = open("TXT_distance_" + str(i)+".txt", "w")

    while j < num_of_txt + 1:
        newfile.write(str(bow_distance(text_word_matrix[i-1], text_word_matrix[j-1])) + " ")
        j = j+1

    newfile.close()
    j = 1
    i = i+1

Source

2016-08-24 Philipp

At first glance I see a few mistakes here:

a = ["a", "b", "c", "d"], <----- comma here 
b = ["b", "c", "d", "e"] 
it works, but when I change to: 

a = ["word 1", "word 2", "word 3", "word 4"], <----- and here 
b = ["word 2","word 3","word 4","word 5",] <----- and here inside the list
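
Apart from the commas, the `dtype = str` passed to `numpy.zeros` may also be worth a look. As far as I know, a string dtype without a length creates one-character strings, so "word 1" would be stored as "w" and every `count` would come back as 0. That would match the 0.0 you get, while single letters such as "a" still work. A minimal sketch of one way around it, assuming the intended distance is half the summed count difference divided by the longer list length; the name `bow_distance_counter` is mine, not from your code, and it counts with `collections.Counter` instead of a NumPy string matrix:

```python
from collections import Counter


def bow_distance_counter(a, b):
    """Half the summed absolute count difference, divided by the longer length."""
    counts_a, counts_b = Counter(a), Counter(b)
    # Every distinct word that occurs in either list
    words = set(counts_a) | set(counts_b)
    # A Counter returns 0 for a word it has never seen
    diff = sum(abs(counts_a[w] - counts_b[w]) for w in words)
    max_words = max(len(a), len(b))
    # Two empty lists are treated as identical (an assumption of this sketch)
    return (diff / 2) / max_words if max_words else 0.0


a = ["word 1", "word 2", "word 3", "word 4"]
b = ["word 2", "word 3", "word 4", "word 5"]
distance = bow_distance_counter(a, b)
```

Because the words stay ordinary Python strings, their length no longer matters, and the two example lists give the same distance as the single-letter lists.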

Source

2016-08-24 14:37:21 turkus

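There is also a superfluous comma after "word 5" that needs to be removed. – Harrison

True, thank you. – turkus

The comma after "word 5" doesn't matter, because a list is allowed to end with a trailing comma. However, the comma after the list definition (where 'a' is defined) turns 'a' into a tuple with a single value (the list itself), and that can throw off your logic. – Riaz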
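
The difference between the two kinds of trailing comma can be checked directly. This is only a small sketch; the helper names `a_type`, `a_length`, `b_type` and `b_length` are for illustration:

```python
# A trailing comma after the closing bracket wraps the list in a tuple
a = ["word 1", "word 2", "word 3", "word 4"],
# A trailing comma inside the brackets is simply ignored
b = ["word 2", "word 3", "word 4", "word 5",]

a_type = type(a).__name__   # "tuple"
a_length = len(a)           # 1, the only element is the whole list
b_type = type(b).__name__   # "list"
b_length = len(b)           # 4
```

With 'a' as a one-element tuple, a call such as set(a) would try to hash the inner list and raise a TypeError rather than return a wrong number.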
