1

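I am building a program that compares each promo code in one list (these may contain OCR errors) against every promo code in another list (the list of correct promo codes), and it fails with TypeError: unhashable type: 'list'.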

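The expected output is the edit distance, together with the correct promo code(s) that have the smallest edit distance to the code being compared.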

My code:

import csv 
from nltk.metrics import distance 

with open("all_correct_promo.csv","rb") as file1: 
    reader1 = csv.reader(file1) 
    correctPromoList = list(reader1) 
    #print correctPromoList 

with open("all_extracted_promo.csv","rb") as file2: 
    reader2 = csv.reader(file2) 
    extractedPromoList = list(reader2) 
    #print extractedPromoList 

def find_min_edit(str_,list_): 
    nearest_correct_promos = [] 
    distances = {} 
    min_dist = 100 # arbitrary large assignment 
    for correct_promo in list_: 
     dist = distance.edit_distance(extracted,correct_promo,True) # compute Levenshtein distance 
     distances[correct_promo] = dist # store each score for real promo codes 
     if dist<min_dist: 
      min_dist = dist # store min distance 
    # extract all real promo codes with minimum Levenshtein distance 
    nearest_correct_promos.append(','.join([i[0] for i in distances.items() if i[1]==min_dist])) 
    return ','.join(nearest_correct_promos) # return a comma separated string of nearest real promo codes 

incorrectPromo = {} 
count = 0 
for extracted in extractedPromoList: 
    print 'Computing %dth promo code...' % count 
    incorrectPromo[extracted] = find_min_edit(extracted,correctPromoList) # get comma separated str of real promo codes nearest to extracted 
    count+=1 
print incorrectPromo 

Expected output:

Computing 0th promo code... 
Computing 1th promo code... 
Computing 2th promo code... 
{'abc': 'abc', 'abd': 'abx,aba,abz,abc', 'acd': 'abx,aba,abz,abc'} 

However, my code raises the following error:

Computing 0th promo code... 

Traceback (most recent call last):
  File "correctpromo_test4.py", line 31, in <module>
    incorrectPromo[extracted] = find_min_edit(extracted,correctPromoList) # get comma separated str of real promo codes nearest to extracted
  File "correctpromo_test4.py", line 20, in find_min_edit
    distances[correct_promo] = dist # store each score for real promo codes
TypeError: unhashable type: 'list'
+1

Lists cannot be used as dictionary keys. A simple fix is to convert the list to a tuple.

+0

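Can you add sample input? This code works for me with these inputs: `extractedPromoList = ['abc','acd','abd']` (a dummy for the CSV of extracted promo codes) and `correctPromoList = ['abc','aba','xbz','abz','abx']` (a dummy for the CSV of real promo codes).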

Answers

0

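You are reading each CSV as a list of lists. The function find_min_edit() expects a list of strings as its second argument, but you are passing it a list of lists of strings. Each correct_promo is therefore itself a list, and a list cannot be used as a dictionary key.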
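To make the cause concrete, here is a minimal sketch of what csv.reader hands back. It is written for Python 3 (the question uses Python 2 syntax), and the in-memory sample data is a hypothetical stand-in for all_correct_promo.csv, assuming one promo code per line.

```python
import csv
import io

# Hypothetical stand-in for all_correct_promo.csv: one promo code per line.
sample = io.StringIO("abc\naba\nxbz\n")

# csv.reader yields one list per row, even when a row has a single column.
rows = list(csv.reader(sample))
print(rows)

distances = {}
error_message = None
try:
    # Same operation as distances[correct_promo] = dist in the question:
    # the key is a list, so the assignment fails.
    distances[rows[0]] = 0
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# A tuple is hashable, so the tuple conversion suggested in the comments works.
distances[tuple(rows[0])] = 0

# Flattening each row to a plain string also gives a valid key.
codes = [row[0] for row in rows]
distances[codes[0]] = 0
print(distances)
```

Converting to a tuple silences the error, but the edit distance would then be computed between row containers rather than between the code strings, so flattening the rows to strings is probably closer to what the question wants.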

Changing the way you read the CSV files sorts this out.

Instead of

with open("all_correct_promo.csv","rb") as file1: 
    reader1 = csv.reader(file1) 
    correctPromoList = list(reader1) 

just use this

with open("all_correct_promo.csv","rb") as file1: 
    reader1 = csv.reader(file1) 
    correctPromoList = [''.join(i) for i in reader1] 
    print correctPromoList 

Do this for both CSV files and that will sort it out.
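Putting the pieces together, the following is a rough end-to-end sketch rather than a drop-in replacement. It makes several assumptions: it targets Python 3, it reads from in-memory sample data (the dummy inputs from the comment above) instead of the real CSV files, and it uses a small hand-written Levenshtein function named edit_distance as a stand-in for nltk's distance.edit_distance so that it runs without nltk installed. The helper read_codes is a hypothetical name, not part of the original code.

```python
import csv
import io


def edit_distance(a, b):
    """Plain Levenshtein distance: insert, delete and substitute each cost 1."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def read_codes(file_obj):
    """Flatten csv rows into a list of strings, skipping empty rows."""
    return [''.join(row) for row in csv.reader(file_obj) if row]


def find_min_edit(extracted, correct_codes):
    """Return (minimum distance, sorted list of nearest correct codes)."""
    distances = {code: edit_distance(extracted, code) for code in correct_codes}
    min_dist = min(distances.values())
    nearest = sorted(code for code, dist in distances.items() if dist == min_dist)
    return min_dist, nearest


# Sample data standing in for the two CSV files (assumed: one code per line).
correct = read_codes(io.StringIO("abc\naba\nxbz\nabz\nabx\n"))
extracted_codes = read_codes(io.StringIO("abc\nacd\nabd\n"))

result = {code: find_min_edit(code, correct) for code in extracted_codes}
for code, (dist, nearest) in result.items():
    print(code, dist, ','.join(nearest))
```

Because the keys are now plain strings, the dictionary assignments succeed. Note that this version also passes the extracted code in as a parameter instead of relying on the global loop variable, and it returns the minimum distance alongside the nearest codes, since the question asks for both.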