2015-12-03

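Lemmatizing a list of words

I have a list of words in a text file. I want to lemmatize them to collapse words that have the same meaning but are in different tenses, such as try and tried. When I do this, I keep getting the error TypeError: unhashable type: 'list'.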

from nltk.stem import WordNetLemmatizer

results = []
with open('/Users/xyz/Documents/something5.txt', 'r') as f:
    for line in f:
        # append() adds each line's whole token list as a single element
        results.append(line.strip().split())

lemma = WordNetLemmatizer()

lem = []

for r in results:
    lem.append(lemma.lemmatize(r))

with open("lem.txt", "w") as t:
    for item in lem:
        print(item, file=t)

How do I lemmatize words that are already tokenized?

Answers


WordNetLemmatizer.lemmatize expects a single word as a string, but you are passing it a list of strings. That is what raises the TypeError.
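The error message comes from Python itself: a list is mutable and therefore unhashable, so it cannot be used as a dictionary key or set member. The sketch below reproduces the same TypeError with a plain dict lookup. The `exceptions` table and `lookup` function are hypothetical stand-ins for the kind of word lookup a lemmatizer performs, not NLTK's actual internals.

```python
# Hypothetical lookup table standing in for a lemmatizer's irregular-form list
exceptions = {"tried": "try", "went": "go"}

def lookup(word):
    """Return the base form if the word is in the table, else the word unchanged."""
    return exceptions.get(word, word)

print(lookup("tried"))  # try

message = ""
try:
    lookup(["tried", "went"])  # a list of tokens, as in the question's code
except TypeError as err:
    message = str(err)
    print(message)  # mentions "unhashable type: 'list'"
```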

line.split() returns a list of strings, and you append that whole list as a single element to results, so results ends up as a list of lists.
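A quick way to see the difference between append() and extend(), using two made-up lines of text in place of the file:

```python
lines = ["I tried it\n", "she tries again\n"]

appended = []
extended = []
for line in lines:
    appended.append(line.strip().split())  # adds the whole token list as ONE element
    extended.extend(line.strip().split())  # adds each token as its own element

print(appended)  # [['I', 'tried', 'it'], ['she', 'tries', 'again']]
print(extended)  # ['I', 'tried', 'it', 'she', 'tries', 'again']
```

Only the flat `extended` list contains plain strings that can be passed to a lemmatizer one at a time.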

Use results.extend(line.strip().split()) instead, so that results is a flat list of words:

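from nltk.stem import WordNetLemmatizer

results = []
with open('/Users/xyz/Documents/something5.txt', 'r') as f:
    for line in f:
        # extend() adds each token individually, giving a flat list of strings
        results.extend(line.strip().split())

lemma = WordNetLemmatizer()

# Call lemmatize() once per word (map() is lazy in Python 3)
lem = map(lemma.lemmatize, results)

with open("lem.txt", "w") as t:
    for item in lem:
        print(item, file=t)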

Or, refactored without the intermediate results list:

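from nltk.stem import WordNetLemmatizer

def words(fname):
    """Yield each whitespace-separated token in the file, one at a time."""
    with open(fname, 'r') as document:
        for line in document:
            for word in line.strip().split():
                yield word

lemma = WordNetLemmatizer()
# Lazy: the file is not read until lem is iterated
lem = map(lemma.lemmatize, words('/Users/xyz/Documents/something5.txt'))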
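The generator part does not depend on NLTK, so it can be checked on its own. The variant below is a sketch that takes any iterable of lines instead of a filename (the name `words_from_lines` is hypothetical), uses `yield from`, and drops the `strip()` call, which is redundant because `str.split()` with no arguments already ignores leading and trailing whitespace.

```python
import io

def words_from_lines(lines):
    """Yield every whitespace-separated token from an iterable of lines."""
    for line in lines:
        # split() with no arguments skips leading/trailing whitespace and newlines
        yield from line.split()

# An in-memory "file" stands in for open(fname) so the example is self-contained
fake_file = io.StringIO("I tried it\n  she tries again  \n\n")
tokens = list(words_from_lines(fake_file))
print(tokens)  # ['I', 'tried', 'it', 'she', 'tries', 'again']
```

With a real file, pass the file object opened in a `with open(fname) as f:` block and consume the tokens (for example, through `map(lemma.lemmatize, ...)`) before the block ends, since the generator reads from the file lazily.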
Open the text file and read its lines into a list:
with open(filename) as fo:
    results1 = fo.readlines()

results1 
['I have a list of words in a text file', ' \n I want to perform lemmatization on them to remove words which have the same meaning but are in different tenses', ''] 

# Tokenize each line into a list of words

results2 = [line.split() for line in results1]

# Remove empty lists (lines that contained only whitespace)

results2 = [x for x in results2 if x]

# Lemmatize each word of each line using WordNetLemmatizer, then rejoin the words

from nltk.stem.wordnet import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
lemma_list_of_words = [
    ' '.join(lemmatizer.lemmatize(word) for word in words)
    for words in results2
]
lemma_list_of_words
['I have a list of word in a text file', 'I want to perform lemmatization on them to remove word which have the same meaning but are in different tense'] 

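Compare lemma_list_of_words with results1 to see what lemmatization changed: "words" became "word" and "tenses" became "tense". Note that lemmatize() treats every word as a noun by default, so verb forms such as "are" are left unchanged; pass pos='v' to lemmatize verbs.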

Please explain your code. – GGO