2011-02-28 305 views
1

Using WordNet and NLTK to replace synonyms in a corpus - python

I am trying to write a simple Python script that uses NLTK to find and replace synonyms in a txt file.

The code below gives me this error:

Traceback (most recent call last): 
    File "C:\Users\Nedim\Documents\sinon2.py", line 21, in <module> 
    change(word) 
    File "C:\Users\Nedim\Documents\sinon2.py", line 4, in change 
    synonym = wn.synset(word + ".n.01").lemma_names 
TypeError: can only concatenate list (not "str") to list 

Here is the code:

from nltk.corpus import wordnet as wn 

def change(word): 
    synonym = wn.synset(word + ".n.01").lemma_names 

    if word in synonym: 

      filename = open("C:/Users/tester/Desktop/test.txt").read() 
      writeSynonym = filename.replace(str(word), str(synonym[0])) 
      f = open("C:/Users/tester/Desktop/test.txt", 'w') 
      f.write(writeSynonym) 
      f.close() 

f = open("C:/Users/tester/Desktop/test.txt") 
lines = f.readlines() 

for i in range(len(lines)): 

    word = lines[i].split() 
    change(word) 

Answers

1

Two things. First, you can simplify the part that reads the file:

for line in open("C:/Users/tester/Desktop/test.txt"): 
    word = line.split() 

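Second, .split() returns a list of strings, whereas your change function appears to operate on only one word at a time. That is what causes the exception: your word is actually a list, and a list cannot be concatenated with the string ".n.01".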
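To make the cause concrete, here is a minimal sketch that reproduces the same TypeError without NLTK. The sample line "dog cat" is made up for illustration; only the list-plus-string concatenation mirrors the code in the question.

```python
line = "dog cat"             # made-up sample line
word = line.split()          # a list: ['dog', 'cat']

try:
    word + ".n.01"           # list + str, the same operation as in change()
except TypeError as exc:
    message = str(exc)       # keep the error text for inspection

# Concatenation works once each element is handled on its own.
names = [w + ".n.01" for w in word]
```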

If you want to process every word on the line, make it look like this:

for line in open("C:/Users/tester/Desktop/test.txt"): 
    words = line.split() 
    for word in words: 
        change(word)
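One thing that may still cause trouble with this loop: change() reopens and overwrites test.txt for every word while the loop is still reading the same file. A safer pattern, sketched below under that assumption, is to read the text once, transform it in memory, and write it back once. The helper name rewrite_file is hypothetical, and str.upper only stands in for the synonym step.

```python
import os
import tempfile

def rewrite_file(path, transform):
    """Read path fully, apply transform to the text, then write it back once."""
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(transform(text))

# Demonstration on a temporary file; str.upper stands in for the synonym step.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "test.txt")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("dog cat\nbird\n")
    rewrite_file(demo_path, str.upper)
    with open(demo_path, encoding="utf-8") as handle:
        rewritten = handle.read()
```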
+0

This doesn't work. I'm not sure what is wrong. – Tester 2011-03-01 10:25:57

2

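That approach is not very efficient, and it will not replace a word with one particular synonym, because each word may have several synonyms. You can collect them all and choose from among them: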

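from nltk.corpus import wordnet as wn
from nltk.corpus.reader.plaintext import PlaintextCorpusReader

# Folder that contains test.txt; every file in it is part of the corpus.
corpus_root = 'C:/Users/tester/Desktop/'
wordlists = PlaintextCorpusReader(corpus_root, '.*')

for word in wordlists.words('test.txt'):
    # Collect the lemma names of every synset the word belongs to.
    synonymList = set()
    wordNetSynset = wn.synsets(word)
    for synSet in wordNetSynset:
        # lemma_names is a method in NLTK 3; older releases exposed it as an attribute.
        for synWords in synSet.lemma_names():
            synonymList.add(synWords)
    print(synonymList)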
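The loop above only prints the candidates. As a rough sketch of the remaining step (choosing one synonym and substituting it in the text), something like the following could work. The names pick_synonym, replace_synonyms, toy_synonyms and toy_lookup are all hypothetical, and a small dictionary stands in for WordNet so the example runs without the corpus installed.

```python
import re

def pick_synonym(word, candidates):
    """Return the first candidate that differs from word, or word itself."""
    for candidate in candidates:
        if candidate.lower() != word.lower():
            # WordNet joins multiword lemmas with "_", so turn that into a space.
            return candidate.replace("_", " ")
    return word

def replace_synonyms(text, lookup):
    """Replace each alphabetic token using lookup(word) -> iterable of names."""
    def substitute(match):
        word = match.group(0)
        return pick_synonym(word, lookup(word))
    return re.sub(r"[A-Za-z]+", substitute, text)

# Hypothetical stand-in for WordNet so the sketch runs without the corpus.
toy_synonyms = {"car": ["car", "auto", "automobile"], "big": ["large", "big"]}

def toy_lookup(word):
    return toy_synonyms.get(word.lower(), [])

result = replace_synonyms("A big car.", toy_lookup)
```

With NLTK available, toy_lookup could be swapped for a function that gathers lemma names from wn.synsets(word), as in the loop above. Taking the first differing candidate is an arbitrary choice here; it does not account for word sense or part of speech.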