I can't figure out why this doesn't work for removing stop words and string.punctuation:
import nltk
from nltk.corpus import stopwords
import string
with open('moby.txt', 'r') as f:
    moby_raw = f.read()
stop = set(stopwords.words('english'))
moby_tokens = nltk.word_tokenize(moby_raw)
text_no_stop_words_punct = [t for t in moby_tokens if t not in stop or t not in string.punctuation]
print(text_no_stop_words_punct)
Looking at the output, I get this:
[...';', 'surging', 'from', 'side', 'to', 'side', ';', 'spasmodically', 'dilating', 'and', 'contracting',...]
It seems the punctuation is still there. What am I doing wrong?
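
One likely cause is the `or` in the list comprehension: a token is kept whenever either test passes, and ';' passes the first one because it is not a stop word. Stop words such as 'from' are kept for the mirror-image reason, since they are not punctuation. Below is a minimal sketch of the filter with `and` instead. To keep it runnable without any NLTK downloads, `stop` and `moby_tokens` are small stand-ins (an assumption for illustration); in the real script they would come from stopwords.words('english') and nltk.word_tokenize(moby_raw). The `punct` set and the `.lower()` call are also additions that are not in the original code.

```python
import string

# Stand-in data (assumption): the real script builds these with
# stopwords.words('english') and nltk.word_tokenize(moby_raw).
stop = {'from', 'to', 'and'}
moby_tokens = [';', 'surging', 'from', 'side', 'to', 'side', ';',
               'spasmodically', 'dilating', 'and', 'contracting']

# A set of single punctuation characters, so the membership test is an
# exact match rather than a substring search in a string.
punct = set(string.punctuation)

# Keep a token only if it is neither a stop word nor punctuation.
# With `or`, passing either test was enough, so nothing was filtered out.
filtered = [t for t in moby_tokens
            if t.lower() not in stop and t not in punct]

print(filtered)
# ['surging', 'side', 'side', 'spasmodically', 'dilating', 'contracting']
```

Note that multi-character tokens made of punctuation (for example '--') are not in `punct` and would still pass through this filter; handling those would need an extra check.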