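Removing stop words from a file
I want to remove stop words from the Data column in my file. I filtered the rows down to the lines where the end user is speaking, but usertext.apply(lambda x: [word for word in x if word not in stop_words]) does not filter out the stop words. What am I doing wrong?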
import pandas as pd
from stop_words import get_stop_words
df = pd.read_csv("F:/textclustering/data/cleandata.csv", encoding="iso-8859-1")
usertext = df[df.Role.str.contains("End-user",na=False)][['Data','chatid']]
stop_words = get_stop_words('dutch')
clean = usertext.apply(lambda x: [word for word in x if word not in stop_words])
print(clean)
First, can you 1) print 'stop_words', and 2) try 'clean = usertext.apply(lambda x: [])' to see whether it removes all the words? (Just a test.) –
Data      []
chatid    []
dtype: object
['aan', 'al', 'alles', 'als', 'altijd', 'andere', 'ben', 'bij', 'dar', 'dan', 'dat', 'de', 'der', 'deze', 'die', 'dit', 'doch', 'doen', 'door', 'een', 'eens', 'en', 'er', 'ge', 'geen', 'geweest', 'haar', 'had', 'heb', 'hebben', 'heeft', 'het', 'hier', 'hij', 'hoe', 'hun', 'iemand', 'iets', 'ik', 'in', 'is', 'ja', 'je', 'kan', 'kon', 'kunnen', 'maar', 'me', 'meer', 'men', 'met', 'mij', 'mijn', 'moet', 'na', 'naar', 'niet', 'niets', 'nog', 'nu', 'of', 'om', 'omdat', ...]
This is the output. – DataNewB
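
A likely cause, offered as a sketch and not a confirmed diagnosis: `DataFrame.apply` hands each whole column to the lambda, so `x` is a column and `word` is an entire cell (a full chat line), never a single word. A full line is never in the stop word list, so nothing gets removed. Splitting each cell of the Data column into words first should help. The helper name `remove_stop_words` and the short `sample_stop_words` list below are made up for illustration; the list stands in for `get_stop_words('dutch')` so the snippet runs without the file or the package.

```python
def remove_stop_words(text, stop_words):
    """Return text with every word found in stop_words dropped."""
    if not isinstance(text, str):
        # Missing cells (NaN) are not strings; pass them through unchanged.
        return text
    # Lower-case both sides so "Ik" matches the stop word "ik".
    stop_set = {w.lower() for w in stop_words}
    kept = [word for word in text.split() if word.lower() not in stop_set]
    return " ".join(kept)


# Hypothetical stand-in for get_stop_words('dutch'):
# a few entries taken from the list printed above.
sample_stop_words = ['de', 'het', 'een', 'ik', 'niet', 'maar']

cleaned = remove_stop_words("Ik heb het probleem niet opgelost", sample_stop_words)
print(cleaned)  # heb probleem opgelost
```

With the DataFrame from the question, this would be applied to the one text column, not to the whole frame: `usertext['Data'].apply(lambda s: remove_stop_words(s, stop_words))`. Note that `text.split()` only splits on whitespace, so a word with punctuation attached (for example "niet,") would not match; whether that matters depends on how clean the data is.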