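I recently started using the nltk module for text analysis, and I am stuck at one point. I want to use word_tokenize on a dataframe so that I get all the words used in a particular row of the dataframe. How can I use word_tokenize on a dataframe?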
data example:
text
1. This is a very good site. I will recommend it to others.
2. Can you please give me a call at 9983938428. have issues with the listings.
3. good work! keep it up
4. not a very helpful site in finding home decor.
expected output:
1. 'This','is','a','very','good','site','.','I','will','recommend','it','to','others','.'
2. 'Can','you','please','give','me','a','call','at','9983938428','.','have','issues','with','the','listings'
3. 'good','work','!','keep','it','up'
4. 'not','a','very','helpful','site','in','finding','home','decor'
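Basically, I want to separate all the words and find the length of each text in the dataframe.
I know word_tokenize works on a single string, but how do I apply it to the entire dataframe?
Please help!
Thanks in advance...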
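Your question is missing the input data, your code, and your expected output. Could you flesh it out? Thanks – EdChum
@EdChum: I have edited the question. Hope it has the required information now. – eclairs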
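One possible approach, as a minimal sketch rather than a definitive answer: since word_tokenize takes a single string, it can be applied row by row with pandas' Series.apply. The column names tokens and n_tokens are made up for illustration, and the sample rows are copied from the question. The try/except fallback is also an assumption of this sketch: it only exists so the example still runs where nltk or its tokenizer data is not installed, and its regex is a rough stand-in, not NLTK's algorithm.

```python
import re

import pandas as pd

try:
    from nltk.tokenize import word_tokenize
    # Probe call: raises LookupError if the tokenizer data is not installed.
    word_tokenize("probe.")
except (ImportError, LookupError):
    def word_tokenize(text):
        # Rough stand-in (NOT NLTK's algorithm): runs of word characters,
        # or single punctuation marks.
        return re.findall(r"\w+|[^\w\s]", text)

# Sample data copied from the question.
df = pd.DataFrame({"text": [
    "This is a very good site. I will recommend it to others.",
    "Can you please give me a call at 9983938428. have issues with the listings.",
    "good work! keep it up",
    "not a very helpful site in finding home decor.",
]})

# Apply the tokenizer to every row of the text column;
# each cell of the new column holds a list of tokens.
df["tokens"] = df["text"].apply(word_tokenize)

# Number of tokens per row (punctuation marks count as tokens).
df["n_tokens"] = df["tokens"].apply(len)

print(df[["tokens", "n_tokens"]])
```

Note that punctuation is returned as separate tokens, so the counts include '.' and '!'. If only words should be counted, the token lists would need to be filtered first, for example by keeping tokens for which str.isalnum() is true.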