2017-04-10

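Python - Searching text with NLTK

I have the following text file (you can download it from here).

I am trying to search the file for the word language. To do that, I have the following Python script: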

import nltk 

file = open('NLTK.txt', 'r') 
read_file = file.read() 
text = nltk.Text(read_file) 
match = text.concordance('language') 
print(match) 

However, when I run the program I get the output below, even though the file contains the word language:

No matches 
None 

Why can't the program find the word language if it exists in the file?

EDIT 1

I noticed that the statement text = nltk.Text(read_file) returns:

<Text: T h i s i s ...> 

Thanks.


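The accepted answer is right about how to fix this, but here is another piece of advice: don't bother learning to work with the `Text` class; it is only meant for interactive exploration and demos. Go straight to `PlaintextCorpusReader` (and its counterparts for annotated formats). – alexis

Answers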
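A minimal sketch of what the comment above suggests: reading the file with `PlaintextCorpusReader` and working on its word list directly, without `Text`. The sample sentence and the temporary directory are assumptions made so the snippet is self-contained; they stand in for the asker's NLTK.txt and its folder.

```python
import os
import tempfile

from nltk.corpus import PlaintextCorpusReader

# Hypothetical sample text standing in for NLTK.txt (not the asker's file).
sample = "Natural Language Processing is about natural language ."

with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "NLTK.txt"), "w", encoding="utf8") as f:
        f.write(sample)

    # The reader tokenizes the file itself; words() yields word tokens.
    corp = PlaintextCorpusReader(root, "NLTK.txt")
    words = list(corp.words())

# Collect the positions of case-insensitive matches for the search word.
hits = [i for i, w in enumerate(words) if w.lower() == "language"]
print(len(hits), hits)
```

Because `words` is an ordinary list of strings, any further filtering or counting can be done with plain Python.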

I believe you need to tokenize the raw text first (as per ch3). Tokenizing your example text and then processing it gave me results.

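import nltk

# word_tokenize needs NLTK's Punkt tokenizer data (installed via nltk.download)
with open('NLTK.txt', 'r') as file:
    read_file = file.read()

# nltk.Text expects a sequence of tokens, not a raw string
text = nltk.Text(nltk.word_tokenize(read_file))

# concordance() prints the matches itself and returns None
text.concordance('language')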
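The reason tokenizing matters can be illustrated without NLTK. `nltk.Text` treats whatever it is given as a sequence of tokens, and iterating over a Python string yields single characters, which is why the question shows `<Text: T h i s i s ...>`. In the sketch below the sample sentence is an assumption, and `str.split()` is only a rough stand-in for `nltk.word_tokenize`.

```python
# Hypothetical sample sentence, not the contents of NLTK.txt.
raw = "This is a language test"

as_chars = list(raw)     # what a token sequence looks like when built from a raw string
as_tokens = raw.split()  # rough stand-in for nltk.word_tokenize(raw)

print(as_chars[:4])             # the first "tokens" are single letters
print('language' in as_chars)   # no single character equals the whole word
print('language' in as_tokens)  # the word is present once the text is tokenized
```

No "token" in the character sequence can ever equal a whole word, so a search over it reports no matches.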

Alternatively, you can use an NLTK corpus reader to do the tokenizing and processing:

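import nltk
from nltk.corpus import PlaintextCorpusReader

# First argument is the directory holding the file, second is the file name
corp = PlaintextCorpusReader(r'C:/', 'NLTK.txt')
text = nltk.Text(corp.words())

# concordance() prints the matches itself and returns None
text.concordance('language')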

The matching results:

Displaying 18 of 18 matches: 
            Language Processing . By `` natural languag 
            language '' we mean a language that is used 
            language that is used for everyday communic 
licit rules . We will take Natural Language Processing ・or NLP for short ・in a 
f computer manipulation of natural language . At one extreme , it could be as 
ted access to stored information , language processing has come to play a cent 
e textbook for a course on natural language processing or computational lingui 
is based on the Python programming language together with an open source libra 
source library called the Natural Language Toolkit (NLTK) . NLTK includes e 
s are deployed in a variety of new language technologies . For this reason it 
rite programs that analyze written language , regardless of previous programmi 
is book to get immersed in natural language processing . All relevant Python f 
ty for this application area . The language index will help you locate relevan 
mples and dig into the interesting language analysis material that starts in 1 
text using Python and the Natural Language Toolkit . To learn about advanced 
an help you manipulate and analyze language data , and how to write these prog 
s are used to describe and analyse language How data structures and algorithms 
and algorithms are used in NLP How language data is stored in standard formats
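
The output above lists both "Language" and "language" because the concordance search is not case-sensitive. The following is a simplified sketch of that idea in plain Python, not NLTK's actual implementation; the function name `concordance_lines`, its `width` parameter, and the sample tokens are all hypothetical.

```python
def concordance_lines(tokens, word, width=3):
    """Return (left, match, right) context tuples for each case-insensitive hit."""
    target = word.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            # Take up to `width` tokens of context on each side of the match.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Hypothetical sample tokens, not the contents of NLTK.txt.
tokens = "Natural Language Processing studies natural language data".split()
results = concordance_lines(tokens, "language", width=2)
for left, match, right in results:
    print(f"{left:>20} {match} {right}")
```

Unlike `Text.concordance`, this sketch returns its matches as a list, so they can be inspected or stored instead of only printed.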