2017-04-07 78 views

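Python: word frequency in a file

I wrote a simple word-count program in Python that reads a text file, counts word frequencies, and writes the result to another file. The problem is that when I search for "windows" and the text file contains a word such as "xwindows", that word gets counted as well.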

import sys
import glob
import errno
files = glob.glob('w.asm')
the_count = ['windows']
for name in files:
    with open(name) as f:
        print "Occurences in file -- %s " % name
        contents = f.read()
        print contents
        for number in the_count:
            print "windows occured-", contents.count(number)

The w.asm file contains:

windows 
iwindows 
qwindows 
hwindows 
kwindows 
windows 
windows 
windowsh 
wwindows 
windows 
iwindows 
qwindows 
hwindows 
kwindows 

Output:

Occurences in file -- w.asm 

windows 
iwindows 
qwindows 
hwindows 
kwindows 
windows 
windows 
windowsh 
wwindows 
windows 
iwindows 
qwindows 
hwindows 
kwindows 
windows occured- 14 

I want the output to be 4, because "windows" occurs as a whole word only 4 times, but the code reports 14.

Please help.

Answer

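14 is actually correct, because str.count() counts substrings, and words such as windowsh contain the substring windows. A simple fix is to split the file contents into words first and then call count() on the resulting list: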

for name in files:
    with open(name) as f:
        print "Occurences in file -- %s " % name
        contents = f.read().split()  # <--- split into a list of words
        print contents
        for number in the_count:
            print "windows occured-", contents.count(number)
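
To make the difference concrete, here is a minimal, self-contained sketch in Python 3 syntax (the code in this thread is Python 2). The function names and the inline SAMPLE string are made up for illustration; SAMPLE simply copies the w.asm contents shown in the question. The last function uses a regular expression with word boundaries, a different technique from the split() fix above, which may be worth considering if the text contains punctuation.

```python
import re

# Hypothetical inline copy of the w.asm contents from the question.
SAMPLE = """windows
iwindows
qwindows
hwindows
kwindows
windows
windows
windowsh
wwindows
windows
iwindows
qwindows
hwindows
kwindows
"""


def count_substring(text, word):
    """str.count: counts non-overlapping substring matches."""
    return text.count(word)


def count_tokens(text, word):
    """Split on whitespace first, then count exact token matches."""
    return text.split().count(word)


def count_whole_words(text, word):
    """Alternative: regex word boundaries, which also copes with
    punctuation attached to a word (e.g. "windows,")."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text))


if __name__ == "__main__":
    print(count_substring(SAMPLE, "windows"))    # 14
    print(count_tokens(SAMPLE, "windows"))       # 4
    print(count_whole_words(SAMPLE, "windows"))  # 4
```

For the file in the question, split() is enough because every word sits on its own line. With text such as "windows, xwindows; windows." the whitespace split yields tokens like "windows," that no longer equal "windows", so the regex variant is the one that still finds both whole-word occurrences.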

Thanks a lot, it works.