2015-02-09

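I want to go through a text file and create a dictionary of keywords and the number of times they pop up. I want it to look something like this, as a defaultdict of ints: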

defaultdict(<type 'int'>, {'keyword1': 1, 'keyword2': 0, 'keyword3': 3, 'keyword4': 9}) 

Right now I'm getting something that looks like this:

defaultdict(<type 'int'>, {'keyword1': 1}) 

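I can print each keyword in my dictionary as it iterates, though, so I know it's trying something. I also know that more of these keywords should pop up, because there are instances of them in the text file. My code: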

from collections import defaultdict

find_it = ['keyword1', 'keyword2', 'keyword3', 'keyword4']

with open('inputfile.txt', 'r') as f:
    out = defaultdict(int)

    for key in find_it:
        counter = 0
        for line in f:
            if key in line:
                out[key] += 1

my_keys = dict(**out)

What am I missing here?

Answers

1

Joran is right: a Counter is better suited to what you're doing than a defaultdict. Here is an alternative solution:

inputfile.txt

The Zen of Python, by Tim Peters 

Beautiful is better than ugly. 
Explicit is better than implicit. 
Simple is better than complex. 
Complex is better than complicated. 
Flat is better than nested. 
Sparse is better than dense. 
Readability counts. 
Special cases aren't special enough to break the rules. 
Although practicality beats purity. 
Errors should never pass silently. 
Unless explicitly silenced. 
In the face of ambiguity, refuse the temptation to guess. 
There should be one-- and preferably only one --obvious way to do it. 
Although that way may not be obvious at first unless you're Dutch. 
Now is better than never. 
Although never is often better than *right* now. 
If the implementation is hard to explain, it's a bad idea. 
If the implementation is easy to explain, it may be a good idea. 
Namespaces are one honking great idea -- let's do more of those! 

count.py

from collections import Counter

find_it = {"be", "do", "of", "the", "to"}

keys = Counter()

with open("inputfile.txt") as f:
    for line in f:
        matches = Counter(w for w in line.split() if w in find_it)
        keys += matches

print(keys)

$ python count.py
Counter({'the': 5, 'to': 5, 'be': 3, 'of': 3, 'do': 2})

This finds the matches against find_it in each line and adds them to the running counter keys as it goes.

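Edit: As Blckknght pointed out in the comments, the previous solution missed the case where a keyword appears more than once in a single line. The edited version of the code uses a slightly different approach to fix that.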
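For comparison, here is a sketch of the same running count using Counter.update, which adds the new counts into the existing Counter in place instead of building a fresh Counter for every line and adding it with +=. The helper name count_keywords and the io.StringIO sample standing in for inputfile.txt are illustrative assumptions.

```python
import io
from collections import Counter


def count_keywords(lines, keywords):
    """Count every occurrence of each keyword, reading one line at a time."""
    keywords = set(keywords)
    counts = Counter()
    for line in lines:
        # update() adds the counts from the iterable to the running total in place
        counts.update(w for w in line.split() if w in keywords)
    return counts


# io.StringIO stands in for open("inputfile.txt") so the example is self-contained
sample = io.StringIO("to be or not to be\nthe end of the line\n")
print(count_keywords(sample, {"be", "do", "of", "the", "to"}))
```

Both versions split on whitespace, so a word with punctuation attached (for example "idea.") will not match the bare keyword.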

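It's worth noting that this counts the number of lines each word appears on (as the question's code would, if it worked). That may or may not be what you want. Joran Beasley's code, by contrast, counts every occurrence of each word regardless of which line it appears on (so a line like 'keyword1 keyword2 keyword1' would increase 'keyword1' by two). – Blckknght 2015-02-09 22:00:15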

@Blckknght Good catch! Fixed now :-) – 2015-02-09 22:23:03
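To make the distinction in Blckknght's comment concrete, here is a small sketch with two hypothetical helpers: lines_containing counts each keyword at most once per line, while total_occurrences counts every appearance. A plain list of strings stands in for the file, since iterating a file also yields one line at a time.

```python
from collections import Counter


def lines_containing(lines, keywords):
    """For each keyword, count how many lines contain it at least once."""
    counts = Counter()
    for line in lines:
        words = set(line.split())  # a set, so repeats on one line count once
        counts.update(k for k in keywords if k in words)
    return counts


def total_occurrences(lines, keywords):
    """For each keyword, count every occurrence, including repeats on one line."""
    keywords = set(keywords)
    counts = Counter()
    for line in lines:
        counts.update(w for w in line.split() if w in keywords)
    return counts


sample = ["keyword1 keyword2 keyword1", "keyword1"]
print(lines_containing(sample, ["keyword1", "keyword2"]))   # Counter({'keyword1': 2, 'keyword2': 1})
print(total_occurrences(sample, ["keyword1", "keyword2"]))  # Counter({'keyword1': 3, 'keyword2': 1})
```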

3
from collections import Counter 
my_current_count = Counter(open('inputfile.txt').read().split()) 

should do it ... and is much simpler

# my_list_of_keywords is your keyword list (find_it in the question)
for shared_key in set(my_current_count).intersection(my_list_of_keywords):
    print(my_current_count[shared_key])
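If keywords that never appear should still be listed with a count of 0, as in the output the question asks for, one option is to build a plain dict from the word Counter. This is a sketch: io.StringIO stands in for inputfile.txt and the sample text is made up. It relies on the fact that looking up a missing key in a Counter returns 0 without adding that key.

```python
import io
from collections import Counter

find_it = ["keyword1", "keyword2", "keyword3", "keyword4"]

# io.StringIO stands in for the real file; "with open('inputfile.txt') as f:" works the same way
f = io.StringIO("keyword1 keyword3 keyword3\nkeyword4 other words\n")
word_counts = Counter(f.read().split())

# Every keyword gets an entry, including ones that never appear (they map to 0)
keyword_counts = {key: word_counts[key] for key in find_it}
print(keyword_counts)
# {'keyword1': 1, 'keyword2': 0, 'keyword3': 2, 'keyword4': 1}
```

A defaultdict(int), by contrast, only contains the keys you have actually touched, which is why the question's out shows only the keywords that were incremented.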
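As for your current approach: too much would have to change in your original code to make it work and still have it be recognizable.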

This also looks interesting; I'll look into this approach. Thanks – 2015-02-09 20:39:23

3

You already read everything in the file during the first iteration of for key in find_it:, so for the next key there is nothing left to read.
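A quick way to see this, with io.StringIO standing in for the open file (both are iterators over lines that are consumed as you loop):

```python
import io

# io.StringIO behaves like an open text file: iterating over it consumes its lines
f = io.StringIO("line one\nline two\n")

first_pass = [line for line in f]
second_pass = [line for line in f]  # the iterator is already at the end

print(len(first_pass))   # 2
print(len(second_pass))  # 0
```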

I suggest you swap the two for loops.

from collections import defaultdict

with open('inputfile.txt', 'r') as f:
    out = defaultdict(int)

    for line in f:
        for key in find_it:
            if key in line.split():
                out[key] += 1
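If you would rather keep the original loop order, another option (not the one proposed in this answer) is to rewind the file with f.seek(0) before each key. This is a sketch; count_lines_per_key is a hypothetical name and io.StringIO stands in for inputfile.txt. It reads the whole file once per keyword, so the swapped loops are cheaper on large files.

```python
import io
from collections import defaultdict


def count_lines_per_key(f, find_it):
    """Keep the question's loop order, but rewind the file before each key."""
    out = defaultdict(int)
    for key in find_it:
        f.seek(0)  # go back to the start so every key sees every line
        for line in f:
            if key in line.split():
                out[key] += 1
    return out


f = io.StringIO("keyword1 keyword3\nkeyword3\nnothing here\n")
print(dict(count_lines_per_key(f, ["keyword1", "keyword2", "keyword3"])))
# {'keyword1': 1, 'keyword3': 2}
```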

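By the way, I strongly recommend going with Joran Beasley's one-line solution, since it is easier to read and understand for anyone who looks at your code in the future.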

Shouldn't it go through the find_it array? – 2015-02-09 20:29:57

It goes through every word in each line and checks whether it is in 'find_it' – ozgur 2015-02-09 20:32:45

Oh, I see: once you've gone through a line, you can't go back to it the way I had it with the keywords! Thanks! – 2015-02-09 20:33:01