word counter || python


I want to print the number of words in a txt file that have 1 to 20 letters. I tried the code below, but it prints 20 zeros. Any ideas?

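Edit - in the end the program should print 20 numbers, where the n-th number is the count of words in the file that have exactly n letters (n from 1 to 20).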

fin = open('words.txt')
for i in range(20):
    counter = 0
    for line in fin:
        word = line.strip()
        if len(word) == i:
            counter = counter + 1
    print counter,


2017-02-25 Jonathan

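The program logic is completely backwards. Instead of iterating over the file once and checking each word's length, you are iterating over the file 20 times.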

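Are you looking for a single number (how many words have at most 20 characters) or for 20 numbers (for each possible length, how many words have that length)? – Mureinik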

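I am looking for 20 numbers: the first is the number of words in the file with 1 letter, the second is the number of words with 2 letters, and so on... – Jonathan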
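To make the first comment concrete: the question's loop prints zeros because a file object is an iterator. The first pass of the inner loop consumes the whole file (with i equal to 0, which matches only blank lines), and the remaining 19 passes see nothing. Below is a minimal sketch of a single-pass alternative, assuming one word per line as in the question and Python 3 syntax. The function name count_by_length and the sample data are illustrative and not from the original post.

```python
def count_by_length(lines, max_len=20):
    """Return a list whose item i is the number of words of length i + 1."""
    counts = [0] * max_len
    for line in lines:          # single pass over the input
        word = line.strip()     # assumes one word per line
        if 1 <= len(word) <= max_len:
            counts[len(word) - 1] += 1
    return counts


# Illustrative data standing in for the lines of words.txt
sample_lines = ["a\n", "to\n", "be\n", "cat\n", "\n"]
print(*count_by_length(sample_lines))
```

With a real file, the open file object can be passed directly, for example count_by_length(fin) inside a with open('words.txt') as fin: block.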

It should look like this. counter should not be reset inside the for loop, and you can use len() to get the length of a word:

with open("test") as f:
    counter = 0
    for line in f:
        for word in line.split():
            if len(word) <= 20:
                counter += 1
    print(counter)

Or my approach:

import re

with open("file") as f:
    # Split on whitespace, skip empty strings, keep words of at most 20 characters
    print(len([w for w in re.split(r'\s+', f.read()) if 0 < len(w) <= 20]))

Hope this helps.


2017-02-25 10:40:01 McGrady

Edit

To produce a separate count for each word length, you can use a collections.Counter:

from collections import Counter

def word_lengths(f):
    for line in f:
        for word in line.split():  # does not ignore punctuation
            yield len(word)

with open('words.txt') as fin:
    counts = Counter(length for length in word_lengths(fin) if length <= 20)

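This uses a generator to read the file and yield a sequence of word lengths. The filtered word lengths are then fed into the Counter. You could instead perform the length filtering on the Counter itself.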
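Filtering on the Counter rather than before it might look like the sketch below, which also turns the result into the 20 numbers the question asks for. It assumes Python 3. The helper name length_counts and the sample lines are illustrative and not part of the original answer.

```python
from collections import Counter


def word_lengths(lines):
    """Yield the length of every whitespace-separated word."""
    for line in lines:
        for word in line.split():  # does not ignore punctuation
            yield len(word)


def length_counts(lines, max_len=20):
    counts = Counter(word_lengths(lines))  # count every length first
    # A Counter returns 0 for missing keys, so unseen lengths come out as 0
    # and lengths above max_len are simply never looked up.
    return [counts[n] for n in range(1, max_len + 1)]


# Illustrative data standing in for the lines of words.txt
sample = ["the quick brown fox", "jumps over the lazy dog"]
print(*length_counts(sample))
```

Counting everything first keeps the generator simple. The cost is that the Counter also stores lengths above 20, which is rarely a concern for ordinary text.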

If you want to ignore punctuation, you could use str.translate() to remove the unwanted characters, or possibly re.split(r'\W+', line) instead of line.split().
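The two punctuation options give different results. Here is a small sketch that compares them, assuming Python 3 (where str.maketrans builds the deletion table) and ASCII punctuation only. The function names and the sample sentence are illustrative.

```python
import re
import string


def words_via_regex(line):
    # \W+ splits on runs of non-word characters; drop the empty strings
    # that re.split leaves at the edges of the line.
    return [w for w in re.split(r"\W+", line) if w]


def words_via_translate(line):
    # Delete every ASCII punctuation character, then split on whitespace.
    table = str.maketrans("", "", string.punctuation)
    return line.translate(table).split()


line = "Hello, world! It's a well-known fact."
print(words_via_regex(line))
print(words_via_translate(line))
```

The regex version breaks "It's" and "well-known" into separate pieces, while the translate version joins them into "Its" and "wellknown". Which one is appropriate depends on how such words should be counted.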

Try it like this:

with open('words.txt') as fin:
    counter = 0
    for line in fin:
        for word in line.split():
            if len(word) <= 20:
                counter = counter + 1
    print(counter)

This can be simplified to:

with open('words.txt') as fin:
    counter = sum(1 for line in fin
                  for word in line.split() if len(word) <= 20)

But that is playing code golf.

You could also use a collections.Counter, if it is feasible to read the whole file into memory:

from collections import Counter 

with open('words.txt') as fin: 
    c = Counter(fin.read().split()) 
    counter = sum(c[k] for k in c if len(k) <= 20)

No doubt there are many other ways to do this. None of the above expects or handles punctuation.


2017-02-25 10:40:20 mhawke

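I tried the first one you sent (I have not learned all the keywords in the second yet), and it gave me one number instead of 20. (I commented on it.) Any idea what is wrong? – Jonathan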

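Nothing is wrong; you clarified your requirements only after asking the question. You should put that clarification in the question itself. – mhawke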

@Jonathan: answer updated to produce multiple counts. – mhawke

Using a regular expression

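import re

# Match runs of 1 to 20 non-whitespace characters delimited by word boundaries
REGEX = r"(\b\S{1,20}\b)"
finder = re.compile(REGEX)

with open("words.txt") as fin:
    data = fin.read()

matches = finder.findall(data)

# lst[i] holds the number of matches of length i + 1
lst = [0 for _ in range(20)]

for m in matches:
    lst[len(m) - 1] += 1

print(lst)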


2017-02-25 11:01:49 Crispin

I edited my answer based on the clarified requirements provided by the OP – Crispin
