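Counting different strings in multiple files

I want to count the occurrences of a list of smileys across a set of .txt files in my directory /test/.

Here is my approach for counting a single smiley in all files.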

def count_string_occurrence():
    import os
    directory = "C:/users/M/Desktop/test"
    string = ":)"  # define search term
    total = 0
    for file in os.listdir(directory):
        if file.endswith(".txt"):
            # join with the directory so the file is found regardless of the cwd
            f = open(os.path.join(directory, file), encoding="utf8")
            contents = f.read()
            f.close()
            total += contents.count(string)  # add occurrences of the smiley in this file
    print("Number of " + string + " in all files equals " + str(total))

count_string_occurrence()

How do I now loop over different smileys and print the result for each smiley separately? Since I am already looping over the files, it gets complicated.

Source

2017-04-18 M. H.

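What do you mean? Do you want to count emoticons like ':D', ';)', ':)' and so on? – blacksite

I mean I want the script to count the occurrences of about 20 smileys and, for each one, output "Number of X in all files equals ___" (X = smiley). The smileys include :), :-), :] and some variations of positive and negative smileys.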

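Regarding your question: you can keep a dictionary with a count for each string and return it. But if you keep your existing structure, keeping track of the counts won't be pleasant.

This leads to my suggestions: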

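You are keeping the whole file in memory for no apparent reason; you can go through it line by line and check the current line for the strings.
You would also be reading the same files multiple times, when you can read each file only once and check for every string.
You are checking the file extension, which sounds like a job for glob.
You can use defaultdict, so you don't need to care whether a count is initially 0.

Modified code: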

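from collections import defaultdict
import glob

SMILIES = [':)', ':P', '=]']

def count_in_files(string_list):
    results = defaultdict(int)  # missing keys start at 0
    for file_name in glob.iglob('*.txt'):  # .txt files in the current directory
        print(file_name)
        with open(file_name, encoding="utf8") as input_file:
            for line in input_file:  # read each file once, line by line
                for s in string_list:
                    # count every occurrence in the line, not just its presence
                    results[s] += line.count(s)
    return results

print(count_in_files(SMILIES))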

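Finally, with this approach, if you are using Python >= 3.5, you can change the glob call to for file_name in glob.iglob('**/*.txt', recursive=True) so that it searches recursively, in case you need that.

This will print something like: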

defaultdict(<class 'int'>, {':P': 2, ':)': 1, '=]': 1})
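
To get the output format asked for in the question, the returned dictionary can be looped over and each entry printed on its own line. A minimal sketch, assuming a hypothetical helper format_counts (not part of the code above) and made-up sample counts:

```python
from collections import defaultdict


def format_counts(results):
    """Return one report line per smiley (hypothetical helper, not from the answer)."""
    return [
        "Number of {} in all files equals {}".format(smiley, count)
        for smiley, count in sorted(results.items())
    ]


# Made-up sample counts standing in for the result of count_in_files(SMILIES).
sample = defaultdict(int, {':P': 2, ':)': 1, '=]': 1})
for line in format_counts(sample):
    print(line)
```

If smileys that never occur should be reported with a count of 0, one option is to loop over SMILIES instead and look up results[s]; a defaultdict(int) returns 0 for keys it has not seen.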

Source

2017-04-18 15:36:01 ChatterOne

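Thanks, this approach worked! :-) It is indeed much faster than the old one.

You can make your search string a function parameter and then call your function multiple times with different search terms.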

import os

def count_string_occurrence(string):
    directory = "C:/users/M/Desktop/test"
    total = 0
    for file in os.listdir(directory):
        if file.endswith(".txt"):
            # join with the directory so the file is found regardless of the cwd
            with open(os.path.join(directory, file), encoding="utf8") as f:
                contents = f.read()
            total += contents.count(string)  # add occurrences of the smiley in this file
    return total

smilies = [':)', ':P', '=]']
for s in smilies:
    total = count_string_occurrence(s)
    print("Number of {} in all files equals {}".format(s, total))

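A different approach would be to pass the list of smileys to your function and then iterate over it inside the if block. You could store the results in a dictionary such as { ':)': 5, ':P': 4, ... }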
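
The dictionary variant described above could look roughly like this. It is only a sketch: the function name count_strings_in_dir and its directory parameter are assumptions made for illustration, and each file is read once and then searched for every smiley.

```python
import os


def count_strings_in_dir(strings, directory):
    """Count each string across all .txt files in `directory` (illustrative names)."""
    totals = {s: 0 for s in strings}  # start every count at 0
    for name in os.listdir(directory):
        if name.endswith(".txt"):
            # Join with the directory so the file is found regardless of the cwd.
            path = os.path.join(directory, name)
            with open(path, encoding="utf8") as f:
                contents = f.read()
            # Iterate over the search strings inside the if block.
            for s in strings:
                totals[s] += contents.count(s)
    return totals
```

Calling count_strings_in_dir(smilies, "C:/users/M/Desktop/test") would then return one dictionary with a total per smiley, which can be printed with the same format string as in the loop above.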

Source

2017-04-18 14:48:22
