2016-10-14

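Dictionary lists and comparing lists in Python

I have written a function that counts how many times each word is used in a file, i.e. the word frequency. Right now the function computes the total number of words and shows me the seven most common words along with how many times each one is used. Next, I want to compare the first file, whose word frequencies I have analyzed, against another file that contains the most commonly used words in English, to see whether any of the words match.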

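My idea was to build a list from each of the two files and then compare them with each other. But the code I wrote for this does not produce any output. Any ideas on how to solve this?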

def CountWords():
    filename = input('What is the name of the textfile you want to open?: ')
    if filename == "alice" or "alice-ch1.txt" or " ":
        file = open("alice-ch1.txt", "r")
        print('You want to open alice-ch1.txt')
        wordcount = {}
        for word in file.read().split():
            if word not in wordcount:
                wordcount[word] = 1
            else:
                wordcount[word] += 1
        wordcount = {k.lower(): v for k, v in wordcount.items()}
        print(wordcount)

        sum = 0
        for val in wordcount.values():
            sum += val
        print('The total amount of words in Alice adventures in wonderland: ' + str(sum))
        sortList = sorted(wordcount.values(), reverse=True)
        most_freq_7 = sortList[0:7]
        # print(most_freq_7)
        print('Totoro says: The 7 most common words in Alice Adventures in Wonderland:')
        print(list(wordcount.keys())[list(wordcount.values()).index(most_freq_7[0])] + " " + str(most_freq_7[0]))
        print(list(wordcount.keys())[list(wordcount.values()).index(most_freq_7[1])] + " " + str(most_freq_7[1]))
        print(list(wordcount.keys())[list(wordcount.values()).index(most_freq_7[2])] + " " + str(most_freq_7[2]))
        print(list(wordcount.keys())[list(wordcount.values()).index(most_freq_7[3])] + " " + str(most_freq_7[3]))
        print(list(wordcount.keys())[list(wordcount.values()).index(most_freq_7[4])] + " " + str(most_freq_7[4]))
        print(list(wordcount.keys())[list(wordcount.values()).index(most_freq_7[5])] + " " + str(most_freq_7[5]))
        print(list(wordcount.keys())[list(wordcount.values()).index(most_freq_7[6])] + " " + str(most_freq_7[6]))

        file_common = open("common-words.txt", "r")
        commonwords = []
        contents = file_common.readlines()

        for i in range(len(contents)):
            commonwords.append(contents[i].strip('\n'))
        print(commonwords)

        # From here on is the code where I need to find out how to compare the lists:
        alice_keys = wordcount.keys()
        result = set(filter(set(alice_keys).__contains__, commonwords))
        newlist = list()

        for elm in alice_keys:
            if elm not in result:
                newlist.append(elm)
        print('Here are the similar words: ' + str(newlist))  # Why doesn't show?

    else:
        print('I am sorry, that filename does not exist. Please try again.')
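
A note on the comparison step: the loop with `if elm not in result` collects the words that are NOT shared with `commonwords`, which is the opposite of what the final message announces. Also, the snippet as posted never calls `CountWords()`; if that is true of the real script as well, it would by itself explain the missing output. Below is a minimal sketch of the intended comparison using a set for the lookup. The helper name `words_in_common` and the sample data are made up for illustration and are not part of the original code.

```python
def words_in_common(wordcount, commonwords):
    """Return the keys of wordcount (a dict) that also occur in commonwords (a list)."""
    common = set(commonwords)  # set membership tests are fast
    # Iterate over the dict so the result follows its insertion order.
    return [word for word in wordcount if word in common]


# Tiny made-up data standing in for the two files.
wordcount = {"the": 5, "alice": 3, "rabbit": 2, "and": 4}
commonwords = ["the", "and", "of", "to"]

print('Here are the similar words: ' + str(words_in_common(wordcount, commonwords)))
```

With the real data, `wordcount` and `commonwords` would be the dictionary and list already built inside the function.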

Answers

I'm not in front of an interpreter, so my code may be slightly off. But try something more like this.

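from collections import Counter

# Split the text on whitespace so that Counter counts words;
# passing the raw string would count single characters instead.
with open("some_file_with_words") as f_file:
    counter = Counter(f_file.read().split())
top_seven = counter.most_common(7)

# The common-words file is assumed to be whitespace-separated.
with open("commonwords") as f_common:
    common_words = f_common.read().split()

for word, count in top_seven:
    if word in common_words:
        print("your word " + word + " is in the most common words! It appeared " + str(count) + " times!")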

Thanks @bravosierra99! – Allizon

It comes out as characters, though: "your word e is in the most common words...." instead of whole words... – Allizon
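
The symptom described in this comment is what `Counter` produces when it is handed a string directly: iterating over a string yields single characters, so letters get counted rather than words. A small sketch of the difference, using a made-up sample sentence:

```python
from collections import Counter

text = "the cat and the hat"  # made-up sample text

# Iterating over a string yields characters, so this counts letters and spaces.
char_counts = Counter(text)

# Splitting on whitespace first yields words, so this counts words.
word_counts = Counter(text.split())

print(char_counts.most_common(3))
print(word_counts.most_common(3))
```

Calling `.split()` on the file contents before building the counter should therefore give word counts.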

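How is your common-words file set up? I used .split(), which means the words need to be separated by whitespace. You will have to adjust this code depending on how your common-words file is laid out. – bravosierra99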
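
For reference, `str.split()` with no arguments splits on any run of whitespace, including newlines, so it already handles both a one-word-per-line file and a space-separated one. If the file used commas instead, something like the sketch below might work. The function name `parse_common_words`, the choice of separators, and the lowercasing are all assumptions, since the thread does not show the actual file layout.

```python
import re


def parse_common_words(text):
    """Split text into lowercase words on commas and/or whitespace (assumed separators)."""
    # re.split can leave empty strings at the edges, so filter them out.
    return [word.lower() for word in re.split(r"[,\s]+", text) if word]


# Made-up samples of two possible file layouts.
print(parse_common_words("the\nof\nand\n"))
print(parse_common_words("The, of, and"))
```

Lowercasing here matches the question's code, which lowercases the keys of `wordcount` before comparing.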