2017-04-21 211 views

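I am an absolute Python beginner. I'm doing text analysis on Greek plays and counting how often each word occurs. Because the play is very long, I can't see my full set of data: the Python window doesn't have enough room, so it only shows the lowest-frequency words. I'm thinking of writing the results to a .csv file instead. My full code is below. How can I write the dictionary values to a csv file?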

#read the file as one string and split the string into a list of separate words 
input = open('Aeschylus.txt', 'r') 
text = input.read() 
wordlist = text.split() 

#read file containing stopwords and split the string into a list of separate words 
stopwords = open("stopwords.txt", 'r').read().split() 

#remove stopwords 
wordsFiltered = [] 

for w in wordlist: 
    if w not in stopwords: 
        wordsFiltered.append(w) 

#create dictionary by counting no of occurrences of each word in list 
wordfreq = [wordsFiltered.count(x) for x in wordsFiltered] 

#create word-frequency pairs and create a dictionary 
dictionary = dict(zip(wordsFiltered,wordfreq)) 

#sort by decreasing frequency and print 
aux = [(dictionary[word], word) for word in dictionary] 
aux.sort() 
aux.reverse() 
for y in aux: print y 

import csv 


with open('Aeschylus.csv', 'w') as csvfile: 
    fieldnames = ['dictionary[word]', 'word'] 
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames) 


    writer.writeheader() 
    writer.writerow({'dictionary[word]': '1', 'word': 'inherited'}) 
    writer.writerow({'dictionary[word]': '1', 'word': 'inheritance'}) 
    writer.writerow({'dictionary[word]': '1', 'word': 'inherit'}) 

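I found the csv code online. What I want is the complete list of words, from the highest frequency to the lowest. With this code, Python seems to ignore the csv part completely and just prints the data, as if I hadn't written the csv code at all.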

Any ideas on what I should write to get the expected result?

Thanks.


You need to close the file – DrBwts
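The point about closing the file can be illustrated with a minimal Python 3 sketch (the helper names write_rows and read_rows and the temporary output path are hypothetical, chosen only for illustration): a with block closes the file when it ends, which also flushes anything the csv writer has buffered, so the data is on disk afterwards.

```python
import csv
import os
import tempfile

def write_rows(path, header, rows):
    """Write a header plus rows; the with block closes (and flushes) the file on exit."""
    with open(path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(header)
        writer.writerows(rows)
    # Here the file is already closed, so all rows have been written out.

def read_rows(path):
    """Read the CSV back as a list of lists of strings."""
    with open(path, newline='') as csvfile:
        return list(csv.reader(csvfile))

if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'demo.csv')  # hypothetical output path
    write_rows(path, ['frequency', 'word'], [(1, 'inherited'), (1, 'inherit')])
    print(read_rows(path))
```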

Answer


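Since you have a dictionary in which the words are keys and their frequencies are values, DictWriter is not a good fit. It is meant for sequences of mappings that share a common set of keys, which become the columns of the csv. For example, if you already had a list of dicts like this one, created by hand: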

a_list = [{'dictionary[word]': '1', 'word': 'inherited'}, 
          {'dictionary[word]': '1', 'word': 'inheritance'}, 
          {'dictionary[word]': '1', 'word': 'inherit'}] 

then DictWriter would be the right tool. Instead, you have a single dictionary such as dictionary:

dictionary = {'inherited': 1, 
              'inheritance': 1, 
              'inherit': 1, 
              ...: ...} 
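To make the contrast concrete, here is a minimal Python 3 sketch (the column names 'frequency' and 'word' and the sample entries are illustrative assumptions, and io.StringIO stands in for a real file): to use DictWriter with a word-to-frequency dictionary, you first have to turn it into a list of row dicts that share the same keys.

```python
import csv
import io

# A word -> frequency dictionary shaped like the question's `dictionary` (sample data).
dictionary = {'inherited': 1, 'inheritance': 1, 'inherit': 1, 'Zeus': 3}

# DictWriter wants one mapping per row, all with the same keys,
# so build that list first. The column names here are illustrative.
rows = [{'frequency': freq, 'word': word} for word, freq in dictionary.items()]

buf = io.StringIO()  # in-memory stand-in for an open csv file
writer = csv.DictWriter(buf, fieldnames=['frequency', 'word'])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

This works, but the extra conversion step is exactly why a plain csv.writer is the simpler choice for this data.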

However, you have already built a sorted list of (freq, word) pairs as aux, which is perfect for writing to CSV:

with open('Aeschylus.csv', 'wb') as csvfile: 
    header = ['frequency', 'word'] 
    writer = csv.writer(csvfile) 
    writer.writerow(header) 
    # Note the plural method name 
    writer.writerows(aux) 
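A self-contained Python 3 sketch of the same idea (the sample words are made up, and io.StringIO replaces the real file so the output is easy to inspect): on Python 3 the file would be opened with open('Aeschylus.csv', 'w', newline='') rather than in 'wb' mode.

```python
import csv
import io

# Sample (frequency, word) pairs in the same shape as the question's `aux`.
dictionary = {'inherited': 1, 'Zeus': 3, 'Argos': 2}
aux = [(dictionary[word], word) for word in dictionary]
aux.sort(reverse=True)  # highest frequency first; equivalent to sort() + reverse() here

buf = io.StringIO()  # Python 3 file version: open('Aeschylus.csv', 'w', newline='')
writer = csv.writer(buf)
writer.writerow(['frequency', 'word'])
writer.writerows(aux)  # one CSV row per (frequency, word) tuple
print(buf.getvalue())
```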

Python seems to ignore the csv part completely and just prints the data, as if I hadn't written the csv code at all.

That sounds odd. At the very least you should have gotten a file Aeschylus.csv containing:

dictionary[word],word 
1,inherited 
1,inheritance 
1,inherit 
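One way to check this is to run the question's csv code against an in-memory buffer. The following Python 3 sketch uses io.StringIO in place of Aeschylus.csv (an illustrative substitution) and shows that only the header and the three hard-coded rows are written; nothing from aux ever reaches the file.

```python
import csv
import io

buf = io.StringIO()  # in-memory stand-in for Aeschylus.csv
fieldnames = ['dictionary[word]', 'word']
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()
# The three hard-coded rows from the question; the computed `aux` list is never used.
writer.writerow({'dictionary[word]': '1', 'word': 'inherited'})
writer.writerow({'dictionary[word]': '1', 'word': 'inheritance'})
writer.writerow({'dictionary[word]': '1', 'word': 'inherit'})

lines = buf.getvalue().splitlines()
print(lines)
```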

Your frequency-counting approach can also be improved. Currently

#create dictionary by counting no of occurrences of each word in list 
wordfreq = [wordsFiltered.count(x) for x in wordsFiltered] 

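has to loop over the list wordsFiltered once for every word in wordsFiltered, so it is O(N²). Instead, you can iterate over the words in the file once, filtering and counting as you go. Python has a specialized dictionary for counting hashable objects, called Counter: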

from __future__ import print_function 
from collections import Counter 
import csv 

# Many ways to go about this, could for example yield from (<gen expr>) 
def words(filelike): 
    for line in filelike: 
        for word in line.split(): 
            yield word 

def remove(iterable, stopwords): 
    stopwords = set(stopwords) # O(1) lookups instead of O(n) 
    for word in iterable: 
        if word not in stopwords: 
            yield word 

if __name__ == '__main__': 
    with open("stopwords.txt") as f: 
        stopwords = f.read().split() 

    with open('Aeschylus.txt') as wordfile: 
        wordfreq = Counter(remove(words(wordfile), stopwords)) 
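To see that the generator pipeline gives the same counts as the original list.count approach, here is a small Python 3 sketch. The sample text and stopword list are made up, and io.StringIO stands in for the open Aeschylus.txt, since iterating over it yields lines just like a text file.

```python
from collections import Counter
import io

# Same two generators as above, repeated so this sketch runs on its own.
def words(filelike):
    for line in filelike:
        for word in line.split():
            yield word

def remove(iterable, stopwords):
    stopwords = set(stopwords)  # O(1) membership tests
    for word in iterable:
        if word not in stopwords:
            yield word

# io.StringIO stands in for the open text file (made-up sample data).
sample = io.StringIO("the king of Argos\nthe king and the chorus\n")
stopwords = ['the', 'of', 'and']
wordfreq = Counter(remove(words(sample), stopwords))  # single pass: O(N)

# For comparison: the question's O(N^2) approach on the same filtered words.
wordsFiltered = [w for w in "the king of Argos the king and the chorus".split()
                 if w not in stopwords]
slow = dict(zip(wordsFiltered, [wordsFiltered.count(x) for x in wordsFiltered]))

print(wordfreq)
print(dict(wordfreq) == slow)
```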

Then, as before, print the words and their frequencies, starting with the most common:

for word, freq in wordfreq.most_common(): 
    print(word, freq) 
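Since the original problem was that the Python window was too small to show everything, it may help that most_common accepts an optional count. This Python 3 sketch uses made-up sample words: with no argument it returns every (word, count) pair, highest count first, and most_common(n) returns only the top n.

```python
from collections import Counter

# Made-up sample words for illustration.
wordfreq = Counter(['king', 'Argos', 'king', 'chorus', 'king', 'Argos'])

print(wordfreq.most_common())   # every (word, count) pair, highest count first
print(wordfreq.most_common(2))  # only the two most frequent words

for word, freq in wordfreq.most_common():
    print(word, freq)
```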

and/or write them to a csv file:

# Since you're using python 2, 'wb' and no newline='' 
with open('Aeschylus.csv', 'wb') as csvfile: 
    writer = csv.writer(csvfile) 
    writer.writerow(['word', 'freq']) 
    # If you want to keep the most-common order in the CSV as well. Otherwise 
    # wordfreq.items() would do too. 
    writer.writerows(wordfreq.most_common()) 
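For readers on Python 3, a minimal sketch of the same step follows. The helper names write_counts and read_rows and the temporary path are hypothetical. The file is opened in text mode with newline='' instead of 'wb', and encoding='utf-8' is an assumption made here because the source text is Greek.

```python
import csv
import os
import tempfile
from collections import Counter

def write_counts(path, wordfreq):
    """Write a Counter to CSV, most common word first (Python 3 version)."""
    # Python 3: text mode with newline='' instead of 'wb'; utf-8 for Greek text.
    with open(path, 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(['word', 'freq'])
        writer.writerows(wordfreq.most_common())

def read_rows(path):
    """Read the CSV back to check what was written."""
    with open(path, newline='', encoding='utf-8') as csvfile:
        return list(csv.reader(csvfile))

if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'Aeschylus.csv')  # hypothetical location
    write_counts(path, Counter(['Ζεύς', 'Ἄργος', 'Ζεύς']))
    print(read_rows(path))
```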

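What if I want to keep my old code? I'm really new to this, so I can't follow what you wrote for me, although it works perfectly. I'd like to know how I can write the data to csv using just my old code. –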


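Sorry for the confusion. But 'DictWriter' still isn't suited to your data. Even though you have a dictionary, you'd be better off writing 'sorted(dictionary.items(), key=itemgetter(1), reverse=True)' with 'csv.writer' (itemgetter comes from the operator module). I'll update the answer later. –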
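A minimal Python 3 sketch of that suggestion, assuming a small made-up dictionary shaped like the question's and io.StringIO in place of the real file: sort the (word, frequency) items by frequency, highest first, and hand them to csv.writer.

```python
import csv
import io
from operator import itemgetter

# Sample word -> frequency dictionary shaped like the question's `dictionary`.
dictionary = {'inherited': 1, 'Zeus': 3, 'Argos': 2}

# Sort (word, freq) pairs by frequency (tuple index 1), highest first.
rows = sorted(dictionary.items(), key=itemgetter(1), reverse=True)

buf = io.StringIO()  # Python 3 file version: open('Aeschylus.csv', 'w', newline='')
writer = csv.writer(buf)
writer.writerow(['word', 'frequency'])
writer.writerows(rows)
print(buf.getvalue())
```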


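Hi, I tried the edited code with 'writer.writerows(aux)', but Python still doesn't put the data into the .csv file. I only get the two headers, 'dictionary' and 'word'. Could it be related to closing the file, as the other commenter mentioned? –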