2017-08-08
0

Count text by date in Python

I have a file with the following contents:

'2014-08-09':"a" 
'2014-08-09':"a" 
'2014-08-09':"b" 
'2014-09-09':"b" 
'2014-06-09':"b" 

I need to count the text values by date. The expected output (O/P) is below:

2014-08-09-> a:2, b:1 
2014-09-09-> b:1 
2014-06-09-> b:1. 

Here is my code:

with open("file.txt") as file:
    my_list = file.readlines()
result = {}
for item in my_list:
    posix_time = item.split(':')[0]
    time_val = item.split(':')[1]
    date_ext = datetime.datetime.fromtimestamp(
        int(posix_time)
    ).strftime('%Y-%m-%d')
    if time_val not in result:
        result[time_val] = 0
    else:
        result[time_val] += 1
+0

In what way does your code not behave as required? –

+0

Given the sample data, what is your expected output? – Alexander

+0

The expected output is 2014-08-09-> a:2, b:1 2014-09-09-> b:1 2014-06-09-> b:1, given the data '2014-08-09':"a", '2014-08-09':"a", '2014-08-09':"b", '2014-09-09':"b", '2014-06-09':"b". – shanky

Answers

1

Here is a simple option:

import datetime
from collections import defaultdict

In [30]: with open("dates.txt") as f:
    ...:     res = defaultdict(dict)
    ...:     for line in f:
    ...:         date, letter = line.rstrip().split(':')
    ...:         letter = letter.replace("\"", "")
    ...:         date = datetime.datetime.strptime(date, "'%Y-%m-%d'")
    ...:         if letter in res[date]:
    ...:             res[date][letter] += 1
    ...:         else:
    ...:             res[date][letter] = 1

In [31]: res 
Out[31]: 
defaultdict(dict, 
      {datetime.datetime(2014, 6, 9, 0, 0): {'b': 1}, 
      datetime.datetime(2014, 8, 9, 0, 0): {'a': 2, 'b': 1}, 
      datetime.datetime(2014, 9, 9, 0, 0): {'b': 1}}) 

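This assumes you want the keys as datetime objects. Otherwise, you can drop that part (and strip the single quotes from the date string instead).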
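If string keys are enough, the strptime call can be replaced by stripping the quotes from each field. A minimal sketch, assuming the same line format as in the question; the helper name count_by_date and the sample list are illustrative and not part of the answer above:

```python
from collections import defaultdict

def count_by_date(lines):
    """Count letters per date, keeping dates as plain strings (hypothetical helper)."""
    res = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        date, letter = line.split(':')
        date = date.strip("'")      # '2014-08-09' -> 2014-08-09
        letter = letter.strip('"')  # "a" -> a
        res[date][letter] = res[date].get(letter, 0) + 1
    return dict(res)

# Sample lines in the question's format, standing in for the file contents.
sample = [
    "'2014-08-09':\"a\" ",
    "'2014-08-09':\"a\" ",
    "'2014-08-09':\"b\" ",
    "'2014-09-09':\"b\" ",
    "'2014-06-09':\"b\" ",
]
counts = count_by_date(sample)
print(counts)
```

With a real file, passing the open file object (for example count_by_date(f)) should work the same way, since the function only iterates over lines.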

Or use a Counter instead of a dict inside the defaultdict:

In [36]: from collections import Counter
    ...: with open("dates.txt") as f:
    ...:     res = defaultdict(Counter)
    ...:     for line in f:
    ...:         date, letter = line.rstrip().split(':')
    ...:         letter = letter.replace("\"", "")
    ...:         date = datetime.datetime.strptime(date, "'%Y-%m-%d'")
    ...:         res[date].update({letter: 1})

In [37]: res 
Out[37]: 
defaultdict(collections.Counter, 
      {datetime.datetime(2014, 6, 9, 0, 0): Counter({'b': 1}), 
      datetime.datetime(2014, 8, 9, 0, 0): Counter({'a': 2, 'b': 1}), 
      datetime.datetime(2014, 9, 9, 0, 0): Counter({'b': 1})}) 

Or, as Alexander suggested, you can use a lambda to create a nested defaultdict:

In [38]: with open("dates.txt") as f:
    ...:     res = defaultdict(lambda: defaultdict(int))
    ...:     for line in f:
    ...:         date, letter = line.rstrip().split(':')
    ...:         letter = letter.replace("\"", "")
    ...:         date = datetime.datetime.strptime(date, "'%Y-%m-%d'")
    ...:         res[date][letter] += 1

In [39]: res 
Out[39]: 
defaultdict(<function __main__.<lambda>>, 
      {datetime.datetime(2014, 6, 9, 0, 0): defaultdict(int, {'b': 1}), 
      datetime.datetime(2014, 8, 9, 0, 0): defaultdict(int, 
         {'a': 2, 'b': 1}), 
      datetime.datetime(2014, 9, 9, 0, 0): defaultdict(int, {'b': 1})}) 

This works because int() returns 0, which I had never realized before, but it makes perfect sense.
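A quick check of that claim: defaultdict calls its factory with no arguments when a key is missing, and int() with no arguments returns 0, so the first += 1 starts from zero. The variable names below are only for illustration:

```python
from collections import defaultdict

assert int() == 0  # the value the factory produces for a missing key

counts = defaultdict(int)
counts["a"] += 1   # missing key: starts at int() == 0, then becomes 1
counts["a"] += 1
counts["b"] += 1

nested = defaultdict(lambda: defaultdict(int))
nested["2014-08-09"]["a"] += 1  # both levels are created on first access

print(dict(counts))
print({k: dict(v) for k, v in nested.items()})
```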

To sort the items (the key below orders by total count first, then by date):

In [64]: l = list(res.items()) 

In [65]: l 
Out[65]: 
[(datetime.datetime(2014, 8, 9, 0, 0), defaultdict(int, {'a': 2, 'b': 1})), 
(datetime.datetime(2014, 9, 9, 0, 0), defaultdict(int, {'b': 1})), 
(datetime.datetime(2014, 6, 9, 0, 0), defaultdict(int, {'b': 1}))] 

In [66]: l.sort(key=lambda x: (sum(x[1].values()), x[0])) 

In [67]: l 
Out[67]: 
[(datetime.datetime(2014, 6, 9, 0, 0), defaultdict(int, {'b': 1})), 
(datetime.datetime(2014, 9, 9, 0, 0), defaultdict(int, {'b': 1})), 
(datetime.datetime(2014, 8, 9, 0, 0), defaultdict(int, {'a': 2, 'b': 1}))] 
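To get the format requested in the question, with dates in ascending order and letters by descending count, something like the following should work. It is a sketch that assumes res maps datetime objects to per-letter counts as above; the res literal here is a stand-in for the result built from the file, and format_counts is a hypothetical helper name:

```python
import datetime

# Stand-in for the `res` built from the file above.
res = {
    datetime.datetime(2014, 8, 9): {"a": 2, "b": 1},
    datetime.datetime(2014, 9, 9): {"b": 1},
    datetime.datetime(2014, 6, 9): {"b": 1},
}

def format_counts(res):
    """Return one 'date-> letter:count, ...' line per date (hypothetical helper)."""
    lines = []
    for date in sorted(res):  # ascending by date
        # Highest count first; ties are broken alphabetically.
        pairs = sorted(res[date].items(), key=lambda kv: (-kv[1], kv[0]))
        body = ", ".join("{}:{}".format(k, v) for k, v in pairs)
        lines.append("{}-> {}".format(date.strftime("%Y-%m-%d"), body))
    return lines

for line in format_counts(res):
    print(line)
```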
+1

You can create a nested defaultdict: 'res = defaultdict(lambda: defaultdict(int))'. Then you can increment it: 'res[date][letter] += 1'. To get the final dictionary: '{k: dict(res[k]) for k in res}' – Alexander

+0

@Alexander Thanks, I was wondering how to do that. –

+0

@CoryMadden How can I sort the result by date and then by the text count? I tried the following code: ordered_result = OrderedDict(sorted(result.items(), key=lambda t: (t[0], t[2]))) – shanky

0

You can iterate over the data and build the desired result. This uses ast.literal_eval to remove the quotes from the string literals:

In []: 
from collections import defaultdict 
import datetime as dt 
import ast 

with open(<file>) as f:  # <file> is a placeholder for the path to your data file
    data = [[ast.literal_eval(word) for word in line.split(':')] for line in f] 

result = {} 
for date, c in data: 
    date = dt.datetime.strptime(date, '%Y-%m-%d') 
    result.setdefault(date, defaultdict(int))[c] += 1 
result 

Out[]: 
{datetime.datetime(2014, 6, 9, 0, 0): defaultdict(int, {'b': 1}), 
datetime.datetime(2014, 8, 9, 0, 0): defaultdict(int, {'a': 2, 'b': 1}), 
datetime.datetime(2014, 9, 9, 0, 0): defaultdict(int, {'b': 1})}
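To see what ast.literal_eval does here: each half of a line is itself a quoted Python string literal, and literal_eval turns it into the plain string. A small sketch on a single sample line (the variable names are illustrative). The fields are stripped first, as a precaution, because leading whitespace can make literal_eval fail on older Python versions:

```python
import ast

line = "'2014-08-09':\"a\" \n"
date_text, letter_text = line.split(":")

# Strip surrounding whitespace, then evaluate each field as a string literal.
date = ast.literal_eval(date_text.strip())
letter = ast.literal_eval(letter_text.strip())

print(date, letter)
```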
0

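You can read the file into a list, build a dictionary keyed by date, then iterate over each key's value to count the characters and print them, for example: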

with open('file.txt', 'r') as f:
    data = [line.rstrip().split(':') for line in f]

result = {}
for sub in data:
    date = sub[0].replace("'", '')
    letter = sub[1].replace('"', '')
    # Concatenate the letters seen for each date (only works for single-character values)
    try:
        result[date] += letter
    except KeyError:
        result[date] = letter

for k, v in result.items():  # works in Python 2 and 3
    out = '{}-> '.format(k)
    for c in sorted(set(v)):
        out += '{}: {} '.format(c, v.count(c))
    print(out)

Output (the order of the lines may vary with the Python version):

2014-08-09-> a: 2 b: 1 
2014-06-09-> b: 1 
2014-09-09-> b: 1
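The string-concatenation trick above counts characters, so it only works when each value is a single character. A sketch of the same idea that collects the values in a list per date and counts them with collections.Counter, so multi-character values work as well; the sample rows and names are illustrative, not taken from the question's file:

```python
from collections import Counter, defaultdict

# Illustrative (date, value) pairs with multi-character values.
rows = [
    ("2014-08-09", "foo"),
    ("2014-08-09", "foo"),
    ("2014-08-09", "bar"),
    ("2014-09-09", "bar"),
]

values_by_date = defaultdict(list)
for date, value in rows:
    values_by_date[date].append(value)

# One Counter per date, built from that date's list of values.
summary = {date: Counter(values) for date, values in values_by_date.items()}

for date, counter in summary.items():
    parts = " ".join("{}: {}".format(v, n) for v, n in counter.most_common())
    print("{}-> {}".format(date, parts))
```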