2017-10-09

Filtering dates by month in datetime objects

I have a dictionary whose keys are datetime.datetime objects and whose values are lists of tweets. It looks like this:

{datetime.datetime(2017, 9, 30, 19, 55, 20) : ['this is some tweet text'], 
datetime.datetime(2017, 9, 30, 19, 55, 20) : ['this is another tweet']... 

I'm trying to get the number of tweets sent in each month of the year. So far I have...

startDate = 10
endDate = 11
start = True
while start:

    for k, v in tweetDict.items():
        endDate -= 1
        startDate -= 1

        datetimeStart = datetime(2017, startDate, 1)
        datetimeEnd = datetime(2017, endDate, 1)

        print(datetimeStart, datetimeEnd)

        if datetimeStart < k < datetimeEnd:
            print(v)
        if endDate == 2:
            start = False
            break

It only prints the following (I know that comes from the print statement)...

2017-08-01 00:00:00 2017-09-01 00:00:00 
2017-07-01 00:00:00 2017-08-01 00:00:00 
2017-06-01 00:00:00 2017-07-01 00:00:00 
2017-05-01 00:00:00 2017-06-01 00:00:00 
2017-04-01 00:00:00 2017-05-01 00:00:00 
2017-03-01 00:00:00 2017-04-01 00:00:00 
2017-02-01 00:00:00 2017-03-01 00:00:00 
2017-01-01 00:00:00 2017-02-01 00:00:00 

and not the actual tweets themselves. I was expecting something like...

2017-08-01 00:00:00 2017-09-01 00:00:00 
['heres a tweet'] 
['theres a tweet'] 
2017-07-01 00:00:00 2017-08-01 00:00:00 
['there only 1 tweet for this month'].... 

I'm a bit stuck. How can I do this?

Answers


You could just group by month instead of trying to decrement and compare different months:

>>> import datetime
>>> from itertools import groupby
>>> d = {datetime.datetime(2017, 9, 30, 19, 55, 20): ['this is some tweet text'],
...      datetime.datetime(2017, 9, 30, 20, 55, 20): ['this is another tweet'],
...      datetime.datetime(2017, 10, 30, 19, 55, 20): ['this is an october tweet'],}
>>> for month, group in groupby(sorted(d.items()), key=lambda item: item[0].month):
...     print(month)
...     for dt, tweet in group:
...         print(dt, tweet)
...
9
2017-09-30 19:55:20 ['this is some tweet text']
2017-09-30 20:55:20 ['this is another tweet']
10
2017-10-30 19:55:20 ['this is an october tweet']
>>>
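One caveat with this approach: itertools.groupby only merges *consecutive* items that share a key, which is why the items are sorted first. A dict iterates in insertion order (Python 3.7+), and that order need not be chronological. A minimal illustration with hypothetical month numbers:

```python
from itertools import groupby

# Hypothetical months in arrival order; 9 reappears after 10.
months = [9, 10, 9]

# groupby only merges adjacent equal keys, so month 9 is split in two.
unsorted_groups = [(key, len(list(group))) for key, group in groupby(months)]
print(unsorted_groups)  # [(9, 1), (10, 1), (9, 1)]

# Sorting by the same key first yields exactly one group per month.
sorted_groups = [(key, len(list(group))) for key, group in groupby(sorted(months))]
print(sorted_groups)  # [(9, 2), (10, 1)]
```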

Of course, you can print it in a nicer format, etc. (the inner join is needed because each value seems to be a list):

>>> for month, group in groupby(sorted(d.items()), key=lambda item: item[0].month):
...     tweets = list(group)
...     print("%d tweet(s) in month %d" % (len(tweets), month))
...     print('\n'.join(','.join(tweet) for (dt, tweet) in tweets))
...
2 tweet(s) in month 9
this is some tweet text
this is another tweet
1 tweet(s) in month 10
this is an october tweet
>>>
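If all you need is the number of tweets per month, as the question asks, a collections.Counter avoids the sorting requirement altogether. This is an alternative sketch, not part of the answer above. It assumes the dictionary shape from the question and adds `len(tweets)` because each value is a list of tweets; keying on (year, month) rather than month alone is an assumption made here so that the same month in different years stays separate:

```python
import datetime
from collections import Counter

# Sample data in the question's shape: datetime keys, lists of tweet texts as values.
tweet_dict = {
    datetime.datetime(2017, 9, 30, 19, 55, 20): ['this is some tweet text'],
    datetime.datetime(2017, 9, 30, 20, 55, 20): ['this is another tweet'],
    datetime.datetime(2017, 10, 30, 19, 55, 20): ['this is an october tweet'],
}

# Tally tweets per (year, month); len(tweets) counts every tweet stored under a key.
counts = Counter()
for dt, tweets in tweet_dict.items():
    counts[(dt.year, dt.month)] += len(tweets)

# Print the months in chronological order.
for (year, month), n in sorted(counts.items()):
    print("%d tweet(s) in %d-%02d" % (n, year, month))
```

This prints `2 tweet(s) in 2017-09` followed by `1 tweet(s) in 2017-10`.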

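I can see how groupby would make this easier, but I'm still getting a 'SyntaxError' under '(k,v)' in the first line of the for loop. I'm using Python 3. Would that make a difference, since your code looks like Python 2? – e1v1s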


Ah, yes, sorry @e1v1s, change all 'print x' to 'print(x)' (I don't have Python 3 installed on this machine). – Bahrom


Yes, I've already added parentheses to the print statements. It's the 'SyntaxError' I mentioned in the comment above :) – e1v1s
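The SyntaxError in this thread is not about print: `lambda (k, v): k.month` uses tuple parameter unpacking, which Python 3 removed (PEP 3113). A key function that takes the (key, value) pair as a single argument and indexes into it works in both Python 2 and 3. A minimal sketch with one sample item:

```python
import datetime

# One (key, value) pair, as produced by d.items().
item = (datetime.datetime(2017, 9, 30, 19, 55, 20), ['this is some tweet text'])

# Python 2 only; a SyntaxError in Python 3:
#     key_func = lambda (k, v): k.month

# Python 2 and 3: accept the pair as one argument and take the datetime at index 0.
key_func = lambda kv: kv[0].month
print(key_func(item))  # 9
```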


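First: you've put two items in your dictionary with exactly the same key, so the second one overwrites the first. For the rest of this answer, I'll assume the second item in the example is slightly different (seconds=21).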
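A quick way to see the overwrite, using the keys from the question. If the list values are meant to collect several tweets per timestamp, appending with dict.setdefault is one option; that part is a suggestion, not something the question states:

```python
import datetime

# Both entries use exactly the same key, as in the question.
d = {
    datetime.datetime(2017, 9, 30, 19, 55, 20): ['this is some tweet text'],
    datetime.datetime(2017, 9, 30, 19, 55, 20): ['this is another tweet'],
}
# A dict holds one value per key, so the later entry replaces the earlier one.
print(len(d))            # 1
print(list(d.values()))  # [['this is another tweet']]

# Suggestion: append to the list stored under the key instead of overwriting it.
ts = datetime.datetime(2017, 9, 30, 19, 55, 20)
merged = {}
for text in ['this is some tweet text', 'this is another tweet']:
    merged.setdefault(ts, []).append(text)
print(merged[ts])  # ['this is some tweet text', 'this is another tweet']
```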

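The reason your code doesn't work is that you decrement endDate and startDate inside the for loop. As a result, each month range is checked against only a single item from the dictionary: if that item happens to fall in that month, it gets printed; if not, nothing is printed. To illustrate, here is what you get if you change your print to print(datetimeStart, datetimeEnd, k, v):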

2017-09-01 00:00:00 2017-10-01 00:00:00 2017-09-30 19:55:20 ['this is some tweet text'] 
['this is some tweet text'] 
2017-08-01 00:00:00 2017-09-01 00:00:00 2017-09-30 19:55:21 ['this is another tweet'] 
2017-07-01 00:00:00 2017-08-01 00:00:00 2017-09-30 19:55:20 ['this is some tweet text'] 
2017-06-01 00:00:00 2017-07-01 00:00:00 2017-09-30 19:55:21 ['this is another tweet'] 
2017-05-01 00:00:00 2017-06-01 00:00:00 2017-09-30 19:55:20 ['this is some tweet text'] 
2017-04-01 00:00:00 2017-05-01 00:00:00 2017-09-30 19:55:21 ['this is another tweet'] 
2017-03-01 00:00:00 2017-04-01 00:00:00 2017-09-30 19:55:20 ['this is some tweet text'] 
2017-02-01 00:00:00 2017-03-01 00:00:00 2017-09-30 19:55:21 ['this is another tweet'] 
2017-01-01 00:00:00 2017-02-01 00:00:00 2017-09-30 19:55:20 ['this is some tweet text'] 

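The fix with the fewest changes to your existing code is to move the decrements in front of the for loop and dedent the if endDate... block to the level of the while loop: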

while start:
    endDate -= 1
    startDate -= 1
    for k, v in tweetDict.items():
        datetimeStart = datetime(2017, startDate, 1)
        datetimeEnd = datetime(2017, endDate, 1)
        print(datetimeStart, datetimeEnd, k, v)
        if datetimeStart < k < datetimeEnd:
            print(v)
    if endDate == 2:
        start = False
        break

Of course, at that point you might as well get rid of the if endDate... block entirely and use while endDate > 2:
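A self-contained sketch of that variant, assuming `from datetime import datetime` and the two-item sample data described above (second key at seconds=21). It visits the same month ranges, September down to January. Two small changes go beyond the answer and are labeled as such: the bounds are computed once per month instead of once per item, and the lower bound uses `<=` so a tweet posted exactly at midnight on the 1st is not skipped. The `found` list is a hypothetical addition used only to check the result:

```python
from datetime import datetime

# Sample data; the second key differs by one second so both entries survive.
tweetDict = {
    datetime(2017, 9, 30, 19, 55, 20): ['this is some tweet text'],
    datetime(2017, 9, 30, 19, 55, 21): ['this is another tweet'],
}

startDate = 10
endDate = 11
found = []  # hypothetical: (month, tweets) pairs collected for checking

# The loop condition replaces the `if endDate == 2: ... break` block.
while endDate > 2:
    endDate -= 1
    startDate -= 1
    # Month bounds depend only on the counters, so build them once per month.
    datetimeStart = datetime(2017, startDate, 1)
    datetimeEnd = datetime(2017, endDate, 1)
    print(datetimeStart, datetimeEnd)
    for k, v in tweetDict.items():
        # Half-open range [start of month, start of next month).
        if datetimeStart <= k < datetimeEnd:
            print(v)
            found.append((datetimeStart.month, v))
```

Both tweets are printed under the 2017-09-01 to 2017-10-01 range, and the loop stops after the January range.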