2017-10-05 83 views

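How to check if dates fall between date pairs in Python

I have two CSV files, file 1 and file 2, which contain different information. The second column of both CSV files contains a date. I want to determine whether any date in file 2 falls between a date-time pair from file 1. By this I mean that it lies between two consecutive dates from file 1. I also have an additional constraint: the value of the field in column 4 of file 1 must be non-zero.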

import csv
from datetime import datetime, timedelta

import numpy as np


def try_parsing_date(text):
    # Try each known format in turn and return the first successful parse.
    for fmt in ('%Y-%m-%d %H:%M', '%Y-%m-%d %H:%M:%S', '%d/%m/%Y %H:%M:%S',
                '%d/%m/%Y %H:%M', '%d/%m/%Y', '%H:%M:%S', '%Y-%m-%d-%H%M%S.%f'):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            pass
    raise ValueError('no valid date format found')


def append_dates(a, b):
    date_1_vec = []
    # Collect every date from the first file.
    with open(a) as file1:
        reader1 = csv.reader(file1, delimiter=',')
        for row in reader1:
            date_1_vec.append(datetime.strptime(row[1], "%Y-%m-%d-%H%M%S"))
    with open(b) as file2:
        feed_bin = []
        upd_vec = []
        nothing = [0]
        reader2 = csv.reader(file2, delimiter=',')

        for row in reader2:
            temp_date = datetime.strptime(row[1], "%Y-%m-%d %H:%M:%S")
            temp_date2 = temp_date + timedelta(minutes=15)
            test_val = float(row[4])
            # Condition that is supposed to detect a match.
            if any((temp_date < dat for dat in date_1_vec) and (temp_date2 > dat for dat in date_1_vec) and (test_val > nothing for nothing in nothing)):
                feed_bin.append(1)
                val = 1
                # print("yes")
            else:
                feed_bin.append(0)
                val = 0
                # print("No")
            upd = [row[0], row[1], row[2], val]
            upd_vec.append(upd)
    np.savetxt("outfile.csv", upd_vec, delimiter=",", fmt='%s')


def main():
    append_dates("file1.csv", "file2.csv")


main()

I have tried several different approaches.

File 1

42 08/06/2017 00:00 1 15 0 
42 08/06/2017 00:15 5 11 75 
42 08/06/2017 00:30 0 15 0 
42 08/06/2017 00:45 85 475 0 
42 08/06/2017 01:00 125 75 0 
42 08/06/2017 01:15 0 0 0 
42 08/06/2017 01:30 95 475 0 
42 08/06/2017 01:45 0 75 2.625 
42 08/06/2017 02:00 0 15 0 
42 08/06/2017 02:15 0 13.5 1.5 
42 08/06/2017 02:30 0 1.29623 3.15814 
42 08/06/2017 02:45 0 7.5 15 
42 08/06/2017 03:00 0 0 15 

File 2

42 2017-06-07-232240 
42 2017-06-08-012636 
42 2017-06-08-013811 
42 2017-06-08-014553 
42 2017-06-08-014751 
42 2017-06-08-101332 
42 2017-06-08-101558 
42 2017-06-08-102707 
42 2017-06-08-104039 
42 2017-06-08-105516 
42 2017-06-08-110620 

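My latest attempt is shown above, but so far I have had no success. The problem with my current approach is (I think) that the condition is always satisfied, because it searches through all the dates in file 1 rather than through consecutive dates, as I require.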
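A small check of how an `any(...)` call of that shape evaluates may help to narrow this down. The following is only a sketch with made-up values (`dates`, `probe` and `value` are illustrative names, not taken from the files). It suggests that joining generator expressions with `and` does not combine them element by element, and it contrasts that with an explicit test over consecutive pairs.

```python
from datetime import datetime

# Made-up values for illustration only.
dates = [datetime(2017, 6, 8, 0, 0), datetime(2017, 6, 8, 0, 15)]
probe = datetime(2017, 6, 9, 12, 0)  # lies after every date in `dates`
value = 5.0

# Generator objects are always truthy, so `gen_a and gen_b` simply
# evaluates to gen_b. any() therefore only iterates over the last
# generator, and the date comparison is never looked at.
combined = (probe < d for d in dates) and (value > 0 for _ in [0])
result_from_and = any(combined)

# Testing consecutive pairs explicitly: zip pairs each date with its successor.
result_pairwise = any(lo <= probe <= hi for lo, hi in zip(dates, dates[1:]))

print(result_from_and, result_pairwise)
```

With these values the first result depends only on `value > 0`, while the pairwise test actually looks at the dates.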

Any suggestions on how to modify my code, or for a completely new approach, would be greatly appreciated!

Update after Jurgy's suggestion - current output:

2017-06-14 13:51:57 is between 2017-06-14 13:45:00 and 2017-06-14 14:00:00 
2017-06-14 13:57:34 is between 2017-06-14 13:45:00 and 2017-06-14 14:00:00 
2017-06-14 13:51:57 is between 2017-06-14 13:45:00 and 2017-06-14 14:00:00 
2017-06-14 13:57:34 is between 2017-06-14 13:45:00 and 2017-06-14 14:00:00 
2017-06-14 13:51:57 is between 2017-06-14 13:45:00 and 2017-06-14 14:00:00 
2017-06-14 13:57:34 is between 2017-06-14 13:45:00 and 2017-06-14 14:00:00 
2017-06-14 13:51:57 is between 2017-06-14 13:45:00 and 2017-06-14 14:00:00 
2017-06-14 13:57:34 is between 2017-06-14 13:45:00 and 2017-06-14 14:00:00 
2017-06-14 16:42:03 is between 2017-06-14 16:30:00 and 2017-06-14 16:45:00 
2017-06-14 16:42:03 is between 2017-06-14 16:30:00 and 2017-06-14 16:45:00 
2017-06-14 16:42:03 is between 2017-06-14 16:30:00 and 2017-06-14 16:45:00 
2017-06-14 16:42:03 is between 2017-06-14 16:30:00 and 2017-06-14 16:45:00 
2017-06-14 16:42:03 is between 2017-06-14 16:30:00 and 2017-06-14 16:45:00 
2017-06-14 16:42:03 is between 2017-06-14 16:30:00 and 2017-06-14 16:45:00 
2017-06-14 16:42:03 is between 2017-06-14 16:30:00 and 2017-06-14 16:45:00 

Answers

1

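How about iterating over the rows of file 1 and, for each row, iterating through the rows of file 2 to see whether one of those dates lies between the last two rows read from file 1? This can be optimized by extracting all the dates from file 2 first, so that you don't have to open the file every time. If the dates in the first file are not always in consecutive order, you could also check that prev_day < cur_day before opening file 2.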

import csv
from datetime import datetime


def append_dates(a, b):
    cur_day, prev_day = None, None
    with open(a) as file1:
        for f1row in csv.reader(file1, delimiter=','):
            cur_day = datetime.strptime(f1row[1], "%Y-%m-%d-%H%M%S")
            # The first row has no predecessor, so there is no pair to test yet.
            if prev_day is None:
                prev_day = cur_day
                continue
            # Re-read file 2 for every consecutive pair of file 1 dates.
            with open(b) as file2:
                for f2row in csv.reader(file2, delimiter=','):
                    f2day = datetime.strptime(f2row[1], "%Y-%m-%d %H:%M:%S")
                    if prev_day <= f2day <= cur_day:
                        print("{} is between {} and {}".format(f2day, prev_day, cur_day))
            prev_day = cur_day
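The optimization mentioned above (reading the dates of file 2 only once) could be sketched roughly as follows. The helper names `read_dates` and `find_matches` are made up for this illustration, and the column index and date format have to be adapted to the real files; this is a sketch under those assumptions rather than a drop-in replacement.

```python
import csv
from datetime import datetime


def read_dates(path, column, fmt):
    """Read one column of a CSV file and parse it into datetime objects.

    Hypothetical helper: `column` and `fmt` must match the actual file.
    """
    with open(path, newline="") as handle:
        return [datetime.strptime(row[column], fmt) for row in csv.reader(handle)]


def find_matches(file1_dates, file2_dates):
    """Return (f2day, prev_day, cur_day) for each file 2 date that lies
    between two consecutive file 1 dates (both bounds inclusive)."""
    matches = []
    # zip pairs each file 1 date with the one that follows it.
    for prev_day, cur_day in zip(file1_dates, file1_dates[1:]):
        if prev_day >= cur_day:
            continue  # skip pairs that are not in increasing order
        for f2day in file2_dates:
            if prev_day <= f2day <= cur_day:
                matches.append((f2day, prev_day, cur_day))
    return matches
```

Both files are then opened exactly once, for example with `find_matches(read_dates("file1.csv", 1, "%d/%m/%Y %H:%M"), read_dates("file2.csv", 1, "%Y-%m-%d-%H%M%S"))`, where the file names and formats are assumptions based on the sample data in the question.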

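Thanks! It almost does what I want, but it currently prints multiple times for each date that meets the condition. I am trying to figure out why that is. – Sjoseph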


I also added "with open(b) as file2:" after the continue statement. – Sjoseph


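Oh yes, I forgot the open(b). If there are multiple f2 rows between two dates in f1, this solution will print several times per f1 row. If you only want the first one, you can add a break after the print. – Jurgy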
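To report each file 2 date at most once, and to include the non-zero constraint from the question, the loops can also be turned around so that the outer loop runs over the file 2 dates. The sketch below makes two assumptions that the thread does not confirm: the rows of file 1 have already been parsed into (timestamp, value) tuples sorted by time, and the value that must be non-zero belongs to the row that starts the interval. `flag_dates` is a made-up name.

```python
from datetime import datetime


def flag_dates(file1_rows, file2_dates):
    """Return one 0/1 flag per file 2 date.

    file1_rows: list of (timestamp, value) tuples, sorted by timestamp.
    Assumption: the value of the row that STARTS an interval decides
    whether that interval counts.
    """
    flags = []
    for f2day in file2_dates:
        flag = 0
        # Walk over consecutive pairs of file 1 rows.
        for (start, value), (end, _) in zip(file1_rows, file1_rows[1:]):
            if value != 0 and start <= f2day <= end:
                flag = 1
                break  # stop at the first matching interval
        flags.append(flag)
    return flags


rows = [
    (datetime(2017, 6, 8, 0, 0), 0.0),
    (datetime(2017, 6, 8, 0, 15), 75.0),
    (datetime(2017, 6, 8, 0, 30), 0.0),
]
probes = [
    datetime(2017, 6, 8, 0, 5),   # inside an interval whose value is zero
    datetime(2017, 6, 8, 0, 20),  # inside an interval whose value is non-zero
    datetime(2017, 6, 8, 2, 0),   # outside every interval
]
flags = flag_dates(rows, probes)
```

Because the break leaves the inner loop after the first hit, every file 2 date produces exactly one flag, however many intervals it touches.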