2015-02-23

I'd like to create a new column showing the timedelta in days between consecutive dates in the following pandas DataFrame:

>>> hg[['not inc','date']] 
    not inc    date 
0 False 2012-02-29 00:00:00 
1 False 2012-03-16 00:00:00 
2 False 2012-04-04 00:00:00 
3  True 2012-05-08 00:00:00 
4 False 2012-05-12 00:00:00 
5 False 2012-05-26 00:00:00 
6 False 2012-06-09 00:00:00 
7 False 2012-10-13 00:00:00 
8 False 2012-11-10 00:00:00 
9  True 2013-03-19 00:00:00 
10 False 2013-04-01 00:00:00 
11 False 2013-04-25 00:00:00 
12 False 2013-05-04 00:00:00 
13 False 2013-05-18 00:00:00 
14 False 2013-06-01 00:00:00 
15 True 2013-08-20 00:00:00 
16 False 2013-08-31 00:00:00 
17 False 2013-09-21 00:00:00 
18 False 2013-10-05 00:00:00 
19 False 2013-10-19 00:00:00 
20 False 2013-11-09 00:00:00 
21 True 2014-01-21 00:00:00 
22 False 2014-02-08 00:00:00 
23 False 2014-02-22 00:00:00 
24 False 2014-03-08 00:00:00 
25 False 2014-03-29 00:00:00 
26 False 2014-04-19 00:00:00 
27 True 2014-07-21 00:00:00 
28 True 2014-08-01 00:00:00 
29 False 2014-08-09 00:00:00 
30 False 2014-08-30 00:00:00 
31 False 2014-09-13 00:00:00 
32 True 2014-09-26 00:00:00 
33 False 2014-10-04 00:00:00 
34 True 2015-01-08 00:00:00 
35 True 2015-01-20 00:00:00 
36 False 2015-01-31 00:00:00 
37 False 2015-02-14 00:00:00 

I want the first date difference to be measured from 2012-01-02, and I want every difference to be an integer.

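This is what I tried, but it didn't work: prevdate is never updated to the new row's date and always refers to the original starting point, datetime(2012,01,02). I'm looping over the DataFrame's rows with iterrows.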

>>>for index, row in hg.iterrows(): 
    prevdate = datetime(2012,01,02) 
    dsince = row['date']-prevdate 
    prevdate = row['date'] 
    print dsince 

Result (also, I don't know how to convert these values to int):

58 days, 0:00:00 
74 days, 0:00:00 
93 days, 0:00:00 
127 days, 0:00:00 
131 days, 0:00:00 
145 days, 0:00:00 
159 days, 0:00:00 
285 days, 0:00:00 
313 days, 0:00:00 
442 days, 0:00:00 
455 days, 0:00:00 
479 days, 0:00:00 
488 days, 0:00:00 
502 days, 0:00:00 
516 days, 0:00:00 
596 days, 0:00:00 
607 days, 0:00:00 
628 days, 0:00:00 
642 days, 0:00:00 
656 days, 0:00:00 
677 days, 0:00:00 
750 days, 0:00:00 
768 days, 0:00:00 
782 days, 0:00:00 
796 days, 0:00:00 
817 days, 0:00:00 
838 days, 0:00:00 
931 days, 0:00:00 
942 days, 0:00:00 
950 days, 0:00:00 
971 days, 0:00:00 
985 days, 0:00:00 
998 days, 0:00:00 
1006 days, 0:00:00 
1102 days, 0:00:00 
1114 days, 0:00:00 
1125 days, 0:00:00 
1139 days, 0:00:00 

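To complicate things further, I'd also like another column with the date differences computed only between rows that have False in the 'not inc' column.

Thanks.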


Have you tried 'dsince = (row['date'] - prevdate).days'? – Uri 2015-02-23 14:20:48
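A minimal sketch of that suggestion, assuming the values in the date column are pandas Timestamps (as iterrows returns them for a datetime64 column): subtracting a datetime from a Timestamp gives a pandas Timedelta, and its .days attribute is already a plain int. The sample date is the first row of the question's data.

```python
from datetime import datetime

import pandas as pd

start = datetime(2012, 1, 2)
stamp = pd.Timestamp("2012-02-29")  # a value like the one row['date'] yields

delta = stamp - start  # a pd.Timedelta of 58 days
days = delta.days      # whole days as a plain Python int
print(days)
```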


That helped a bit, thanks – user3374113 2015-02-23 14:44:58

Answers


Assuming your date column has already been cast to datetime64:

In [61]: hg = pd.DataFrame({"not inc":[False , False, False, True, False],"date":pd.to_datetime(pd.Series(["2012-02-29", "2012-03-16", "2012-04-04", "2012-05-08", "2012-05-12"]))}) 

In [63]: hg.dtypes 
Out[63]: 
date  datetime64[ns] 
not inc    bool 
dtype: object 

Temporarily filter out the rows you don't want to include:

In [64]: included = hg[hg["not inc"] == False] 

Use shift to get a series of the dates you want to subtract, filling the first position with your start date:

In [66]: prev_dates = included.date.shift().fillna(pd.Timestamp("2012-01-02"))

In [67]: prev_dates 
Out[67]: 
0 2012-01-02 
1 2012-02-29 
2 2012-03-16 
4 2012-04-04 
Name: date, dtype: datetime64[ns] 

Subtract the dates and convert the timedeltas to a number of days:

In [68]: delta = included.date - prev_dates 

In [69]: delta = delta.astype("timedelta64[D]") 

In [70]: delta 
Out[70]: 
0 58 
1 16 
2 19 
4 38 
Name: date, dtype: float64 

Then concat the new series onto the original DataFrame:

In [71]: delta.name = "delta" 

In [72]: hg = pd.concat((hg, delta), axis=1) 

In [73]: hg 
Out[73]: 
     date not inc delta 
0 2012-02-29 False  58 
1 2012-03-16 False  16 
2 2012-04-04 False  19 
3 2012-05-08 True NaN 
4 2012-05-12 False  38 

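Thanks for your answer, it works a treat, and I learned a lot from what you gave me. My only issue is that 'delta.astype("timedelta64[D]")' raises 'TypeError: cannot astype a timedelta from [timedelta64[ns]] to [timedelta64[D]]'. Do you think there's another way to do this? – user3374113 2015-02-23 15:12:05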


You could try some of the ideas here: http://stackoverflow.com/questions/18215317/extracting-days-from-a-numpy-timedelta64-value – 2015-02-23 15:25:03
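If astype("timedelta64[D]") fails on your pandas version, one alternative is the .dt.days accessor on a timedelta Series, which returns the whole-day component as integers. A sketch that rebuilds the filtered dates from the answer (the index values 0, 1, 2, 4 mirror the rows kept there; dates and delta_days are illustrative names):

```python
import pandas as pd

# The dates kept after filtering in the answer (row 3 was excluded)
dates = pd.to_datetime(pd.Series(["2012-02-29", "2012-03-16", "2012-04-04", "2012-05-12"],
                                 index=[0, 1, 2, 4]))
prev_dates = dates.shift().fillna(pd.Timestamp("2012-01-02"))

delta = dates - prev_dates   # timedelta64[ns] Series
delta_days = delta.dt.days   # whole days as integers, no astype needed
print(delta_days)
```

If you want fractional days instead, dividing the timedelta Series by pd.Timedelta(days=1) gives floats.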


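Put the line prevdate = datetime(2012, 1, 2) before the loop:

from datetime import datetime

prevdate = datetime(2012, 1, 2)  # start date, set once before the loop
for index, row in hg.iterrows():
    dsince = (row['date'] - prevdate).days  # whole days as an int
    prevdate = row['date']                  # carry this row's date forward
    print(dsince)

If that doesn't work, convert both prevdate and row['date'] to dates.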
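The loop can also be replaced with vectorized pandas operations, which covers both parts of the question. This is a sketch assuming hg['date'] is datetime64 and that, as in the loop, each row is compared with the previous row (the first one with 2012-01-02). It uses Series.diff rather than shift; start, kept, dsince and delta are hypothetical names.

```python
import pandas as pd

# Small sample built from the first rows of the question's data
hg = pd.DataFrame({
    "not inc": [False, False, False, True, False],
    "date": pd.to_datetime(["2012-02-29", "2012-03-16", "2012-04-04",
                            "2012-05-08", "2012-05-12"]),
})
start = pd.Timestamp("2012-01-02")

# Gap to the previous row, as the loop computes; the first row is NaT
# after diff(), so it is filled with the gap from the start date.
gaps = hg["date"].diff().fillna(hg["date"].iloc[0] - start)
hg["dsince"] = gaps.dt.days

# Same idea restricted to rows with False in 'not inc'; excluded rows
# get NaN when the result is aligned back on the index.
kept = hg.loc[~hg["not inc"], "date"]
kept_gaps = kept.diff().fillna(kept.iloc[0] - start)
hg["delta"] = kept_gaps.dt.days
print(hg)
```

The delta column ends up float64 because of the NaN. If you need whole numbers there, hg["delta"].astype("Int64") converts it to pandas' nullable integer type.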