2017-09-22
1

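Adding a new column of timedelta values in pandas

I have a DataFrame and want to add another column containing the time difference between two of its columns: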

df['Diff'] = df['End Time'] - df['Open Time']
df['Diff']
0  0 days 01:25:40 
1  0 days 00:41:57 
2  0 days 00:21:47 
3  0 days 16:41:57 
4  0 days 04:32:00 
5  0 days 03:01:57 
6  0 days 01:37:56 
7  0 days 01:13:57 
8  0 days 01:07:56 
9  0 days 02:33:59 
10 29 days 18:33:53 
11 0 days 03:50:56 
12 0 days 01:57:56 

I want this column in the format '1h 25m', so I tried to compute the hours from the days:

diff = df['End Time'] - df['Open Time'] 
hours = diff.dt.days * 24 + diff.dt.components.hours 
minutes = diff.dt.components.minutes 

and got these results:

0  1 
1  0 
2  0 
3  16 
4  4 
5  3 
6  1 
7  1 
8  1 
9  2 
10 714 
11  3 
12  1 
dtype: int64h 0  25 
1  41 
2  21 
3  41 
4  32 
5  1 
6  37 
7  13 
8  7 
9  33 
10 33 
11 50 
12 57 
Name: minutes, dtype: int64m 

But I can't get these results into a new column in this format:

'{}h {}m'.format(hours, minutes)
+1

Try '['{0}h {1}m'.format(*x) for x in zip(hours, minutes)]'? – Zero

+0

@Zero I wanted to post an answer using the DataFrame. I'm struggling – Dark

+1

Or 'hours.astype(str) + 'h' + minutes.astype(str) + 'm''? – Zero

Answers

1

You can extract the relevant components, convert them to strings with astype(str), and concatenate the columns as needed.

c = (df['End Time'] - df['Open Time']).dt.components[['days', 'hours', 'minutes']]
df['diff'] = (c.days * 24 + c.hours).astype(str) + 'h ' + c.minutes.astype(str) + 'm' 
df['diff'] 
0  1h 25m 
1  0h 41m 
2  0h 21m 
3  16h 41m 
4  4h 32m 
5  3h 1m 
6  1h 37m 
7  1h 13m 
8  1h 7m 
9  2h 33m 
10 714h 33m 
11  3h 50m 
12  1h 57m 
Name: diff, dtype: object 
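
The snippet above assumes that df already exists. As a self-contained sketch, the same steps can be run against a small hypothetical frame; the timestamps below are made up so that the differences match rows 0, 5 and 10 of the question's output, and are not the asker's real data:

```python
import pandas as pd

# Hypothetical sample data (not from the original post): the start/end
# timestamps are chosen so the differences equal 0 days 01:25:40,
# 0 days 03:01:57 and 29 days 18:33:53.
df = pd.DataFrame({
    'Open Time': pd.to_datetime(['2017-09-01 08:00:00'] * 3),
    'End Time': pd.to_datetime(['2017-09-01 09:25:40',
                                '2017-09-01 11:01:57',
                                '2017-10-01 02:33:53']),
})

# .dt.components splits each timedelta into days, hours, minutes, ...
c = (df['End Time'] - df['Open Time']).dt.components

# Fold the days into the hours, then build the 'Xh Ym' string.
df['diff'] = (c.days * 24 + c.hours).astype(str) + 'h ' + c.minutes.astype(str) + 'm'
print(df['diff'].tolist())  # ['1h 25m', '3h 1m', '714h 33m']
```

The last row shows why the days have to be multiplied by 24: the hours component alone would report 18 rather than 714.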
+0

@Bharathshetty No worries, Bharath :-) –

+0

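@COLDSPEED Thanks for this approach, I'll try to implement it. Maybe I wasn't clear, but my aim is not to lose the days, so in this case I want the whole difference expressed in hours. For example, for index 10 the result should be '714h 33m', not '18h 33m'. – bar1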

+0

@bar1 I've fixed it by multiplying the days column by 24. –

1

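You can use total_seconds to convert the timedelta to seconds and then compute the hours, minutes and seconds from that, which is about 10 times faster than using dt.components: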

s = diff.dt.total_seconds().astype(int) 

hours = s // 3600 
# remaining seconds 
s = s - (hours * 3600) 
# minutes 
minutes = s // 60 
# remaining seconds 
seconds = s - (minutes * 60) 

a = hours.astype(str) + 'h ' + minutes.astype(str) 
print (a) 
0  1h 25 
1  0h 41 
2  0h 21 
3  16h 41 
4  4h 32 
5  3h 1 
6  1h 37 
7  1h 13 
8  1h 7 
9  2h 33 
10 714h 33 
11  3h 50 
12  1h 57 
Name: Diff, dtype: object 
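
The same arithmetic can be written more compactly with the modulo operator, and with the 'm' suffix the question asks for. This is only a sketch: the sample values are copied from rows 0, 5 and 10 of the question's output, and it assumes the differences are non-negative and contain no missing values (astype(int) raises on NaN, and floor division rounds negative values downwards):

```python
import pandas as pd

# Sample timedeltas copied from the question's output (rows 0, 5 and 10).
diff = pd.Series(pd.to_timedelta(['0 days 01:25:40',
                                  '0 days 03:01:57',
                                  '29 days 18:33:53']))

# Assumes no NaT values: astype(int) cannot convert NaN.
s = diff.dt.total_seconds().astype(int)

hours = s // 3600            # whole hours, days already included
minutes = (s % 3600) // 60   # whole minutes left after removing the hours

a = hours.astype(str) + 'h ' + minutes.astype(str) + 'm'
print(a.tolist())  # ['1h 25m', '3h 1m', '714h 33m']
```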

Solution from Zero's comment:

hours = diff.dt.days * 24 + diff.dt.components.hours 
minutes = diff.dt.components.minutes 

a = hours.astype(str) + 'h ' + minutes.astype(str) + 'm'
print (a) 
0  1h 25m 
1  0h 41m 
2  0h 21m 
3  16h 41m 
4  4h 32m 
5  3h 1m 
6  1h 37m 
7  1h 13m 
8  1h 7m 
9  2h 33m 
10 714h 33m
11  3h 50m 
12  1h 57m 
dtype: object 

Another:

a = pd.Series(['{0[0]}h {0[1]}m'.format(x) for x in zip(hours, minutes)]) 
print (a) 
0  1h 25m 
1  0h 41m 
2  0h 21m 
3  16h 41m 
4  4h 32m 
5  3h 1m 
6  1h 37m 
7  1h 13m 
8  1h 7m 
9  2h 33m 
10 714h 33m 
11  3h 50m 
12  1h 57m 
dtype: object 
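
The per-element formatting in the list comprehension can also be wrapped in a small helper that works on a single timedelta. The name format_hm is hypothetical and not part of pandas; the sketch uses only the standard library and assumes non-negative durations:

```python
from datetime import timedelta


def format_hm(td):
    """Format a timedelta as '<hours>h <minutes>m', folding days into hours."""
    total_minutes = int(td.total_seconds()) // 60  # leftover seconds are dropped
    hours, minutes = divmod(total_minutes, 60)
    return '{}h {}m'.format(hours, minutes)


print(format_hm(timedelta(hours=1, minutes=25, seconds=40)))            # 1h 25m
print(format_hm(timedelta(days=29, hours=18, minutes=33, seconds=53)))  # 714h 33m
```

Since pandas Timedelta objects subclass datetime.timedelta, the helper should also work with diff.apply(format_hm), although a per-row call like that is likely to be slower than the vectorised versions.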

Timings

#13000 rows 
df = pd.concat([df]*1000).reset_index(drop=True) 

In [191]: %%timeit 
    ...: hours = diff.dt.days * 24 + diff.dt.components.hours 
    ...: minutes = diff.dt.components.minutes 
    ...: 
    ...: a = hours.astype(str) + 'h ' + minutes.astype(str) 
    ...: 
1 loop, best of 3: 483 ms per loop 

In [192]: %%timeit 
    ...: s = diff.dt.total_seconds().astype(int) 
    ...: 
    ...: hours = s // 3600 
    ...: # remaining seconds 
    ...: s = s - (hours * 3600) 
    ...: # minutes 
    ...: minutes = s // 60 
    ...: # remaining seconds 
    ...: seconds = s - (minutes * 60) 
    ...: 
    ...: a = hours.astype(str) + 'h ' + minutes.astype(str) 
    ...: 
10 loops, best of 3: 43.9 ms per loop 

In [193]: %%timeit 
    ...: hours = diff.dt.days * 24 + diff.dt.components.hours 
    ...: minutes = diff.dt.components.minutes 
    ...: s = pd.Series(['{0[0]}h {0[1]}m'.format(x) for x in zip(hours, minutes)]) 
    ...: 
1 loop, best of 3: 465 ms per loop 

#cᴏʟᴅsᴘᴇᴇᴅ solution 
In [194]: %%timeit 
    ...: c = diff.dt.components[['days', 'hours', 'minutes']] 
    ...: a = (c.days * 24 + c.hours).astype(str) + 'h ' + c.minutes.astype(str) + 'm' 
    ...: 
1 loop, best of 3: 208 ms per loop 
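
The timings above were taken with IPython's %%timeit. Outside IPython, a rough comparison can be sketched with the standard timeit module. The function names via_components and via_total_seconds are hypothetical, the data is a small made-up sample, and the absolute numbers will differ by machine and pandas version:

```python
import timeit

import pandas as pd

# Made-up sample: two of the question's differences repeated to 1000 rows.
diff = pd.Series(pd.to_timedelta(['0 days 01:25:40', '29 days 18:33:53'] * 500))


def via_components(d):
    # Approach based on dt.components.
    c = d.dt.components
    return (c.days * 24 + c.hours).astype(str) + 'h ' + c.minutes.astype(str) + 'm'


def via_total_seconds(d):
    # Approach based on total_seconds and integer arithmetic.
    s = d.dt.total_seconds().astype(int)
    return (s // 3600).astype(str) + 'h ' + (s % 3600 // 60).astype(str) + 'm'


# Both approaches should produce identical strings before being compared for speed.
assert via_components(diff).equals(via_total_seconds(diff))

for fn in (via_components, via_total_seconds):
    best = min(timeit.repeat(lambda: fn(diff), number=5, repeat=3))
    print(fn.__name__, 'seconds per call:', best / 5)
```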
+0

Thanks for your effort. It works too – bar1

+0

Yes, I added timings for a version that avoids 'dt.components', because it is slow. – jezrael