2017-08-08 90 views
1

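Pandas dataframe: convert integers to hh:mm

I have df1 below, which holds times in hhmm form. The values represent literal times but are not formatted correctly. For example, 845 should be 08:45 and 1125 should be 11:25.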

CU       Parameters  31-07-2017 01-08-2017 02-08-2017 03-08-2017 
CU0111-039820-L  Time of Full Charge 1125   0   1359   1112 
CU0111-041796-H  Time of Full Charge 1233   0   0    1135 
CU0111-046907-0  Time of Full Charge 845   0   1229   1028 
CU0111-046933-6  Time of Full Charge 1053   0   0    1120 
CU0111-050103-K  Time of Full Charge 932   0   1314   1108  
CU0111-052525-J  Time of Full Charge 1214   1424  1307   1254 
CU0111-052534-M  Time of Full Charge 944   0   0   1128 
CU0111-052727-7  Time of Full Charge 1136   0   1443   1114 

I need to convert all of these values to valid HH:MM timestamps and then work out the average of those timestamps, excluding the "0" values.

CU       Parameters  31-07-2017 01-08-2017 02-08-2017 03-08-2017 
CU0111-039820-L  Time of Full Charge 11:25   0   13:59  11:12 
CU0111-041796-H  Time of Full Charge 12:33   0   0   11:35 
CU0111-046907-0  Time of Full Charge 08:45   0   12:29  10:28 
CU0111-046933-6  Time of Full Charge 10:53   0   0   11:20 
CU0111-050103-K  Time of Full Charge 09:32   0   13:14   11:08  
CU0111-052525-J  Time of Full Charge 12:14   14:24  13:07   12:54 
CU0111-052534-M  Time of Full Charge 09:44   0   0   11:28 
CU0111-052727-7  Time of Full Charge 11:36   0   14:43   11:14 

End result:

Average time of charge: hh:mm (excluding 0 values) 

Number of no charges: =count(number of 0) 

I have tried something along these lines, to no avail:

text = df1[col_list].astype(str) 
df1[col_list] = text.str[:-2] + ':' + text.str[-2:] 
hhmm = df1[col_list] 
minutes = (hhmm/100).astype(int) * 60 + hhmm % 100 
df[col_list] = pd.to_timedelta(minutes, 'm') 

Do you need the count and the average as new columns, or as a new last row? – jezrael

Answer

2

I think you can first convert the values with to_timedelta:

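cols = df.columns.difference(['CU','Parameters']) 

# replace 0 with '0000' so the string slicing below yields '00:00:00';
# the outer parentheses are required for the multi-line method chain
df[cols] = (df[cols].replace(0, '0000') 
                    .astype(str) 
                    .apply(lambda x: pd.to_timedelta(x.str[:-2] + ':' + x.str[-2:] + ':00'))) 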
print (df) 
       CU   Parameters 31-07-2017 01-08-2017 02-08-2017 \ 
0 CU0111-039820-L Time of Full Charge 11:25:00 00:00:00 13:59:00 
1 CU0111-041796-H Time of Full Charge 12:33:00 00:00:00 00:00:00 
2 CU0111-046907-0 Time of Full Charge 08:45:00 00:00:00 12:29:00 
3 CU0111-046933-6 Time of Full Charge 10:53:00 00:00:00 00:00:00 
4 CU0111-050103-K Time of Full Charge 09:32:00 00:00:00 13:14:00 
5 CU0111-052525-J Time of Full Charge 12:14:00 14:24:00 13:07:00 
6 CU0111-052534-M Time of Full Charge 09:44:00 00:00:00 00:00:00 
7 CU0111-052727-7 Time of Full Charge 11:36:00 00:00:00 14:43:00 

    03-08-2017 
0 11:12:00 
1 11:35:00 
2 10:28:00 
3 11:20:00 
4 11:08:00 
5 12:54:00 
6 11:28:00 
7 11:14:00 
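
The same conversion can also be sketched with integer arithmetic instead of string slicing, which is closer to what the question attempted. This is only an illustration: the two-row frame reuses two rows from the question, and the helper name `hhmm_to_timedelta` and the variable `converted` are made up for this example.

```python
import pandas as pd

# Two rows copied from the question's df1, two date columns only
df = pd.DataFrame({
    'CU': ['CU0111-039820-L', 'CU0111-046907-0'],
    'Parameters': ['Time of Full Charge'] * 2,
    '31-07-2017': [1125, 845],
    '01-08-2017': [0, 0],
})
cols = df.columns.difference(['CU', 'Parameters'])

def hhmm_to_timedelta(col):
    # hypothetical helper: 845 -> 8 hours and 45 minutes -> 525 minutes
    minutes = (col // 100) * 60 + col % 100
    return pd.to_timedelta(minutes, unit='m')

# apply column by column; 0 simply becomes a zero-length timedelta
converted = df[cols].apply(hhmm_to_timedelta)
print(converted)
```

Working on the integers avoids the need to pad 0 to '0000' before slicing, at the cost of assuming every cell is a plain integer in hhmm form.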

And then create the new columns: the mean of the non-zero timedeltas, and the count of 0 values as the sum of True values:

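# compare against pd.Timedelta(0): recent pandas versions no longer treat the integer 0 as a zero timedelta
df['avg'] = df[cols][df[cols].ne(pd.Timedelta(0))].mean(axis=1) 
df['number no changes'] = df[cols].eq(pd.Timedelta(0)).sum(axis=1) 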
print (df) 
       CU   Parameters 31-07-2017 01-08-2017 02-08-2017 \ 
0 CU0111-039820-L Time of Full Charge 11:25:00 00:00:00 13:59:00 
1 CU0111-041796-H Time of Full Charge 12:33:00 00:00:00 00:00:00 
2 CU0111-046907-0 Time of Full Charge 08:45:00 00:00:00 12:29:00 
3 CU0111-046933-6 Time of Full Charge 10:53:00 00:00:00 00:00:00 
4 CU0111-050103-K Time of Full Charge 09:32:00 00:00:00 13:14:00 
5 CU0111-052525-J Time of Full Charge 12:14:00 14:24:00 13:07:00 
6 CU0111-052534-M Time of Full Charge 09:44:00 00:00:00 00:00:00 
7 CU0111-052727-7 Time of Full Charge 11:36:00 00:00:00 14:43:00 

    03-08-2017  avg number no changes 
0 11:12:00 12:12:00     1 
1 11:35:00 12:04:00     2 
2 10:28:00 10:34:00     1 
3 11:20:00 11:06:30     2 
4 11:08:00 11:18:00     1 
5 12:54:00 13:09:45     0 
6 11:28:00 10:36:00     2 
7 11:14:00 12:31:00     1 

print (df[cols][df[cols].ne(pd.Timedelta(0))]) 
    01-08-2017 02-08-2017 03-08-2017 31-07-2017 
0  NaT 13:59:00 11:12:00 11:25:00 
1  NaT  NaT 11:35:00 12:33:00 
2  NaT 12:29:00 10:28:00 08:45:00 
3  NaT  NaT 11:20:00 10:53:00 
4  NaT 13:14:00 11:08:00 09:32:00 
5 14:24:00 13:07:00 12:54:00 12:14:00 
6  NaT  NaT 11:28:00 09:44:00 
7  NaT 14:43:00 11:14:00 11:36:00 

print (df[cols].eq(pd.Timedelta(0))) 
    01-08-2017 02-08-2017 03-08-2017 31-07-2017 
0  True  False  False  False 
1  True  True  False  False 
2  True  False  False  False 
3  True  True  False  False 
4  True  False  False  False 
5  False  False  False  False 
6  True  True  False  False 
7  True  False  False  False 
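
As a cross-check of the averaging step, here is a minimal sketch that does the statistics on minutes rather than on timedeltas, and also produces the single overall average the question mentions. It is not the answer's method, just an alternative under the assumption that the cells are still raw hhmm integers; the column name `no_charge_count` and the variables `minutes`, `is_zero` and `overall_avg` are invented for the example.

```python
import pandas as pd

# Two rows copied from the question's df1
df = pd.DataFrame({
    'CU': ['CU0111-039820-L', 'CU0111-046933-6'],
    'Parameters': ['Time of Full Charge'] * 2,
    '31-07-2017': [1125, 1053],
    '01-08-2017': [0, 0],
    '02-08-2017': [1359, 0],
    '03-08-2017': [1112, 1120],
})
cols = df.columns.difference(['CU', 'Parameters'])

# hhmm integers -> minutes since midnight
minutes = (df[cols] // 100) * 60 + df[cols] % 100
is_zero = df[cols].eq(0)

# mask() turns the zeros into NaN, which mean() skips by default
avg_minutes = minutes.mask(is_zero).mean(axis=1)
df['avg'] = pd.to_timedelta(avg_minutes, unit='m')
df['no_charge_count'] = is_zero.sum(axis=1)

# one overall average across every non-zero cell
overall_avg = pd.to_timedelta(minutes.mask(is_zero).stack().mean(), unit='m')
print(df[['CU', 'avg', 'no_charge_count']])
print(overall_avg)
```

Keeping the zeros as NaN rather than dropping rows means a unit with several no-charge days still contributes its remaining readings to the average.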

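This works great, thanks. When I save the new df with the 2 extra columns, the values in the cells look like this: '0 days 10:59:00.000000000'. Is there a way to make this just '10:59:00'? – wazzahenry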

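I think at least one of your values is large enough to need the days part, so all the timedeltas are displayed with it. Check [this](https://stackoverflow.com/a/45484262/2901002): after adding '213000' seconds, '0 days' is added there too. The last paragraph of that answer has a solution that handles any number of days. I hope it helps. – jezrael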
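
One way to get plain 'HH:MM:SS' text before saving is to format each timedelta by hand. The sketch below is not taken from the linked answer; `format_hhmmss` is a hypothetical helper, and it assumes the timedeltas are non-negative.

```python
import pandas as pd

def format_hhmmss(td):
    # hypothetical helper: render a Timedelta as 'HH:MM:SS' without the 'N days' prefix
    if pd.isna(td):
        return ''
    total = int(td.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f'{hours:02d}:{minutes:02d}:{seconds:02d}'

avg = pd.Series(pd.to_timedelta(['0 days 10:59:00', '0 days 11:06:30', None]))
as_text = avg.map(format_hhmmss)
print(as_text.tolist())
```

Because the hours are computed from the total number of seconds, a value of a day or more is written with hours above 23 instead of a separate days part. The result is a column of strings, so it is best applied to a copy just before writing the file.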