2017-06-01 80 views
2

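Python: the DataFrame index is a combination of three columns; how do I separate them?

I have a DataFrame like the one shown below.

The index is a combination of three variables: "Date", "Time", and "Status".

I have two questions:

1) Can I convert this index into three regular columns?

2) Can I combine "Date" and "Time" into a new time series index?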

Dataframe example

In [19]: mydf
Out[19]: 
            Entry  Exit 
Date  Time  Status        
01-01-13 03:22:00 REGULAR  57308825.0 43088232.0 
     07:22:00 REGULAR  57309553.0 43088773.0 
     11:22:00 REGULAR  57310969.0 43089369.0 
     15:22:00 REGULAR  57313168.0 43090433.0 
     19:22:00 REGULAR  57315755.0 43092329.0 
     23:22:00 REGULAR  57317073.0 43093590.0 
01-02-13 03:22:00 REGULAR  57317475.0 43094076.0 
     07:22:00 REGULAR  57319975.0 43094558.0 
     07:37:45 DOOR OPEN 1524176.0 672377.0 
     07:37:46 DOOR CLOSE 1524176.0 672377.0 
     11:22:00 REGULAR  57326441.0 43097377.0 
     15:22:00 REGULAR  57331818.0 43100057.0 
     19:22:00 REGULAR  57337786.0 43103918.0 
     23:22:00 REGULAR  57339290.0 43105700.0 
01-03-13 03:22:00 REGULAR  57339676.0 43106258.0 
     07:22:00 REGULAR  57342339.0 43106791.0 
     09:37:39 REGULAR  23317856.0 15286811.0 
     11:22:00 REGULAR  57349205.0 43109943.0 
     15:22:00 REGULAR  57354752.0 43112434.0 
     15:22:01 DOOR OPEN  4558.0   0.0 
     15:27:12 LOGON   4558.0   0.0 
     15:28:04 LGF-MAN   4558.0   0.0 
     15:28:07 DOOR CLOSE  4558.0   0.0 
     15:28:08 DOOR OPEN  4558.0   0.0 
     15:28:21 DOOR CLOSE  4558.0   0.0 
     18:25:09 DOOR OPEN 7171535.0 2420826.0 
     18:27:14 DOOR CLOSE 7171535.0 2420826.0 
     18:27:42 DOOR OPEN 7171535.0 2420826.0 
     18:27:44 DOOR CLOSE 7171535.0 2420826.0 
     18:27:45 DOOR OPEN 7171535.0 2420826.0 

Answers

2

Source DF:

In [46]: x 
Out[46]: 
            a   b 
(01-01-13, 03:22:00, aaa) 0.773264 0.074545 
(01-02-13, 03:22:00, bbb) 0.070720 0.806460 
(01-03-13, 13:33:00, ccc) 0.645056 0.374402 

Can I convert this index into three general columns?

In [50]: x[['i1','i2','i3']] = pd.DataFrame(x.index.values.tolist()).values 

In [51]: x 
Out[51]: 
            a   b  i1  i2 i3 
(01-01-13, 03:22:00, aaa) 0.773264 0.074545 01-01-13 03:22:00 aaa 
(01-02-13, 03:22:00, bbb) 0.070720 0.806460 01-02-13 03:22:00 bbb 
(01-03-13, 13:33:00, ccc) 0.645056 0.374402 01-03-13 13:33:00 ccc 

Can I combine "Date" and "Time" together as a new time series index?

In [53]: x = x.set_index(pd.to_datetime(x.i1 + ' ' + x.i2, errors='coerce')) 

In [54]: x 
Out[54]: 
          a   b  i1  i2 i3 
2013-01-01 03:22:00 0.773264 0.074545 01-01-13 03:22:00 aaa 
2013-01-02 03:22:00 0.070720 0.806460 01-02-13 03:22:00 bbb 
2013-01-03 13:33:00 0.645056 0.374402 01-03-13 13:33:00 ccc 

You can also do it directly from the index values:

In [56]: x 
Out[56]: 
            a   b 
(01-01-13, 03:22:00, aaa) 0.735934 0.488669 
(01-02-13, 03:22:00, bbb) 0.214214 0.234158 
(01-03-13, 13:33:00, ccc) 0.071282 0.747958 

In [57]: x = x.set_index(pd.to_datetime(x.index.str[0] + ' ' + x.index.str[1], errors='coerce')) 

In [58]: x 
Out[58]: 
          a   b 
2013-01-01 03:22:00 0.735934 0.488669 
2013-01-02 03:22:00 0.214214 0.234158 
2013-01-03 13:33:00 0.071282 0.747958 
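
If the index is a genuine `MultiIndex` with levels named Date, Time, and Status, as the printout in the question suggests, rather than a flat index of tuples, a shorter route may be available. The following is only a sketch under that assumption: the two sample rows are copied from the question, and the month-day-year format string is a guess at how the dates are written.

```python
import pandas as pd

# Two rows copied from the question; assumes a real three-level MultiIndex.
idx = pd.MultiIndex.from_tuples(
    [("01-01-13", "03:22:00", "REGULAR"),
     ("01-02-13", "07:37:45", "DOOR OPEN")],
    names=["Date", "Time", "Status"],
)
mydf = pd.DataFrame(
    {"Entry": [57308825.0, 1524176.0], "Exit": [43088232.0, 672377.0]},
    index=idx,
)

# 1) Turn the three index levels into ordinary columns.
flat = mydf.reset_index()

# 2) Combine Date and Time into a DatetimeIndex.
#    The format is an assumption (month-day-year); adjust it if the data
#    is actually day-month-year.
stamp = pd.to_datetime(flat["Date"] + " " + flat["Time"],
                       format="%m-%d-%y %H:%M:%S")
ts = flat.set_index(stamp).drop(columns=["Date", "Time"])
```

Here `flat` keeps Date, Time, and Status as plain columns, while `ts` is indexed by the combined timestamp and keeps Status as a column.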

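The first line of code, `x[['i1','i2','i3']] = pd.DataFrame(x.index.values.tolist()).values`, gives me the error "['i1' 'i2' 'i3'] not in index". Do you know how to fix it? The label of my original index column is "Index". Thank you very much! – QianLi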
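
One possible workaround for the error in this comment, assuming it comes from a pandas version that refuses to assign to several not-yet-existing columns in a single statement: create the columns one at a time. This is a sketch on a stand-in frame with a flat index of tuples; the column names `i1`, `i2`, `i3` follow the answer, and the numeric values are placeholders.

```python
import pandas as pd

# Stand-in frame with a flat (non-Multi) index whose entries are tuples.
# tupleize_cols=False keeps pandas from building a MultiIndex here.
tuple_index = pd.Index(
    [("01-01-13", "03:22:00", "aaa"), ("01-02-13", "03:22:00", "bbb")],
    tupleize_cols=False,
)
x = pd.DataFrame({"a": [0.77, 0.07], "b": [0.07, 0.81]}, index=tuple_index)

# Assign each new column separately; single-column assignment creates
# the column if it does not exist yet.
for pos, name in enumerate(["i1", "i2", "i3"]):
    x[name] = [entry[pos] for entry in x.index]
```

The loop does not depend on the name of the index, so an index labelled "Index" should not matter here.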

1

We can start like this:

>>> my_df = pd.DataFrame({'Entry': [5.73], 'Exit': [4.31]}, 
         index=[('01-01-13', '03:22:00', 'REGULAR')]) 
>>> my_df 
           Entry Exit 
(01-01-13, 03:22:00, REGULAR) 5.73 4.31 

1) Convert into 3 columns:

>>> combined = pd.concat([my_df.reset_index()['index'].apply(pd.Series), 
          my_df.reset_index(drop=True)], 
          axis=1) 
      0   1  2 Entry Exit 
0 01-01-13 03:22:00 REGULAR 5.73 4.31 

Now you will have 3 new columns named 0, 1, and 2.

2) Combine the date and time to get a DatetimeIndex:

>>> combined.set_index(pd.to_datetime(combined[0] + ' ' + combined[1])) 
          0   1  2 Entry Exit 
2013-01-01 03:22:00 01-01-13 03:22:00 REGULAR 5.73 4.31 

Now you can drop columns 0 and 1, and rename column 2 as needed.
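
That last cleanup step could look roughly like this. The `combined` frame is rebuilt by hand so the snippet runs on its own, the name "Status" is borrowed from the question's index level, and the explicit date format is an assumption about the data.

```python
import pandas as pd

# Hand-built stand-in for the `combined` frame from step 1.
combined = pd.DataFrame(
    [["01-01-13", "03:22:00", "REGULAR", 5.73, 4.31]],
    columns=[0, 1, 2, "Entry", "Exit"],
)

# Step 2 again, with an explicit (assumed) month-day-year format.
stamp = pd.to_datetime(combined[0] + " " + combined[1],
                       format="%m-%d-%y %H:%M:%S")
indexed = combined.set_index(stamp)

# Drop the raw date/time columns and give column 2 a readable name.
tidy = indexed.drop(columns=[0, 1]).rename(columns={2: "Status"})
```

Both `drop` and `rename` return new frames, so `indexed` is left unchanged and `tidy` holds only Status, Entry, and Exit under the DatetimeIndex.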


It works. Thank you very much!!! – QianLi
