2016-02-05 58 views

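How to get the first and last occurrence of an item in pandas

I am analyzing data from different sensors. A sensor becomes active (1) when it is in use. However, I only need the time (and date) of the first and last activation, not the ones in between. Once those are found, I need to create a new DataFrame containing the time and date of the first and last occurrence, together with 'User' and 'Activity'.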

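I tried iterating over each row and building a series of if-then statements, but had no luck. Is there a pandas function that would let me do this efficiently? Here is a subset of my data.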

I am only just getting the hang of pandas, so any help would be greatly appreciated.

Cheers!

import pandas as pd    
cols=['User', 'Activity', 'Coaster1', 'Coaster2', 'Coaster3', 
      'Coaster4', 'Coaster5', 'Coffee', 'Door', 'Fridge', u'coldWater', 
      'hotWater', 'SensorDate', 'SensorTime', 'RegisteredTime'] 

data=[['Chris', 'coffee + hot water', 0, 0.0, 0.0, 0, 0, 0.0, 1.0, 0.0, 
      0.0, 0.0, '2015-09-21', '13:05:54', '13:09:00'], 
      ['Chris', 'coffee + hot water', 0, 0.0, 0.0, 0, 0, 0.0, 1.0, 0.0, 
      0.0, 0.0, '2015-09-21', '13:05:54', '13:09:00'], 
      ['Chris', 'coffee + hot water', 0, 0.0, 0.0, 0, 0, 0.0, 1.0, 0.0, 
      0.0, 0.0, '2015-09-21', '13:05:55', '13:09:00'], 
      ['Chris', 'coffee + hot water', 0, 0.0, 0.0, 0, 0, 0.0, 1.0, 0.0, 
      0.0, 0.0, '2015-09-21', '13:05:55', '13:09:00'], 
      ['Chris', 'coffee + hot water', 0, 0.0, 0.0, 0, 0, 0.0, 1.0, 0.0, 
      0.0, 0.0, '2015-09-21', '13:05:56', '13:09:00'], 
      ['Chris', 'coffee + hot water', 0, 0.0, 0.0, 0, 0, 0.0, 1.0, 0.0, 
      0.0, 0.0, '2015-09-21', '13:05:56', '13:09:00'], 
      ['Chris', 'coffee + hot water', 0, 1.0, 0.0, 0, 0, 0.0, 0.0, 0.0, 
      0.0, 0.0, '2015-09-21', '13:05:58', '13:09:00'], 
      ['Chris', 'coffee + hot water', 0, 1.0, 0.0, 0, 0, 0.0, 0.0, 0.0, 
      0.0, 0.0, '2015-09-21', '13:05:59', '13:09:00']] 

df = pd.DataFrame(data, columns=cols)

The desired output would look like this:

data_out=[['Chris','coffee + hot water','0','0','0','0','0','0','1','0','0','0','2015-09-21','13:05:54','13:05:56','13:09:00'],['Chris','coffee + hot water','0','1','0','0','0','0','0','0','0','0','2015-09-21','13:05:58','13:05:59','13:09:00']] 

cols_out=['User', 
'Activity', 
'Coaster1', 
'Coaster2', 
'Coaster3', 
'Coaster4', 
'Coaster5', 
'Coffee', 
'Door', 
'Fridge', 
u'coldWater', 
'hotWater', 
'SensorDate', 
'SensorTimeFirst', 
'SensorTimeLast', 
'RegisteredTime'] 


df_out=pd.DataFrame(data_out, columns=cols_out) 

What is the desired output for the sample? – jezrael

Maybe you can try print(df[df['Door'] == 1].groupby(['User','Activity'])[['Door','SensorDate','SensorTime']].min()) and print(df[df['Door'] == 1].groupby(['User','Activity'])[['Door','SensorDate','SensorTime']].max()) – jezrael

I have edited the original post to add the desired output. Thanks! – Waldo

Answer

You can try groupby and then apply a custom function f, for example:

def f(x):
    # Keep only the rows where a sensor is active. min()/max() work column by
    # column, so 'SensorTime' gives the first and last activation time.
    Doormin = x[x['Door'] == 1].min()
    Doormax = x[x['Door'] == 1].max()
    Coaster2min = x[x['Coaster2'] == 1].min()
    Coaster2max = x[x['Coaster2'] == 1].max()
    Coaster1min = x[x['Coaster1'] == 1].min()
    Coaster1max = x[x['Coaster1'] == 1].max()
    # Build one summary row per sensor.
    Door = pd.Series([Doormin['Door'], Doormin['SensorDate'], Doormin['SensorTime'], Doormax['SensorTime'], Doormin['RegisteredTime']], index=['Door','SensorDate','SensorTimeFirst','SensorTimeLast','RegisteredTime'])
    Coaster1 = pd.Series([Coaster1min['Coaster1'], Coaster1min['SensorDate'], Coaster1min['SensorTime'], Coaster1max['SensorTime'], Coaster1min['RegisteredTime']], index=['Coaster1','SensorDate','SensorTimeFirst','SensorTimeLast','RegisteredTime'])
    Coaster2 = pd.Series([Coaster2min['Coaster2'], Coaster2min['SensorDate'], Coaster2min['SensorTime'], Coaster2max['SensorTime'], Coaster2min['RegisteredTime']], index=['Coaster2','SensorDate','SensorTimeFirst','SensorTimeLast','RegisteredTime'])

    # Row order in the result: 0 = Door, 1 = Coaster2, 2 = Coaster1.
    return pd.DataFrame([Door, Coaster2, Coaster1])

print(df.groupby(['User','Activity']).apply(f))

                            Coaster1  Coaster2  Door RegisteredTime  \
User  Activity
Chris coffee + hot water 0       NaN       NaN     1       13:09:00
                         1       NaN         1   NaN       13:09:00
                         2       NaN       NaN   NaN            NaN

                            SensorDate SensorTimeFirst SensorTimeLast
User  Activity
Chris coffee + hot water 0  2015-09-21        13:05:54       13:05:56
                         1  2015-09-21        13:05:58       13:05:59
                         2         NaN             NaN            NaN

Maybe you can replace the NaN values with 0 using fillna:

df = df.groupby(['User','Activity']).apply(f) 
df[['Coaster1','Coaster2','Door']] = df[['Coaster1','Coaster2','Door']].fillna(0) 
print(df)
                            Coaster1  Coaster2  Door RegisteredTime  \
User  Activity
Chris coffee + hot water 0         0         0     1       13:09:00
                         1         0         1     0       13:09:00
                         2         0         0     0            NaN

                            SensorDate SensorTimeFirst SensorTimeLast
User  Activity
Chris coffee + hot water 0  2015-09-21        13:05:54       13:05:56
                         1  2015-09-21        13:05:58       13:05:59
                         2         NaN             NaN            NaN
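
As a possible alternative to listing every sensor by hand inside f, here is a minimal sketch that is not part of the original answer. It collapses each run of consecutive rows with an identical sensor state into a single row holding the first and last SensorTime. The helper name first_last_activations and the reduced column set are assumptions made for illustration, and the sketch assumes the rows are already in chronological order and that the sensor columns contain no NaN.

```python
import pandas as pd


def first_last_activations(df, key_cols, sensor_cols, time_col="SensorTime"):
    """Collapse runs of consecutive rows with the same sensor state.

    Hypothetical helper: assumes rows are sorted by time and that the
    sensor columns hold no missing values.
    """
    # A new run starts whenever any key or sensor value differs from the
    # previous row (the first row always starts a run).
    state = df[key_cols + sensor_cols]
    run_id = state.ne(state.shift()).any(axis=1).cumsum().rename("run")

    # Only keep rows in which at least one sensor is active.
    active = df[sensor_cols].eq(1).any(axis=1)
    groups = df[active].groupby(run_id[active])

    # First row of each run, plus the last time stamp of the same run.
    out = groups.first().rename(columns={time_col: time_col + "First"})
    position = out.columns.get_loc(time_col + "First") + 1
    out.insert(position, time_col + "Last", groups[time_col].last())
    return out.reset_index(drop=True)


# Reduced version of the sample from the question (fewer sensor columns).
times = ["13:05:54", "13:05:54", "13:05:55", "13:05:55",
         "13:05:56", "13:05:56", "13:05:58", "13:05:59"]
rows = [["Chris", "coffee + hot water",
         0.0 if i < 6 else 1.0,   # Coaster2 is active in the last two rows
         1.0 if i < 6 else 0.0,   # Door is active in the first six rows
         "2015-09-21", t, "13:09:00"]
        for i, t in enumerate(times)]
sample = pd.DataFrame(rows, columns=["User", "Activity", "Coaster2", "Door",
                                     "SensorDate", "SensorTime", "RegisteredTime"])

summary = first_last_activations(sample, ["User", "Activity"], ["Coaster2", "Door"])
print(summary)
```

Note that this behaves differently from f when a sensor switches off and on again: each separate run becomes its own row, whereas f reports a single first/last pair per sensor within each group.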

Thanks! This works like a charm :) I can take it from here. Thank you very much!!! I really appreciate the time and effort you put in :) – Waldo

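Two questions: 1) How do you paste a DataFrame into your answer so that it keeps its formatting? I could not find a way to do this, and it would make my question tidier. 2) In the first example, why is row 2 empty (NaN), and why does it stay that way after .fillna(0)? I do not fully understand this (I know how to deal with it, I am just curious). – Waldo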

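What do you mean by keeping the formatting? Formatting the code? Or the final DataFrame in the question? Maybe try 4 spaces before each line. – jezrael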
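
On the second question in the comments (why row 2 is NaN and stays that way after fillna), a likely explanation can be read off the code: row 2 is the Coaster1 row, Coaster1 is never 1 in the sample, and fillna(0) is applied only to the three sensor columns. The sketch below illustrates this on a hypothetical miniature of the result; the frame and its values are made up for the demonstration and are not the answer's actual output object.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the grouped result: row 2 stands for a sensor
# (Coaster1) that was never active, so every field in it is missing.
result = pd.DataFrame({
    "Coaster1": [np.nan, np.nan, np.nan],
    "Coaster2": [np.nan, 1.0, np.nan],
    "Door": [1.0, np.nan, np.nan],
    "SensorDate": ["2015-09-21", "2015-09-21", np.nan],
    "SensorTimeFirst": ["13:05:54", "13:05:58", np.nan],
})

sensor_cols = ["Coaster1", "Coaster2", "Door"]

# fillna(0) touches only the selected sensor columns ...
result[sensor_cols] = result[sensor_cols].fillna(0)

# ... so the date and time columns of row 2 are still missing.
print(result.loc[2].isna())

# One way to drop such rows: require a first activation time.
cleaned = result.dropna(subset=["SensorTimeFirst"])
print(cleaned)
```

Dropping rows without a first activation time is only one option; skipping sensors that never fire before building their summary row would avoid the empty row in the first place.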
