2017-09-24 72 views

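Python DataFrame: trouble understanding and decoding an error

I am completely new to Python. I want to create a new column called Arrival Delay from the actual and estimated arrival dates and times, and I am trying to do this with a pandas DataFrame. The code I tried is below.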

for i in range(0, df_new.shape[0]):
    if df_new["ACT_ARRIVAL_DATE"][i] == df_new["ARRIVAL_ETA_DATE"][i]:
        if df_new["ACT_ARRIVAL_TIME"][i] > df_new["ARRIVAL_ETA_TIME"][i]:
            df_new['Arrival Delay'][i] = df_new["ACT_ARRIVAL_TIME"][i] - df_new["ARRIVAL_ETA_TIME"][i]
        else:
            df_new['Arrival Delay'][i] = 0
    elif df_new["ACT_ARRIVAL_DATE"][i] > df_new["ARRIVAL_ETA_DATE"][i]:
        if df_new["ACT_ARRIVAL_TIME"][i] > df_new["ARRIVAL_ETA_TIME"][i]:
            df_new['Arrival Delay'][i] = 24 + (df_new["ACT_ARRIVAL_TIME"][i] - df_new["ARRIVAL_ETA_TIME"][i])
    else:
        df_new['Arrival Delay'][i] = 24

But I get the following error.

ValueError        Traceback (most recent call last) 
<ipython-input-60-8dfb865ac5c2> in <module>() 
    1 for i in range(0,df_new.shape[0]): 
----> 2  if df_new["ACT_ARRIVAL_DATE"][i] == df_new["ARRIVAL_ETA_DATE"][i]: 
    3   if df_new[ACT_ARRIVAL_TIME[i]] > df_new[ARRIVAL_ETA_TIME[i]]: 
    4    df_new['Arrival Delay'] = df_new[ACT_ARRIVAL_TIME[i]] - df_new[ARRIVAL_ETA_TIME[i]] 
    5   else: 

C:\Users\3016205\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
    951         raise ValueError("The truth value of a {0} is ambiguous. "
    952                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 953                          .format(self.__class__.__name__))
    954
    955     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Please help me with this.

Note: the variables are of dtype datetime64[ns].


Even in R you would not iterate with `if` assignments; you would use the vectorized `ifelse()`. – Parfait

Answers


Lines like this

df_new["ACT_ARRIVAL_DATE"][i]

should be written like this, because chained indexing is not reliable for assignment

df_new.loc[i, "ACT_ARRIVAL_DATE"]
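A minimal sketch of the `.loc` form on a hypothetical two-row frame (the sample values are assumptions, not the asker's data). A single `.loc[row, column]` call returns one scalar for comparison and, on assignment, writes to `df_new` itself rather than to an intermediate object:

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df_new = pd.DataFrame({
    "ACT_ARRIVAL_DATE": pd.to_datetime(["2017-09-24", "2017-09-25"]),
    "ARRIVAL_ETA_DATE": pd.to_datetime(["2017-09-24", "2017-09-24"]),
})

# One .loc call with (row label, column label) returns a scalar Timestamp,
# so the comparison yields a single True/False that an `if` can use.
actual = df_new.loc[0, "ACT_ARRIVAL_DATE"]
eta = df_new.loc[0, "ARRIVAL_ETA_DATE"]
same_day = actual == eta

# Create the column with a default, then overwrite one cell through .loc.
df_new["Arrival Delay"] = pd.Timedelta(0)
df_new.loc[1, "Arrival Delay"] = pd.Timedelta(hours=24)
```

Here `same_day` is a single truth value, which is what an `if` statement expects.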

You should not need a loop here, but a pandas for loop would look like this

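import pandas as pd

for index, row in df_new.iterrows():
    if row["ACT_ARRIVAL_DATE"] == row["ARRIVAL_ETA_DATE"]:
        if row["ACT_ARRIVAL_TIME"] > row["ARRIVAL_ETA_TIME"]:
            df_new.loc[index, 'Arrival Delay'] = row["ACT_ARRIVAL_TIME"] - row["ARRIVAL_ETA_TIME"]
        else:
            # the question assigns 0 here; written as a zero timedelta
            df_new.loc[index, 'Arrival Delay'] = pd.Timedelta(0)
    # the remaining branches follow the same pattern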

To avoid the for loop, you can use boolean indexing

df_new.loc[(df_new.ACT_ARRIVAL_DATE == df_new.ARRIVAL_ETA_DATE) & (df_new.ACT_ARRIVAL_TIME > df_new.ARRIVAL_ETA_TIME), 'Arrival Delay'] = df_new.ACT_ARRIVAL_TIME - df_new.ARRIVAL_ETA_TIME

and then build this out for the remaining cases.
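Building the boolean indexing out for the remaining cases might look like the sketch below. The sample rows are hypothetical, the times carry a dummy date so that they are datetime64[ns] as in the question, and two readings are assumed: the question's 24 means 24 hours, and every row not matched by a condition gets 24 hours.

```python
import pandas as pd

# Hypothetical sample data; the dummy date 1900-01-01 only carries the time.
df_new = pd.DataFrame({
    "ACT_ARRIVAL_DATE": pd.to_datetime(
        ["2017-09-24", "2017-09-24", "2017-09-25", "2017-09-25"]),
    "ARRIVAL_ETA_DATE": pd.to_datetime(["2017-09-24"] * 4),
    "ACT_ARRIVAL_TIME": pd.to_datetime(
        ["1900-01-01 10:30", "1900-01-01 09:00",
         "1900-01-01 11:00", "1900-01-01 08:00"]),
    "ARRIVAL_ETA_TIME": pd.to_datetime(["1900-01-01 10:00"] * 4),
})

same_day = df_new.ACT_ARRIVAL_DATE == df_new.ARRIVAL_ETA_DATE
later_day = df_new.ACT_ARRIVAL_DATE > df_new.ARRIVAL_ETA_DATE
later_time = df_new.ACT_ARRIVAL_TIME > df_new.ARRIVAL_ETA_TIME
diff = df_new.ACT_ARRIVAL_TIME - df_new.ARRIVAL_ETA_TIME
one_day = pd.Timedelta(hours=24)  # assumption: the question's 24 is in hours

# Set the default first, then overwrite the rows each mask selects.
df_new["Arrival Delay"] = one_day
df_new.loc[same_day & later_time, "Arrival Delay"] = diff
df_new.loc[same_day & ~later_time, "Arrival Delay"] = pd.Timedelta(0)
df_new.loc[later_day & later_time, "Arrival Delay"] = diff + one_day
```

Assigning a Series through `.loc` aligns on the index, so each selected row receives its own difference.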


Consider nested np.where(), similar to R's ifelse()

import numpy as np

# The question's 0 and 24 are written as timedeltas (24 read as 24 hours),
# because plain integers cannot be added to datetime differences.
df_new["Arrival Delay"] = np.where((df_new["ACT_ARRIVAL_DATE"] == df_new["ARRIVAL_ETA_DATE"]) & (df_new["ACT_ARRIVAL_TIME"] > df_new["ARRIVAL_ETA_TIME"]),
            df_new["ACT_ARRIVAL_TIME"] - df_new["ARRIVAL_ETA_TIME"],

            np.where((df_new["ACT_ARRIVAL_DATE"] == df_new["ARRIVAL_ETA_DATE"]) & (df_new["ACT_ARRIVAL_TIME"] <= df_new["ARRIVAL_ETA_TIME"]), np.timedelta64(0, "h"),

              np.where((df_new["ACT_ARRIVAL_DATE"] > df_new["ARRIVAL_ETA_DATE"]) & (df_new["ACT_ARRIVAL_TIME"] > df_new["ARRIVAL_ETA_TIME"]),
                 (df_new["ACT_ARRIVAL_TIME"] - df_new["ARRIVAL_ETA_TIME"]) + np.timedelta64(24, "h"), np.timedelta64(24, "h"))))
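The same branching can also be written flat with `np.select`, which is a different NumPy function from the nested `np.where` used in this answer. The sketch below is only an illustration: the sample rows are hypothetical, and 24 is assumed to mean 24 hours.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the dummy date 1900-01-01 only carries the time.
df_new = pd.DataFrame({
    "ACT_ARRIVAL_DATE": pd.to_datetime(
        ["2017-09-24", "2017-09-24", "2017-09-25", "2017-09-25"]),
    "ARRIVAL_ETA_DATE": pd.to_datetime(["2017-09-24"] * 4),
    "ACT_ARRIVAL_TIME": pd.to_datetime(
        ["1900-01-01 10:30", "1900-01-01 09:00",
         "1900-01-01 11:00", "1900-01-01 08:00"]),
    "ARRIVAL_ETA_TIME": pd.to_datetime(["1900-01-01 10:00"] * 4),
})

same_day = df_new["ACT_ARRIVAL_DATE"] == df_new["ARRIVAL_ETA_DATE"]
later_day = df_new["ACT_ARRIVAL_DATE"] > df_new["ARRIVAL_ETA_DATE"]
later_time = df_new["ACT_ARRIVAL_TIME"] > df_new["ARRIVAL_ETA_TIME"]
on_time = df_new["ACT_ARRIVAL_TIME"] <= df_new["ARRIVAL_ETA_TIME"]
diff = df_new["ACT_ARRIVAL_TIME"] - df_new["ARRIVAL_ETA_TIME"]
one_day = np.timedelta64(24, "h")  # assumption: the question's 24 is in hours

# Conditions are checked in order; the first match supplies the value.
conditions = [same_day & later_time, same_day & on_time, later_day & later_time]
choices = [diff, np.timedelta64(0, "h"), diff + one_day]
df_new["Arrival Delay"] = np.select(conditions, choices, default=one_day)
```

Each condition sits next to its value, which can be easier to extend than another level of nesting.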