2016-11-04 65 views
1

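Pandas: filtering on datetime columns, excluding NaN

I have a pandas DataFrame with two datetime columns, t1 and t2. I need to filter the DataFrame down to the rows where t1 <= t2. t2 can be NaN.

Before pandas 0.19.0 I was able to do this. As of pandas 0.19.0, this code: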

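import numpy as np
import pandas as pd
from datetime import datetime

dt = datetime.utcnow()
dt64 = np.datetime64(dt)
# t2 holds only None, so pandas stores it as an object column
df = pd.DataFrame([(dt64, None)], columns=['t1', 't2'])
df[(df.t1 <= df.t2)]  # this comparison raises the TypeError shown in the traceback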

fails with:

Traceback (most recent call last): 
    File "workspace/python/MyTests/test1.py", line 87, in <module> 
    testDfTimeCompare() 
    File "workspace/python/MyTests/test1.py", line 80, in testDfTimeCompare 
    df[(df.t1<=df.t2)] 
    File "anaconda/lib/python2.7/site-packages/pandas/core/ops.py", line 813, in wrapper 
    return self._constructor(na_op(self.values, other.values), 
    File "anaconda/lib/python2.7/site-packages/pandas/core/ops.py", line 787, in na_op 
    y = y.view('i8') 
    File "anaconda/lib/python2.7/site-packages/numpy/core/_internal.py", line 367, in _view_is_safe 
    raise TypeError("Cannot change data-type for object array.") 
TypeError: Cannot change data-type for object array. 

What is the best way to achieve this?

Answers

2

I think you need to convert the None values in column t2 to NaT with to_datetime. Then you can use the faster function Series.le, which does the same as <=:

df.t2 = pd.to_datetime(df.t2) 
print (df) 
          t1 t2 
0 2016-11-04 07:24:53.372838 NaT 

mask = df.t1.le(df.t2) 
print (mask) 
0 False 
dtype: bool 

mask = df.t1 <= df.t2 
print (mask) 
0 False 
dtype: bool 
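
To show the same approach on more than one row, here is a minimal sketch. The sample timestamps and the names `mask` and `filtered` are made up for illustration and are not from the question. It relies on comparisons against NaT evaluating to False, so rows with a missing t2 drop out of the result.

```python
import pandas as pd

# Hypothetical sample data: the second row has no t2 value.
df = pd.DataFrame(
    {
        "t1": pd.to_datetime(
            ["2016-11-04 07:00", "2016-11-04 09:00", "2016-11-04 10:00"]
        ),
        "t2": [
            pd.Timestamp("2016-11-04 08:00"),
            None,
            pd.Timestamp("2016-11-04 06:00"),
        ],
    }
)

# Make sure t2 is a real datetime column; None becomes NaT.
# If the column is already datetime64, this leaves the values unchanged.
df["t2"] = pd.to_datetime(df["t2"])

# Element-wise t1 <= t2; any comparison involving NaT is False.
mask = df["t1"].le(df["t2"])

# Keep only the rows where the comparison holds.
filtered = df[mask]
print(filtered)
```

If rows with a missing t2 should be kept rather than dropped, one option is to combine the mask with `df["t2"].isna()`, for example `df[mask | df["t2"].isna()]`.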
0

Build a mask like this:

mask = ((df <= 0).cumsum() > 0).any() 
>>> mask 
t1 False 
t2  True 
dtype: bool