2017-09-26 154 views
1

Convert Timestamp values to str in a Python pandas DataFrame

I have a DataFrame that looks like this:

         Date           Player      Fee
0  2017-01-08  Steven Berghuis  6500000
1  2017-07-18  Jerry St. Juste  4500000
2  2017-07-18  Ridgeciano Haps   600000
3  2017-01-07   Sofyan Amrabat   400000

I want to change each date value to a str if it meets a condition:

def is_in_range(x):
    ses1 = pd.to_datetime('2013-02-01')
    ses2 = pd.to_datetime('2014-02-01')
    ses3 = pd.to_datetime('2015-02-01')
    ses4 = pd.to_datetime('2016-02-01')
    ses5 = pd.to_datetime('2017-02-01')
    ses6 = pd.to_datetime('2018-02-01')

    if x < ses1 :
        x = '2012-13'
    if x > ses2 and x < ses3 :
        x = '2013-14'
    if x > ses3 and x < ses4 :
        x = '2014-15'
    if x > ses4 and x < ses5 :
        x = '2015-16'
    if x > ses5 and x < ses6 :
        x = '2016-17'
    return ses6
aj = ajax_t['Date'].apply(is_in_range)
aj

TypeError                                 Traceback (most recent call last)
in ()
     18         x = '2016-17'
     19     return ses6
---> 20 aj = ajax_t['Date'].apply(is_in_range)
     21 aj

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   2353             else:
   2354                 values = self.asobject
-> 2355                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   2356
   2357         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer (pandas/_libs/lib.c:66645)()

in is_in_range(x)
     15     if x > ses4 and x < ses5 :
     16         x = '2015-16'
---> 17     if x > ses5 and x < ses6 :
     18         x = '2016-17'
     19     return ses6

pandas/_libs/tslib.pyx in pandas._libs.tslib._Timestamp.__richcmp__ (pandas/_libs/tslib.c:20281)()

TypeError: Cannot compare type 'Timestamp' with type 'str'

I get this error. Any suggestions, please?

+0

I think your 'Date' column might not be in datetime format? I can't tell, since it depends on your DataFrame. If so, you will have to convert 'x' to datetime. – JAW
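One way to check what this comment suggests, as a minimal sketch: the frame below is a hypothetical reconstruction of the question's data with the dates stored as plain strings, which is an assumption, since the question does not show how `ajax_t` was built.

```python
import pandas as pd

# Hypothetical reconstruction of the question's frame; Date is stored as strings here.
ajax_t = pd.DataFrame({
    'Date': ['2017-01-08', '2017-07-18', '2017-07-18', '2017-01-07'],
    'Player': ['Steven Berghuis', 'Jerry St. Juste', 'Ridgeciano Haps', 'Sofyan Amrabat'],
    'Fee': [6500000, 4500000, 600000, 400000],
})

# True only when the column already holds datetime64 values.
is_datetime_before = pd.api.types.is_datetime64_any_dtype(ajax_t['Date'])

# After conversion, each element passed to apply() is a pandas Timestamp.
ajax_t['Date'] = pd.to_datetime(ajax_t['Date'])
is_datetime_after = pd.api.types.is_datetime64_any_dtype(ajax_t['Date'])

print(is_datetime_before, is_datetime_after)
```

If the first check already reports a datetime column, the conversion is harmless and the cause of the error lies elsewhere in the function.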

Answers

0

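You can try specifying the date format explicitly:

# The format string must match the text being parsed: year-month-day.
ses1 = pd.to_datetime('2017-01-08', format='%Y-%m-%d')

+0

Hope this helps. –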

1

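Convert the column with to_datetime if necessary, and store the label in a different variable such as y, because x gets overwritten inside the function: once one branch sets x to a string, the next comparison is between a str and a Timestamp.

The variable y should also be what the function returns: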

import pandas as pd

# Make sure the column holds datetimes (harmless if it already does).
ajax_t['Date'] = pd.to_datetime(ajax_t['Date'])

def is_in_range(x):
    ses1 = pd.to_datetime('2013-02-01')
    ses2 = pd.to_datetime('2014-02-01')
    ses3 = pd.to_datetime('2015-02-01')
    ses4 = pd.to_datetime('2016-02-01')
    ses5 = pd.to_datetime('2017-02-01')
    ses6 = pd.to_datetime('2018-02-01')

    # Keep the label in y so that x stays a Timestamp for every comparison.
    y = None  # returned when x falls outside every range below
    if x < ses1:
        y = '2012-13'
    if x > ses2 and x < ses3:
        y = '2013-14'
    if x > ses3 and x < ses4:
        y = '2014-15'
    if x > ses4 and x < ses5:
        y = '2015-16'
    if x > ses5 and x < ses6:
        y = '2016-17'
    return y

aj = ajax_t['Date'].apply(is_in_range)
print(aj)

0    2015-16
1    2016-17
2    2016-17
3    2015-16
Name: Date, dtype: object
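To see why the label needs its own variable, here is a minimal sketch that isolates the failing step. It assumes the value reaching the function is already a Timestamp, which is consistent with the traceback pointing at the last comparison rather than the first. The exact error message may differ between pandas versions.

```python
import pandas as pd

ses4 = pd.to_datetime('2016-02-01')
ses5 = pd.to_datetime('2017-02-01')

x = pd.Timestamp('2017-01-08')   # first date in the question's frame
if x > ses4 and x < ses5:
    x = '2015-16'                # x is now a str, no longer a Timestamp

try:
    x > ses5                     # compares a str with a Timestamp
    raised = False
except TypeError:
    raised = True

print(raised)
```

Writing the branches as `if`/`elif`, or returning the label as soon as a branch matches, would avoid the problem as well, because no comparison would run after the reassignment.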
0

Evidently the Date column in your DataFrame ajax_t was not loaded as datetime. Try converting it:

ajax_t['Date'] = pd.to_datetime(ajax_t.Date) 

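Alternatively, if you load the DataFrame ajax_t from a file, for example data.csv, you can pass the parse_dates parameter to force the Date column to be parsed as datetime: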

ajax_t = pd.read_csv('data.csv', parse_dates=['Date']) 

Hope this helps.
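The effect of `parse_dates` can be checked without a file on disk. In this sketch the CSV text is a hypothetical stand-in for data.csv, read through `io.StringIO`:

```python
import io
import pandas as pd

# Hypothetical stand-in for the contents of data.csv.
csv_text = """Date,Player,Fee
2017-01-08,Steven Berghuis,6500000
2017-07-18,Jerry St. Juste,4500000
"""

# Without parse_dates the Date column is read as text.
as_text = pd.read_csv(io.StringIO(csv_text))

# With parse_dates the column is parsed into datetime values while loading.
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

print(as_text['Date'].dtype)
print(as_dates['Date'].dtype)
```

The printed dtype names depend on the pandas version, but only the second frame should report a datetime dtype.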

1

Using pd.cut:

ses1 = pd.to_datetime('2013-02-01') 
ses2 = pd.to_datetime('2014-02-01') 
ses3 = pd.to_datetime('2015-02-01') 
ses4 = pd.to_datetime('2016-02-01') 
ses5 = pd.to_datetime('2017-02-01') 
ses6 = pd.to_datetime('2018-02-01') 

pd.cut(df.Date,[ses1,ses2,ses3,ses4,ses5,ses6],labels=['2012-13','2013-14','2014-15','2015-16','2016-17']) 


Out[1227]: 
0 2015-16 
1 2016-17 
2 2016-17 
3 2015-16 
Name: Date, dtype: category 

Or:

ses = pd.to_datetime(['2013-02-01','2014-02-01','2015-02-01','2016-02-01','2017-02-01','2018-02-01']) 
pd.cut(df.Date,ses,labels=['2012-13','2013-14','2014-15','2015-16','2016-17']) 
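A self-contained version of the `pd.cut` approach, as a sketch: the `dates` Series is a hypothetical stand-in for `df.Date`, and the final `astype(str)` step is an addition for cases where plain strings are wanted instead of a categorical.

```python
import pandas as pd

# Hypothetical stand-in for df.Date, using the four dates from the question.
dates = pd.Series(
    pd.to_datetime(['2017-01-08', '2017-07-18', '2017-07-18', '2017-01-07']),
    name='Date',
)

# Six bin edges define five intervals, one per label.
ses = pd.to_datetime(['2013-02-01', '2014-02-01', '2015-02-01',
                      '2016-02-01', '2017-02-01', '2018-02-01'])
labels = ['2012-13', '2013-14', '2014-15', '2015-16', '2016-17']

# Each interval is open on the left and closed on the right (the pd.cut default).
seasons = pd.cut(dates, ses, labels=labels)

# Convert the categorical result to ordinary strings.
seasons_str = seasons.astype(str)
print(seasons_str.tolist())  # ['2015-16', '2016-17', '2016-17', '2015-16']
```

Dates outside the outermost edges come back as NaN from `pd.cut`, so a date before 2013-02-01 is not labelled '2012-13' the way the first branch of `is_in_range` labels it. Extending the list of edges would be one way to cover such dates.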
+0

This is really nice. I made an edit. You can rephrase it in your own words. – Dark

+0

@Bharathshetty I'll keep it ~ :) – Wen