2016-04-26 617 views

How to convert column values to quantiles in pandas (Python)? I'm using pandas to analyze my data, and I run:

import pandas as pd

df = pd.DataFrame(datas, columns=['userid', 'recency', 'frequency', 'monetary'])
print(df)
   userid  recency  frequency  monetary
0   47918        9         53    788778
1   48302       85         10    232323
2    8873        3         79      2323
3   63158       23         23   2323232
4     364       14         43    232323
5   45191        1         75    224455
6   21061        9         64     23367
7   41356       22         55   2346777
8   42455       14         30     23478
9   65460        3         16      2345
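The question doesn't show the `datas` object, so to make the example reproducible here is a minimal sketch that rebuilds the same DataFrame by hand from the printed rows above (the list literal is an assumption standing in for the original `datas`):

```python
import pandas as pd

# Rows copied from the printed DataFrame above; the original `datas`
# object is not shown in the question, so this list stands in for it.
datas = [
    [47918, 9, 53, 788778],
    [48302, 85, 10, 232323],
    [8873, 3, 79, 2323],
    [63158, 23, 23, 2323232],
    [364, 14, 43, 232323],
    [45191, 1, 75, 224455],
    [21061, 9, 64, 23367],
    [41356, 22, 55, 2346777],
    [42455, 14, 30, 23478],
    [65460, 3, 16, 2345],
]
df = pd.DataFrame(datas, columns=['userid', 'recency', 'frequency', 'monetary'])
print(df)
```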

I need to convert the values of recency, frequency and monetary into the range 1-5, so the output would be:

   userid  recency  frequency  monetary
0   47918        1          2         3
1   48302        2          1         2
2    8873        3          4         5
3   63158        2          2         2
4     364        5          4         2
5   45191        1          5         4
6   21061        4          4         3
7   41356        3          5         4
8   42455        5          3         5
9   65460        3          1         2

How can I do this in Python?

Thanks.


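Quantiles are fractional percentiles, so they can't be greater than 1.0; as stated, your question doesn't make sense. Can you show the desired output, or do you want something like 'df[['recency','frequency']].apply(lambda x: x.quantile(np.linspace(0,1,5)))'? – EdChum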
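A quick sketch of what the suggestion in this comment returns, using two of the question's columns: it yields the cut points (quantile values), not a 1-5 label per row. Note that `np.linspace(0, 1, 5)` produces five probabilities (0, 0.25, 0.5, 0.75, 1.0), which bound only four bins; five bins would need six cut points.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'recency': [9, 85, 3, 23, 14, 1, 9, 22, 14, 3],
    'frequency': [53, 10, 79, 23, 43, 75, 64, 55, 30, 16],
})

# The comment's version: quantiles at 0, 0.25, 0.5, 0.75, 1.0 -> edges of four bins.
edges4 = df[['recency', 'frequency']].apply(lambda x: x.quantile(np.linspace(0, 1, 5)))

# Six probabilities (0, 0.2, ..., 1.0) give the edges of five equal-sized bins.
edges5 = df[['recency', 'frequency']].quantile(np.linspace(0, 1, 6))

print(edges4)
print(edges5)
```

Either result is a table of thresholds per column; mapping each row to a bin number is a separate step, which is what `pd.qcut` does in one call.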


@EdChum Yes, that's not the same thing, I think... I've updated my question. – ihsansat

Answer


IIUC you need qcut with codes, and at the end add 1, because the minimum should be 1 and the maximum 5:
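A compact sketch of this qcut-plus-codes approach applied to all three columns in a loop, using the sample data from the question. `pd.qcut(values, 5)` on a NumPy array returns a `Categorical` whose `.codes` are 0-based bin numbers (0 to 4); adding 1 shifts them to 1 to 5.

```python
import pandas as pd

df = pd.DataFrame({
    'userid': [47918, 48302, 8873, 63158, 364, 45191, 21061, 41356, 42455, 65460],
    'recency': [9, 85, 3, 23, 14, 1, 9, 22, 14, 3],
    'frequency': [53, 10, 79, 23, 43, 75, 64, 55, 30, 16],
    'monetary': [788778, 232323, 2323, 2323232, 232323, 224455,
                 23367, 2346777, 23478, 2345],
})

for col in ['recency', 'frequency', 'monetary']:
    # qcut splits the column into 5 equal-frequency bins; .codes gives 0..4,
    # so +1 turns them into the 1..5 scores the question asks for.
    df[col] = pd.qcut(df[col].values, 5).codes + 1

print(df)
```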

df['recency1'] = pd.qcut(df['recency'].values, 5)
df['frequency1'] = pd.qcut(df['frequency'].values, 5)
df['monetary1'] = pd.qcut(df['monetary'].values, 5)
print(df)
   userid  recency  frequency  monetary    recency1    frequency1  \
0   47918        9         53    788778      (3, 9]  (37.8, 53.8]
1   48302       85         10    232323  (22.2, 85]    [10, 21.6]
2    8873        3         79      2323      [1, 3]    (66.2, 79]
3   63158       23         23   2323232  (22.2, 85]  (21.6, 37.8]
4     364       14         43    232323     (9, 14]  (37.8, 53.8]
5   45191        1         75    224455      [1, 3]    (66.2, 79]
6   21061        9         64     23367      (3, 9]  (53.8, 66.2]
7   41356       22         55   2346777  (14, 22.2]  (53.8, 66.2]
8   42455       14         30     23478     (9, 14]  (21.6, 37.8]
9   65460        3         16      2345      [1, 3]    [10, 21.6]

              monetary1
0   (232323, 1095668.8]
1    (144064.2, 232323]
2       [2323, 19162.6]
3  (1095668.8, 2346777]
4    (144064.2, 232323]
5    (144064.2, 232323]
6   (19162.6, 144064.2]
7  (1095668.8, 2346777]
8   (19162.6, 144064.2]
9       [2323, 19162.6]
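The interval boundaries shown above can be inspected directly: `pd.qcut` accepts `retbins=True` and then also returns the array of bin edges (six edges for five bins). A small sketch using only the recency values from the question:

```python
import numpy as np
import pandas as pd

recency = np.array([9, 85, 3, 23, 14, 1, 9, 22, 14, 3])

# retbins=True returns (categorical_result, bin_edges) instead of just the result.
cats, edges = pd.qcut(recency, 5, retbins=True)

print(edges)
print(cats.codes + 1)
```

The edges match the recency1 intervals printed above (1, 3, 9, 14, 22.2, 85), which makes it easier to see why a given row lands in a given score.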
df['recency'] = pd.qcut(df['recency'].values, 5).codes + 1
df['frequency'] = pd.qcut(df['frequency'].values, 5).codes + 1
df['monetary'] = pd.qcut(df['monetary'].values, 5).codes + 1
print(df)
   userid  recency  frequency  monetary
0   47918        2          3         4
1   48302        5          1         3
2    8873        1          5         1
3   63158        5          2         5
4     364        3          3         3
5   45191        1          5         3
6   21061        2          4         2
7   41356        4          4         5
8   42455        3          2         2
9   65460        1          1         1

I need values from 1 to 5, not starting at 0. @jezrael – ihsansat


Edited, please see the last edit. – jezrael


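Wow, this works, but I don't understand why with my full data I get 'TypeError: unsupported operand type(s) for *: 'Decimal' and 'float'' on my 'monetary' values. I rechecked and all the values are integers; why does this happen? – ihsansat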
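One likely cause, though the thread doesn't confirm it: the 'monetary' values may be `decimal.Decimal` objects (some database drivers return numeric columns that way), which look like integers when printed but are stored in an `object` column. Quantile interpolation multiplies values by floats, and Python does not allow `Decimal * float`. The sketch below assumes that scenario, reproduces the error in plain Python, and converts the column to float before calling `qcut`:

```python
from decimal import Decimal

import pandas as pd

# Assumed scenario: values arrive as Decimal objects rather than ints.
monetary = pd.Series([Decimal(v) for v in
                      [788778, 232323, 2323, 2323232, 232323,
                       224455, 23367, 2346777, 23478, 2345]])
print(monetary.dtype)  # object, not int64

# Plain Python refuses to mix Decimal and float in arithmetic.
raised = False
try:
    Decimal(2345) * 0.2
except TypeError as exc:
    raised = True
    print(exc)

# Converting to float first gives qcut an ordinary numeric column.
monetary = monetary.astype(float)
scores = pd.qcut(monetary.values, 5).codes + 1
print(scores)
```

If exact decimal precision matters elsewhere, the conversion can be applied only to the copy that feeds `qcut`.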
