2017-09-03 89 views

Pandas: why does pd.cut() produce a negative value?

import pandas as pd

test = pd.DataFrame({'counts':[0,1,2,3,4,5,6,11,12,14,15]}) 
test['range'] = pd.cut(test.counts, [0,5,10,15], include_lowest=True) 
test 

Output:

    counts          range
0        0  (-0.001, 5.0]
1        1  (-0.001, 5.0]
2        2  (-0.001, 5.0]
3        3  (-0.001, 5.0]
4        4  (-0.001, 5.0]
5        5  (-0.001, 5.0]
6        6    (5.0, 10.0]
7       11   (10.0, 15.0]
8       12   (10.0, 15.0]
9       14   (10.0, 15.0]
10      15   (10.0, 15.0]

How can I get (0, 5.0] instead of (-0.001, 5.0]? Why does -0.001 show up even though I did not specify it?

Answer


This is the result of the internal logic for include_lowest=True.
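To make that logic concrete, here is a rough sketch in plain Python. It assumes that pandas, when the bins are right-closed and include_lowest=True, shifts the displayed left edge of the first bin down by 10 ** (-precision), where precision is the pd.cut argument with a default of 3. The function name displayed_left_edge is hypothetical and is not part of pandas.

```python
# Hypothetical re-creation of the label adjustment; this is not pandas source code.
bins = [0, 5, 10, 15]
precision = 3  # default value of the `precision` argument of pd.cut


def displayed_left_edge(first_edge, precision):
    """Shift the first edge down by one unit of the display precision."""
    return first_edge - 10 ** (-precision)


left = displayed_left_edge(bins[0], precision)

# The first bin (left, 5] is open on the left, so the edge has to sit
# slightly below 0 for the value 0 itself to fall inside the interval.
print(f"({left}, {float(bins[1])}]")
```

Under this assumption the first interval is still right-closed, and moving its left edge slightly below the lowest bin edge is what lets the lowest value be included.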

You can generate your own labels the same way pd.cut() does when include_lowest=False, and pass them in:

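In [50]: import pandas.core.algorithms as algos 
    ...: bins = [0, 5, 10, 15]  # bin edges; this line was omitted from the original post

In [51]: # _format_labels is a private pandas helper, so it may change between versions
    ...: labels = pd.Categorical(pd.core.reshape.tile._format_labels(algos.unique(bins), precision=0),
    ...:                         ordered=True)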

In [52]: labels 
Out[52]: 
[(0, 5], (5, 10], (10, 15]] 
Categories (3, interval[int64]): [(0, 5] < (5, 10] < (10, 15]] 

In [53]: test['range'] = pd.cut(test.counts, [0,5,10,15], 
           labels=labels, 
           include_lowest=True) 

In [54]: test 
Out[54]: 
    counts     range
0        0    (0, 5]
1        1    (0, 5]
2        2    (0, 5]
3        3    (0, 5]
4        4    (0, 5]
5        5    (0, 5]
6        6   (5, 10]
7       11  (10, 15]
8       12  (10, 15]
9       14  (10, 15]
10      15  (10, 15]
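Because _format_labels is a private helper, a variant that sticks to the public API may be easier to maintain. The following is only a sketch, not the method used above: it assumes plain string labels are acceptable in place of interval objects, and it writes the first label as [0, 5] to show that the lowest edge is included.

```python
import pandas as pd

bins = [0, 5, 10, 15]
test = pd.DataFrame({"counts": [0, 1, 2, 3, 4, 5, 6, 11, 12, 14, 15]})

# Build one string label per pair of consecutive bin edges.
labels = [f"({lo}, {hi}]" for lo, hi in zip(bins[:-1], bins[1:])]
# The first bin is closed on both sides because include_lowest=True.
labels[0] = f"[{bins[0]}, {bins[1]}]"

test["range"] = pd.cut(test["counts"], bins, labels=labels, include_lowest=True)
print(test)
```

With string labels the column is still an ordered categorical, but the categories are strings rather than intervals, so interval attributes such as .left and .right are not available.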

I guess `bins = [0,5,10,15]`, which is not shown in your code. – Cheng


@Cheng, yes, sorry - forgot to mention that. :) – MaxU