2016-04-25 83 views
2

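Pandas: a very specific task

I have this dataset, which holds information about Martian craters. I am only interested in the latitude, longitude, and number-of-visible-layers columns. I am trying to split the latitude into 10-degree groups (-90 to -80, -80 to -70, and so on) while taking the layer column values (0, 1, 2, 3, 4, 5) and turning them into columns of their own, so that I end up with a value_counts table of each layer value for each 10-degree group.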

What I want

What I have

I am pulling my hair out over this one; it seems I have tried everything I can think of.

+3

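Can you please post sample _input_ and _output_ data sets (5-7 rows in CSV/dict/JSON/Python code format __as text__, so that one can use them when coding) and describe what you want to do with the input data in order to get the output data set? [How to create a Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve) – MaxU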

Answers

3

Does this work for you?

import pandas as pd 
import random 

# generate random data 
N = 100 
longitudes = [random.randint(-20, 89) for _ in range(N)]
layers = [random.randint(0, 5) for _ in range(N)]
data = pd.DataFrame({'LONGITUDE_CIRCLE_IMAGE': longitudes, 'NUMBER_LAYERS': layers}) 

def get_degree_group(longitude, mn=-20, mx=90, delta_deg=10):
    """Return the index of the degree group that the given longitude falls in."""
    # floor division keeps the group index an integer on Python 3
    return (longitude - mn) // delta_deg

def make_table(df): 
    # make a new column by calculating the degree group from longitude column 
    df['degree_group'] = df.LONGITUDE_CIRCLE_IMAGE.apply(get_degree_group) 
    # count the number of craters with properties (deg_grp, num_lyr) 
    s = df.groupby(['degree_group', 'NUMBER_LAYERS']).size() 
    # s is a pandas Series where the index is in the form: (deg_grp, num_lyr) 
    # and the values are the counts of craters in that category
    # 
    # We want to convert the series into a table where num_lyr values are columns 
    # This task is done with unstack method 
    table = s.unstack('NUMBER_LAYERS') 
    # there are some (deg_grp, num_lyr) cases for which there are no existing craters 
    # Pandas put NaN for those cases. It might be better to put 0 into those cells 
    table.fillna(0, inplace = True) 
    return table 

make_table(data) 
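
To make the groupby / size / unstack steps concrete, here is a minimal sketch on a tiny hand-made frame. The column names LATITUDE_CIRCLE_IMAGE and NUMBER_LAYERS follow the question; the six rows are made-up values for illustration only, and grouping on latitude with floor division is an assumption about what the asker wants, not part of the answer above.

```python
import pandas as pd

# Hypothetical toy data: six craters with made-up latitudes and layer counts.
toy = pd.DataFrame({
    'LATITUDE_CIRCLE_IMAGE': [-85.2, -81.0, -74.5, 3.3, 7.9, 88.0],
    'NUMBER_LAYERS': [0, 0, 2, 1, 1, 0],
})

# Floor division maps each latitude to the lower edge of its 10-degree band,
# e.g. -85.2 -> -90 and 3.3 -> 0.
toy['band_start'] = (toy['LATITUDE_CIRCLE_IMAGE'] // 10 * 10).astype(int)

# Count craters per (band, layer) pair, then move the layer level into columns.
counts = (toy.groupby(['band_start', 'NUMBER_LAYERS'])
             .size()
             .unstack('NUMBER_LAYERS', fill_value=0))
print(counts)
```

Bands that contain no craters at all do not appear as rows in this approach; only (band, layer) combinations missing within an existing band are filled with 0.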
1

Use pd.cut to make the groups and pivot_table to do the counting.

Sample data:

import numpy as np
import pandas as pd

lat = np.random.rand(3000) * 180 - 90
layers = np.random.randint(0, 6, 3000)
data = pd.DataFrame({'lat': lat, 'layers': layers})

Make 18 groups:

data['groups'] = pd.cut(data['lat'], np.linspace(-90, 90, 19))

And a table:

data.pivot_table(index='groups',columns='layers',aggfunc='count',fill_value=0) 

            lat
layers        0  1  2  3  4  5
groups
(-90, -80]    4  1  2  1  1  0
(-80, -70]    1  0  0  2  2  3
(-70, -60]    4  3  2  4  3  4
(-60, -50]    6  2  1  1  2  3
(-50, -40]    2  3  4  2  2  4
(-40, -30]    4  3  4  2  4  4
(-30, -20]    2  5  2  2  3  2
(-20, -10]    4  2  6  3  5  2
(-10, 0]      3  4  2  3  2  1
(0, 10]       5  3  4  3  4  7
(10, 20]      3  3  2  2  2  3
(20, 30]      2  1  1  4  3  5
(30, 40]      1  2  0  2  2  3
(40, 50]      1  3  3  2  3  4
(50, 60]      6  0  2  4  1  6
(60, 70]      3  3  2  5  1  5
(70, 80]      1  4  5  3  2  2
(80, 90]      1  7  3  2  4  2
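
One detail worth checking in the intervals shown above: pd.cut builds right-closed intervals by default, so a value sitting exactly on the lowest edge (-90 here) falls outside (-90, -80] and is labelled NaN. A minimal sketch, using three made-up boundary latitudes, of how include_lowest=True changes that:

```python
import numpy as np
import pandas as pd

edges = np.linspace(-90, 90, 19)        # 19 edges give 18 ten-degree bins
lat = pd.Series([-90.0, -80.0, 90.0])   # made-up values sitting on bin edges

# Default: intervals are (a, b], so -90 itself is not covered.
default_groups = pd.cut(lat, edges)
# include_lowest=True widens the first interval so that -90 is included.
inclusive_groups = pd.cut(lat, edges, include_lowest=True)

print(default_groups.isna().tolist())    # which values were left ungrouped
print(inclusive_groups.isna().tolist())
```

Whether this matters depends on whether the real data set contains latitudes of exactly -90.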
+0

What is linspace in pd.cut? – Holmesjr

+0

It defines the ranges: http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.linspace.html –
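
In short, numpy.linspace(start, stop, num) returns num evenly spaced values that include both endpoints. A small sketch of why 19 is the count that yields 18 ten-degree groups:

```python
import numpy as np

# 19 evenly spaced edges from -90 to 90, both endpoints included.
edges = np.linspace(-90, 90, 19)

print(edges[:3])         # first few bin edges
print(len(edges) - 1)    # number of intervals between consecutive edges
print(np.diff(edges))    # width of each interval
```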

+0

data['LatGroups'] = pd.cut("LATITUDE_CIRCLE_IMAGE", numpy.linspace(-90, 90, 19) I get a TypeError saying that, according to the rule 'safe', array data cannot be cast from dtype('float64') to dtype(' Holmesjr
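
A likely cause of that error, judging only from the snippet in the comment, is that pd.cut receives the column name as a plain string rather than the column itself. A hedged sketch of the intended call follows; the three-row frame is made up, and LATITUDE_CIRCLE_IMAGE and LatGroups are the names used in the comment.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the crater data set.
data = pd.DataFrame({'LATITUDE_CIRCLE_IMAGE': [-84.3, 12.5, 67.0]})

# Pass the column (a Series of floats), not the string "LATITUDE_CIRCLE_IMAGE".
data['LatGroups'] = pd.cut(data['LATITUDE_CIRCLE_IMAGE'],
                           np.linspace(-90, 90, 19))
print(data)
```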