2016-11-25 98 views
2
target_value  title people  start end twitter_map 
0 AGE_13_TO_17  13 to 17  1  13 17 AGE_13_TO_17 
1 AGE_13_TO_24  13 to 24  NaN  13 24   NaN 
2 AGE_13_TO_34  13 to 34  NaN  13 34   NaN 
3 AGE_13_TO_49  13 to 49  NaN  13 49   NaN 
4 AGE_13_TO_54  13 to 54  NaN  13 54   NaN 
5 AGE_OVER_13  Age Over 13 NaN  13 -   NaN 
6 AGE_18_TO_24  18 to 24  7  18 24 AGE_18_TO_24 
7 AGE_18_TO_54  18 to 54  NaN  18 54   NaN 
8 AGE_OVER_18  Age Over 18 NaN  18 -   NaN 
9 AGE_21_TO_34  21 to 34  NaN  21 34   NaN 
10 AGE_21_TO_49  21 to 49  NaN  21 49   NaN 
11 AGE_21_TO_54  21 to 54  NaN  21 54   NaN 
12 AGE_25_TO_34  25 to 34  34  25 34 AGE_25_TO_34 
13 AGE_25_TO_49  25 to 49  NaN  25 49   NaN 
14 AGE_OVER_25 Age Over 25 NaN  25 -   NaN 
15 AGE_35_TO_44  35 to 44  15  35 44 AGE_35_TO_44 
16 AGE_OVER_35 Age Over 35 NaN  35 -   NaN 
17 AGE_45_TO_54  45 to 54  1  45 54 AGE_45_TO_54 
18 AGE_OVER_50 Age Over 50 NaN  50 -   NaN 
19 AGE_55_TO_64  55 to 64  3  55 64 AGE_55_TO_64 
20 AGE_OVER_65   65+  6  65 - AGE_OVER_65 
21   None  All Ages NaN All Ages -   NaN 

因此,我有如上所示的这个数据框,其中包含一些年龄开始和年龄结束的值。但是有一些重叠的年龄段。我需要的基础上,专门值栏填写正确的在熊猫数据框中获取重叠年龄段的年龄总和

料到产出的前两行

target_value  title people  start end twitter_map 
0 AGE_13_TO_17  13 to 17  1  13 17 AGE_13_TO_17 
1 AGE_13_TO_24  13 to 24  8  13 24   NaN 
+0

前三栏已经加入了与过去的三列 –

+2

什么是预期的输出是什么呢? –

+0

我在前两行给出了一个示例...我希望它解释 –

回答

2

我将在一个简单的例子工作:

people start end 
    1 13 17 
    NaN 13 24 
    NaN 13 34 
    NaN 13 - 
    7 18 24 
    NaN 18 - 
    34 25 34 

首先更换-与无穷大,将所有浮动:

import numpy as np 
df = df.replace({'-': np.inf}).astype(float) 

然后选择其中给出的“人”的数列,这将是输入:

df_input = df.dropna() 

现在定义以下功能:

def func(row): 
    return df_input.loc[ 
      (df_input['start'] >= row['start']) & (df_input['end'] <= row['end']), 
      'people' 
     ].sum() 

对于在每一行数据框,它将输入中满足定义年龄段条件的所有数字相加(这是无穷大有用的地方)。

最后应用功能:

In [36]: df.apply(func, axis=1) 
Out[36]: 
0  1.0 
1  8.0 
2 42.0 
3 42.0 
4  7.0 
5 41.0 
6 34.0 
+0

谢谢!有一次,我更快;) – IanS

+0

是的,我知道在说什么...... – jezrael