2017-06-01 200 views
2

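Python: Split a string field into 3 separate fields using lambda

I have a pandas DataFrame that contains a column named "SEGMENT". I want to split that column into three columns. Please see my desired output, highlighted in yellow.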

[image: desired output]

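Below is the code I have tried. Unfortunately, I can't even get the first replace statement to work: the : is not replaced by -. Any help is greatly appreciated!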

df_stack_ranking['CURRENT_AUM_SEGMENT'] = df_stack_ranking['CURRENT_AUM_SEGMENT'].replace(':', '-') 

s = df_stack_ranking['CURRENT_AUM_SEGMENT'].str.split(' ').apply(Series, 1).stack() 

s.index = s.index.droplevel(-1) 

s.name = 'SEGMENT' 

df_stack_ranking.join(s.apply(lambda x: Series(x.split(':')))) 
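A likely reason the first statement has no effect: Series.replace compares whole cell values by default, so replace(':', '-') only changes cells that are exactly ':'. Substring replacement goes through the .str accessor, or through replace with regex=True. A minimal sketch; the sample values are illustrative and not the asker's real data:

```python
import pandas as pd

# Illustrative sample data; the real column contents are an assumption.
s = pd.Series(['High: 33 - 48', 'Very High: 80 - 88'])

# Series.replace compares whole values, so nothing changes here.
unchanged = s.replace(':', '-')

# .str.replace works on substrings inside each string.
fixed = s.str.replace(':', '-', regex=False)

# Substring replacement is also possible via Series.replace with regex=True.
fixed_regex = s.replace(':', '-', regex=True)
```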

Answers

2

Setup

df = pd.DataFrame({'SEGMENT': {0: 'Hight:33-48', 1: 'Hight:33-48', 2: 'Very Hight:80-88'}}) 

df
Out[17]:
            SEGMENT
0       Hight:33-48
1       Hight:33-48
2  Very Hight:80-88

Solution

Use str.split to break the column into 3 parts, with expand=True so the result is a new DataFrame.

df.SEGMENT.str.split(':|-', expand=True).rename(
    columns=dict(zip(range(3),
                     ['SEGMENT', 'SEGMENT RANGE LOW', 'SEGMENT RANGE HIGH'])))
Out[13]:
      SEGMENT  SEGMENT RANGE LOW  SEGMENT RANGE HIGH
0       Hight                 33                  48
1       Hight                 33                  48
2  Very Hight                 80                  88
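The pieces produced by the split are still strings. If the range bounds are needed as numbers, one option is to convert them and attach the parsed columns to the original frame. A sketch, assuming every row has the name:low-high shape; the names parts, result and RAW are mine:

```python
import pandas as pd

df = pd.DataFrame({'SEGMENT': ['Hight:33-48', 'Hight:33-48', 'Very Hight:80-88']})

# Split on ':' or '-' and spread the pieces over three columns.
parts = df['SEGMENT'].str.split(':|-', expand=True)
parts.columns = ['SEGMENT', 'SEGMENT RANGE LOW', 'SEGMENT RANGE HIGH']

# The bounds are strings after the split; convert them to integers.
for col in ['SEGMENT RANGE LOW', 'SEGMENT RANGE HIGH']:
    parts[col] = parts[col].astype(int)

# Keep the raw value (renamed to the hypothetical RAW) next to the parsed columns.
result = df.rename(columns={'SEGMENT': 'RAW'}).join(parts)
```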
0
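columns = ['SEGMENT', 'SEGMENT RANGE LOW', 'SEGMENT RANGE HIGH']
# Replace the colon with a hyphen, then split on hyphens into a list of 3 pieces
df['temp'] = df['SEGMENT'].str.replace(':', '-', regex=False).str.split('-')
for i, c in enumerate(columns):
    # strip() removes any spaces left around the separators
    df[c] = df['temp'].apply(lambda x: x[i].strip())
del df['temp']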

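Replace the colon with a hyphen, then split on the hyphen to get a list of values for the 3 columns. Then assign the values to each of the 3 columns and delete the temporary column.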
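The same replace-then-split idea can also be written without apply, using the .str[i] indexer to pick the i-th element of each list. A sketch, assuming the values may or may not have spaces around the separators (hence the strip); the sample data and the name pieces are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'SEGMENT': ['High: 33 - 48', 'Very High:80-88']})
columns = ['SEGMENT', 'SEGMENT RANGE LOW', 'SEGMENT RANGE HIGH']

# Turn ':' into '-' so a single split on '-' yields three pieces per row.
pieces = df['SEGMENT'].str.replace(':', '-', regex=False).str.split('-')

for i, c in enumerate(columns):
    # .str[i] takes element i of each list; strip removes stray spaces.
    df[c] = pieces.str[i].str.strip()
```

Because pieces is computed before the loop, overwriting SEGMENT in the first iteration does not affect the later columns.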

0

I would use str.extract with a regular expression:

df.SEGMENT.str.extract(r'([A-Za-z ]+):(\d+)-(\d+)', expand=True).rename(
    columns={0: 'SEGMENT', 1: 'SEGMENT RANGE LOW', 2: 'SEGMENT RANGE HIGH'})

     SEGMENT  SEGMENT RANGE LOW  SEGMENT RANGE HIGH
0       High                 33                  48
1       High                 33                  48
2  Very High                 80                  88
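One property of str.extract worth knowing when using it this way: rows that do not match the pattern come back as NaN in every output column rather than raising an error. A small sketch; the malformed value 'n/a' is made up for illustration:

```python
import pandas as pd

s = pd.Series(['High:33-48', 'n/a', 'Very High:80-88'])

# Three capture groups produce three columns (0, 1, 2).
out = s.str.extract(r'([A-Za-z ]+):(\d+)-(\d+)', expand=True)
out.columns = ['SEGMENT', 'SEGMENT RANGE LOW', 'SEGMENT RANGE HIGH']

# Rows where the pattern failed to match are all-NaN.
unmatched = out[out.isna().all(axis=1)]
```

Checking unmatched after the extract is a cheap way to spot values that need a different pattern.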
2

Use str.split with a regex: :\s* or (|) \s*-\s* (\s* means zero or more whitespace characters):

df = pd.DataFrame({'SEGMENT': ['Hight: 33 - 48', 'Hight: 33 - 48', 'Very Hight: 80 - 88']}) 

cols = ['SEGMENT','SEGMENT RANGE LOW','SEGMENT RANGE HIGH'] 
df[cols] = df['SEGMENT'].str.split(r':\s*|\s*-\s*', expand=True)
print(df)
      SEGMENT  SEGMENT RANGE LOW  SEGMENT RANGE HIGH
0       Hight                 33                  48
1       Hight                 33                  48
2  Very Hight                 80                  88
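To see what the split pattern does on a single value, it can be tried with the standard re module. This is only an illustration of the regex itself, outside pandas:

```python
import re

# Either a colon followed by optional spaces, or a hyphen with optional spaces around it.
pattern = r':\s*|\s*-\s*'

# Spaces around the separators are consumed by the pattern itself.
spaced = re.split(pattern, 'Very Hight: 80 - 88')
compact = re.split(pattern, 'Hight:33-48')
```

Both calls give three pieces with no surrounding whitespace, so the same pattern covers values with and without spaces.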

Solution with str.extract:

cols = ['SEGMENT','SEGMENT RANGE LOW','SEGMENT RANGE HIGH'] 
df[cols] = df['SEGMENT'].str.extract(r'([A-Za-z\s]+):\s*(\d+)\s*-\s*(\d+)', expand=True)
print(df)
      SEGMENT  SEGMENT RANGE LOW  SEGMENT RANGE HIGH
0       Hight                 33                  48
1       Hight                 33                  48
2  Very Hight                 80                  88
+0

Naming the columns works perfectly! Thank you so much :) – PineNuts0

+0

Glad I could help ;) – jezrael

2

Because I like naming the columns from the str.extract regex:

regex = r'\s*(?P<SEGMENT>\S+)\s*:\s*(?P<SEGMENT_RANGE_LOW>\S+)\s*-\s*(?P<SEGMENT_RANGE_HIGH>\S+)\s*'
df.SEGMENT.str.extract(regex, expand=True)

   SEGMENT  SEGMENT_RANGE_LOW  SEGMENT_RANGE_HIGH
0     High                 33                  48
1     High                 33                  48
2     High                 80                  88

Setup

df = pd.DataFrame({'SEGMENT': ['High: 33 - 48', 'High: 33 - 48', 'Very High: 80 - 88']})
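Note that \S+ cannot span the space in Very High, which is why row 2 in the output above shows High. If the full label is wanted, one possible variant (my adjustment, not part of the answer) captures everything up to the colon instead and restricts the bounds to digits:

```python
import pandas as pd

df = pd.DataFrame({'SEGMENT': ['High: 33 - 48', 'High: 33 - 48', 'Very High: 80 - 88']})

# [^:]+? lazily takes everything before the colon, including inner spaces;
# the \s* that follows keeps trailing spaces out of the captured label.
regex = (r'\s*(?P<SEGMENT>[^:]+?)\s*:'
         r'\s*(?P<SEGMENT_RANGE_LOW>\d+)\s*-'
         r'\s*(?P<SEGMENT_RANGE_HIGH>\d+)')
out = df['SEGMENT'].str.extract(regex, expand=True)
```

The named groups still become the column names, so no rename step is needed.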