2016-07-27 79 views
3

Changing True/False values to discrete values in a pandas DataFrame with np.where()

I want to assign state names to a list of colleges:

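import numpy as np
import pandas as pd

df = pd.DataFrame({'College': pd.Series(['University of Michigan', 'University of Florida', 'Iowa State'])})
State = ['Michigan', 'Iowa']
# Marks matching rows with the literal string 'state', not the matched name
df['State'] = np.where(df['College'].str.contains('|'.join(State)),
    'state', '--')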

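I would like to replace the 'state' value with the actual name of the state whenever a match occurs. For example: University of Michigan -> Michigan (instead of 'state'). Eventually, `State` will contain all 50 states, so I can't write 50 separate np.where statements, one for each state name.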

Thanks for your help.

Answers

3

You can use str.extract here instead of np.where:

In [290]: df['State'] = df['College'].str.extract('({})'.format('|'.join(State)), expand=True) 

In [291]: df 
Out[291]: 
                  College     State
0  University of Michigan  Michigan
1   University of Florida       NaN
2              Iowa State      Iowa
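
For reference, the same idea can be written as a self-contained script. This is only a sketch under a few assumptions that are not part of the answer above: it escapes each name with `re.escape` (in case a name ever contains a regex metacharacter), uses `expand=False` so that a single capture group comes back as a Series, and fills unmatched rows with the '--' placeholder from the question. The lowercase names `states` and `pattern` are illustrative.

```python
import re

import pandas as pd

df = pd.DataFrame({'College': ['University of Michigan',
                               'University of Florida',
                               'Iowa State']})
states = ['Michigan', 'Iowa']  # illustrative subset of the full list

# Escape each name so regex metacharacters are matched literally,
# then wrap the alternation in one capture group.
pattern = '({})'.format('|'.join(re.escape(s) for s in states))

# With a single group and expand=False, str.extract returns a Series.
# Rows without a match are NaN; fillna restores the '--' placeholder.
df['State'] = df['College'].str.extract(pattern, expand=False).fillna('--')

print(df)
```

Dropping the `fillna('--')` call leaves NaN for colleges with no matching state, which is what the output above shows.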
1
# Commas are required between the names: adjacent string literals
# would otherwise be concatenated into one long string.
States = [
      'Washington', 'Wisconsin', 'West Virginia', 'Florida', 'Wyoming',
      'New Hampshire', 'New Jersey', 'New Mexico', 'National', 'North Carolina',
      'North Dakota', 'Nebraska', 'New York', 'Rhode Island', 'Nevada', 'Guam',
      'Colorado', 'California', 'Georgia', 'Connecticut', 'Oklahoma', 'Ohio', 'Kansas',
      'South Carolina', 'Kentucky', 'Oregon', 'South Dakota', 'Delaware',
      'District of Columbia', 'Hawaii', 'Puerto Rico', 'Texas', 'Louisiana',
      'Tennessee', 'Pennsylvania', 'Virginia', 'Virgin Islands', 'Alaska', 'Alabama',
      'American Samoa', 'Arkansas', 'Vermont', 'Illinois', 'Indiana', 'Iowa',
      'Arizona', 'Idaho', 'Maine', 'Maryland', 'Massachusetts', 'Utah', 'Missouri',
      'Minnesota', 'Michigan', 'Montana', 'Northern Mariana Islands', 'Mississippi',
]

state_str = '|'.join(States) 
df.update(df.College.str.extract(r'(?P<State>{})'.format(state_str), expand=True)) 

df 
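
One detail worth keeping in mind: `DataFrame.update` only writes into columns that already exist in `df`, and it skips NaN values in the frame it is given. The snippet therefore relies on the `State` column created in the question. The sketch below illustrates that behavior. It assumes a '--' placeholder column standing in for the `np.where` result and uses a shortened, hypothetical `states` list.

```python
import pandas as pd

df = pd.DataFrame({'College': ['University of Michigan',
                               'University of Florida',
                               'Iowa State']})
# Assumption: a placeholder column like the one np.where built in the question.
df['State'] = '--'

states = ['Michigan', 'Iowa']  # shortened list, for illustration only

# The named group (?P<State>...) makes the extracted column be called 'State'.
extracted = df['College'].str.extract(
    r'(?P<State>{})'.format('|'.join(states)), expand=True)

# update() aligns on index and column labels and ignores NaN values,
# so rows without a match keep their '--' placeholder.
df.update(extracted)

print(df)
```

If `df` has no `State` column yet, assigning the extracted column directly, as in the first answer, is the simpler route.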
