2016-09-29 104 views
3

Add a prefix to specific columns of a DataFrame

I have a DataFrame like this:

col1 col2 col3 col4 col5 col6 col7 col8 
0  5345 rrf rrf rrf rrf rrf rrf 
1  2527 erfr erfr erfr erfr erfr erfr 
2  2727 f  f  f  f  f  f 

I want to rename all columns except col1 and col2.

So I tried a loop:

print(df.columns)
for col in df.columns:
    if col != 'col1' and col != 'col2':
        col.rename = str(col) + '_x'

But it's not very effective... it doesn't work!

Answers

7

You can use the DataFrame.rename() method:

new_names = [(i,i+'_x') for i in df.iloc[:, 2:].columns.values] 
df.rename(columns = dict(new_names), inplace=True) 
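For reference, here is a minimal self-contained sketch of the same idea. It assumes a small sample frame rebuilt from the question's data (only four columns, for brevity), and the `keep` set is a name introduced here for illustration. Instead of building a dict, it passes a function to `rename`, which pandas applies to every column label:

```python
import pandas as pd

# Sample frame mirroring the question's data (reconstructed for illustration).
df = pd.DataFrame({
    'col1': [0, 1, 2],
    'col2': [5345, 2527, 2727],
    'col3': ['rrf', 'erfr', 'f'],
    'col4': ['rrf', 'erfr', 'f'],
})

keep = {'col1', 'col2'}  # hypothetical name: columns that must stay unchanged

# rename accepts a function that is applied to every column label;
# without inplace=True it returns a new DataFrame and leaves df untouched.
renamed = df.rename(columns=lambda c: c if c in keep else c + '_x')

print(list(renamed.columns))  # ['col1', 'col2', 'col3_x', 'col4_x']
```

Unlike slicing with `iloc[:, 2:]`, this does not depend on col1 and col2 being the first two columns.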
3

You can use str.contains with a regex pattern to filter the columns of interest, then use zip to build a dict and pass it as the argument to rename:

In [94]: 
cols = df.columns[~df.columns.str.contains('col1|col2')] 
df.rename(columns = dict(zip(cols, cols + '_x')), inplace=True) 
df 

Out[94]: 
    col1 col2 col3_x col4_x col5_x col6_x col7_x col8_x 
0  0 5345 rrf rrf rrf rrf rrf rrf 
1  1 2527 erfr erfr erfr erfr erfr erfr 
2  2 2727  f  f  f  f  f  f 

So here, filtering the columns with str.contains returns the columns that don't match, so the column order is irrelevant.
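One thing to watch with this approach: str.contains does substring matching, so an unanchored pattern can also exclude columns whose names merely contain 'col1' or 'col2'. The sketch below shows this with a hypothetical 'col10' column (not part of the question's data) and one possible fix, anchoring the pattern:

```python
import pandas as pd

# 'col10' is a hypothetical extra column added to show the pitfall.
columns = pd.Index(['col1', 'col2', 'col3', 'col10'])

# Unanchored pattern: 'col10' contains 'col1', so it is excluded too.
loose = columns[~columns.str.contains('col1|col2')]

# Anchored pattern: only the exact names 'col1' and 'col2' match.
strict = columns[~columns.str.contains(r'^(?:col1|col2)$')]

print(list(loose))   # ['col3']
print(list(strict))  # ['col3', 'col10']
```

With the question's eight columns the two patterns give the same result, so this only matters for frames with ten or more such columns or otherwise overlapping names.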

+0

Wow! This is perfect! Is it possible to use 'str.value =' or something like that? –

+0

Not sure what you're trying to do with that code, but generally you need to use 'rename' or directly overwrite the columns attribute – EdChum

1

The simplest solution, if col1 and col2 are the first and second column names:

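# Note: Index.union sorts its result by default, so this relies on the names already being in sorted order
df.columns = df.columns[:2].union(df.columns[2:] + '_x') 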
print (df) 
    col1 col2 col3_x col4_x col5_x col6_x col7_x col8_x 
0  0 5345 rrf rrf rrf rrf rrf rrf 
1  1 2527 erfr erfr erfr erfr erfr erfr 
2  2 2727  f  f  f  f  f  f 
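A caveat on union, sketched below with hypothetical labels that are not in alphabetical order: Index.union sorts its result by default, so the new labels may no longer line up with the data. Index.append keeps the original positions and is one possible substitute:

```python
import pandas as pd

# Hypothetical labels that are not in alphabetical order.
columns = pd.Index(['id', 'year', 'b', 'a'])

# union sorts the combined labels (default sort=None).
via_union = columns[:2].union(columns[2:] + '_x')

# append simply concatenates, preserving positions.
via_append = columns[:2].append(columns[2:] + '_x')

print(list(via_union))   # ['a_x', 'b_x', 'id', 'year']
print(list(via_append))  # ['id', 'year', 'b_x', 'a_x']
```

Assigning `via_union` to `df.columns` here would silently attach the wrong label to each column. With col1 through col8 the names happen to be sorted already, which is why the output above looks right.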

Another solution with isin or a list comprehension:

cols = df.columns[~df.columns.isin(['col1','col2'])] 
print (cols) 
['col3', 'col4', 'col5', 'col6', 'col7', 'col8'] 

df.rename(columns = dict(zip(cols, cols + '_x')), inplace=True) 

print (df) 

    col1 col2 col3_x col4_x col5_x col6_x col7_x col8_x 
0  0 5345 rrf rrf rrf rrf rrf rrf 
1  1 2527 erfr erfr erfr erfr erfr erfr 
2  2 2727  f  f  f  f  f  f 

cols = [col for col in df.columns if col not in ['col1', 'col2']] 
print (cols) 
['col3', 'col4', 'col5', 'col6', 'col7', 'col8'] 

# cols is a plain list here, so build the new names element by element
df.rename(columns = {col: col + '_x' for col in cols}, inplace=True) 

print (df) 

    col1 col2 col3_x col4_x col5_x col6_x col7_x col8_x 
0  0 5345 rrf rrf rrf rrf rrf rrf 
1  1 2527 erfr erfr erfr erfr erfr erfr 
2  2 2727  f  f  f  f  f  f 

The fastest is the list comprehension:

df.columns = [col+'_x' if col != 'col1' and col != 'col2' else col for col in df.columns] 
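If this is needed in more than one place, the comprehension can be wrapped in a small helper. This is only a sketch: `add_suffix_except` and its parameters are names made up here, not a pandas function, and it returns a copy rather than modifying the frame in place:

```python
import pandas as pd

def add_suffix_except(df, keep, suffix='_x'):
    """Return a copy of df with suffix appended to every column not in keep."""
    keep = set(keep)  # set membership tests stay cheap for long exclusion lists
    out = df.copy()
    out.columns = [c if c in keep else f'{c}{suffix}' for c in out.columns]
    return out

# Small sample frame based on the question's data.
df = pd.DataFrame({'col1': [0, 1], 'col2': [5345, 2527], 'col3': ['rrf', 'erfr']})
result = add_suffix_except(df, ['col1', 'col2'])

print(list(result.columns))  # ['col1', 'col2', 'col3_x']
```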

Timings

In [350]: %timeit (akot(df)) 
1000 loops, best of 3: 387 µs per loop 

In [351]: %timeit (jez(df1)) 
The slowest run took 4.12 times longer than the fastest. This could mean that an intermediate result is being cached. 
10000 loops, best of 3: 207 µs per loop 

In [363]: %timeit (jez3(df2)) 
The slowest run took 6.41 times longer than the fastest. This could mean that an intermediate result is being cached. 
10000 loops, best of 3: 75.7 µs per loop 

df1 = df.copy() 
df2 = df.copy() 

def jez(df): 
    df.columns = df.columns[:2].union(df.columns[2:] + '_x') 
    return df 

def akot(df): 
    new_names = [(i,i+'_x') for i in df.iloc[:, 2:].columns.values] 
    df.rename(columns = dict(new_names), inplace=True) 
    return df 


def jez3(df): 
    df.columns = [col + '_x' if col != 'col1' and col != 'col2' else col for col in df.columns] 
    return df 


print (akot(df)) 
print (jez(df1)) 
print (jez3(df2)) 
+0

I added timings, please check them. – jezrael