2017-02-25

How to replace spaces in strings in a pandas DataFrame?

Suppose I have a pandas DataFrame like this:

      Person_1    Person_2     Person_3
0   John Smith  Jane Smith   Mark Smith
1  Harry Jones  Mary Jones  Susan Jones

In reproducible form:

import pandas as pd

df = pd.DataFrame([['John Smith', 'Jane Smith', 'Mark Smith'],
                   ['Harry Jones', 'Mary Jones', 'Susan Jones']],
                  columns=['Person_1', 'Person_2', 'Person_3'])

What is the best way to replace the space between the first and last name in each name with an underscore (_), to get:

      Person_1    Person_2     Person_3
0   John_Smith  Jane_Smith   Mark_Smith
1  Harry_Jones  Mary_Jones  Susan_Jones

Thanks in advance!

Answers


pandas

stack/unstack with str.replace

df.stack().str.replace(' ', '_').unstack() 

      Person_1    Person_2     Person_3
0   John_Smith  Jane_Smith   Mark_Smith
1  Harry_Jones  Mary_Jones  Susan_Jones

numpy
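import numpy as np

# np.char is the public alias of np.core.defchararray
# (the np.core path is deprecated in NumPy 2.0)
pd.DataFrame(
    np.char.replace(df.values.astype(str), ' ', '_'),
    df.index, df.columns)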

      Person_1    Person_2     Person_3
0   John_Smith  Jane_Smith   Mark_Smith
1  Harry_Jones  Mary_Jones  Susan_Jones
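A runnable version of the NumPy approach, assuming NumPy's public np.char namespace (the same module the answer reaches as np.core.defchararray). It uses df.to_numpy(dtype=str) in place of df.values.astype(str); both yield a fixed-width unicode array that the np.char functions operate on element-wise.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['John Smith', 'Jane Smith', 'Mark Smith'],
                   ['Harry Jones', 'Mary Jones', 'Susan Jones']],
                  columns=['Person_1', 'Person_2', 'Person_3'])

# Convert the object-dtype cells to a fixed-width unicode ('<U...') array.
arr = df.to_numpy(dtype=str)

# Element-wise str.replace over the whole array, without a Python loop
# in user code.
replaced = np.char.replace(arr, ' ', '_')

# Re-attach the original row and column labels.
result = pd.DataFrame(replaced, index=df.index, columns=df.columns)
print(result)
```

Because the whole frame is converted to strings first, this only makes sense when every column holds text; numeric columns would come back as strings.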

Timing test (benchmark chart not reproduced)


I think you could also just use DataFrame.replace:

df.replace(' ', '_', regex=True) 

Output

      Person_1    Person_2     Person_3
0   John_Smith  Jane_Smith   Mark_Smith
1  Harry_Jones  Mary_Jones  Susan_Jones
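Why regex=True matters here: without it, DataFrame.replace compares whole cell values, so ' ' would only match a cell that is exactly one space. This sketch, using the question's frame, contrasts the two calls.

```python
import pandas as pd

df = pd.DataFrame([['John Smith', 'Jane Smith', 'Mark Smith'],
                   ['Harry Jones', 'Mary Jones', 'Susan Jones']],
                  columns=['Person_1', 'Person_2', 'Person_3'])

# Without regex=True, to_replace must equal the entire cell value;
# no cell here is exactly ' ', so nothing changes.
whole_cell = df.replace(' ', '_')

# With regex=True, the pattern is searched inside each string and every
# match is substituted, similar to re.sub.
substring = df.replace(' ', '_', regex=True)

print(whole_cell)
print(substring)
```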

From some rough benchmarking, piRSquared's NumPy solution does, predictably, appear to be the fastest, at least for this small sample, followed by DataFrame.replace:

%timeit df.values[:] = np.core.defchararray.replace(df.values.astype(str), ' ', '_') 
10000 loops, best of 3: 78.4 µs per loop 

%timeit df.replace(' ', '_', regex=True) 
1000 loops, best of 3: 932 µs per loop 

%timeit df.stack().str.replace(' ', '_').unstack() 
100 loops, best of 3: 2.29 ms per loop 

Interestingly, however, piRSquared's pandas solution appears to scale much better to larger DataFrames than DataFrame.replace, and even outperforms the NumPy solution:

df = pd.DataFrame([['John Smith', 'Jane Smith', 'Mark Smith']*10000,
                   ['Harry Jones', 'Mary Jones', 'Susan Jones']*10000])
%timeit df.values[:] = np.core.defchararray.replace(df.values.astype(str), ' ', '_') 
10 loops, best of 3: 181 ms per loop 

%timeit df.replace(' ', '_', regex=True) 
1 loop, best of 3: 4.14 s per loop 

%timeit df.stack().str.replace(' ', '_').unstack() 
10 loops, best of 3: 99.2 ms per loop 
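The timings above depend heavily on the machine and on the pandas and NumPy versions, so they are worth re-running locally. Below is a minimal harness, assuming the standard-library timeit module; make_frame, numpy_replace, regex_replace and stack_replace are hypothetical helper names for illustration. Unlike the %timeit line above, numpy_replace returns a new DataFrame instead of writing into df.values, so every run sees the original spaces.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(repeat):
    # Same layout as the larger benchmark above: 2 rows, 3 * repeat columns.
    return pd.DataFrame([['John Smith', 'Jane Smith', 'Mark Smith'] * repeat,
                         ['Harry Jones', 'Mary Jones', 'Susan Jones'] * repeat])


def numpy_replace(df):
    out = np.char.replace(df.to_numpy(dtype=str), ' ', '_')
    return pd.DataFrame(out, index=df.index, columns=df.columns)


def regex_replace(df):
    return df.replace(' ', '_', regex=True)


def stack_replace(df):
    return df.stack().str.replace(' ', '_', regex=False).unstack()


def benchmark(repeat, number=3):
    """Return the best per-call time in seconds for each approach."""
    df = make_frame(repeat)
    timings = {}
    for name, func in [('numpy', numpy_replace),
                       ('regex', regex_replace),
                       ('stack', stack_replace)]:
        # Best of 3 runs; each run calls func `number` times.
        best = min(timeit.repeat(lambda: func(df), repeat=3, number=number))
        timings[name] = best / number
    return timings


for name, seconds in benchmark(100).items():
    print(f'{name}: {seconds * 1e3:.2f} ms per call')
```

Passing repeat=10000 reproduces the wider frame used above; no particular ranking is implied here, since that is exactly what the harness is meant to measure.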

Use the DataFrame's replace method:

df.replace(r'\s+', '_', regex=True, inplace=True)
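The \s+ pattern behaves differently from a literal ' ' in two ways: it also matches tabs and other whitespace, and it collapses a run of whitespace into a single underscore. A sketch on a hypothetical messy frame (not from the question) shows the difference. It returns new frames rather than using inplace=True, which modifies df and returns None.

```python
import pandas as pd

# Hypothetical input: a double space, a tab, and a trailing space.
df = pd.DataFrame([['John  Smith', 'Jane\tSmith', 'Mark Smith ']],
                  columns=['Person_1', 'Person_2', 'Person_3'])

# r'\s+' matches one or more whitespace characters, so each run
# becomes a single underscore (including the trailing space).
runs = df.replace(r'\s+', '_', regex=True)

# A literal ' ' replaces each space separately and leaves tabs alone.
single = df.replace(' ', '_', regex=True)

print(runs)
print(single)
```

Writing the pattern as a raw string (r'\s+') avoids Python's invalid-escape-sequence warning for '\s'.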