Pythonic way to filter a column and then create a new column

I have this code to open an .xlsx file:

import pandas as pd 

df = pd.read_excel('file.xlsx')
df['Description'].head()

and I get the following result, which looks fine.

ID  | Description 
:----- | :----------------------------- 
0  | Some Description with no hash 
1  | Text with #one hash 
2  | Text with #two #hashes

Now I want to create a new column that keeps only the words starting with #, like this:

ID  | Description      | Only_Hash 
:----- | :----------------------------- | :----------------- 
0  | Some Description with no hash | NaN 
1  | Text with #one hash    | #one 
2  | Text with #two #hashes   | #two #hashes

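I was able to count/separate the rows containing #:

But now I want to create the column as I described above. What is the easiest way to do that?

Regards!

PS: The question should show the tables properly formatted, but I can't figure out why they are displayed wrong!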

Source

2017-07-31 Claudio

You can use str.findall with str.join:

df['new'] = df['Description'].str.findall(r'(#\w+)').str.join(' ')
print(df)
   ID                    Description           new
0   0  Some Description with no hash
1   1            Text with #one hash          #one
2   2         Text with #two #hashes  #two #hashes

And if you want NaN for rows without a match:

import numpy as np

df['new'] = df['Description'].str.findall(r'(#\w+)').str.join(' ').replace('', np.nan)
print(df)
   ID                    Description           new
0   0  Some Description with no hash           NaN
1   1            Text with #one hash          #one
2   2         Text with #two #hashes  #two #hashes
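If you want to try this without the spreadsheet, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the table in the question (an assumption, since file.xlsx itself is not available), and the column name `Only_Hash` is borrowed from the question's desired output.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the question; stands in for file.xlsx.
df = pd.DataFrame({
    "Description": [
        "Some Description with no hash",
        "Text with #one hash",
        "Text with #two #hashes",
    ]
})

# findall returns a list of matches per row; join glues each list into one string.
tags = df["Description"].str.findall(r"#\w+").str.join(" ")

# Rows without any hashtag end up as an empty string; turn those into NaN.
df["Only_Hash"] = tags.replace("", np.nan)

print(df)
```

Note that the pattern is a raw string (`r"..."`), so the backslash in `\w` is passed to the regex engine unchanged, and `#` needs no escaping.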

Source

2017-07-31 11:19:38 jezrael

This solution is more elegant! – MaxU

In [126]: df.join(df.Description
     ...:           .str.extractall(r'(\#\w+)')
     ...:           .unstack(-1)
     ...:           .T.apply(lambda x: x.str.cat(sep=' ')).T
     ...:           .to_frame(name='Hash'))
Out[126]:
   ID                    Description          Hash
0   0  Some Description with no hash           NaN
1   1            Text with #one hash          #one
2   2         Text with #two #hashes  #two #hashes
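A variation on the extractall idea, in case the unstack/transpose chain is hard to follow: group the matches by their original row and join them. This is a sketch of a different route to the same result, not the code from the answer above; the sample frame is rebuilt by hand from the question and the names `matches`, `hashes` and `result` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt by hand from the question; stands in for file.xlsx.
df = pd.DataFrame({
    "Description": [
        "Some Description with no hash",
        "Text with #one hash",
        "Text with #two #hashes",
    ]
})

# extractall yields one row per match, indexed by (original row, match number).
# Column 0 holds the single capture group.
matches = df["Description"].str.extractall(r"(#\w+)")[0]

# Join all matches that belong to the same original row.
# Rows without a match simply do not appear in this Series.
hashes = matches.groupby(level=0).agg(" ".join).rename("Hash")

# join aligns on the index, so rows with no match get NaN.
result = df.join(hashes)
print(result)
```

Because the join aligns on the index, no explicit replacement of empty strings with NaN is needed here.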

Source

2017-07-31 11:20:55 MaxU
