How can I use a regular expression to get matches within a given range?

I am using my code to return all matches within a given range. My data sample is:

 comment 
0  [intj74, you're, whipping, people, is, a, grea... 
1  [home, near, kcil2, meniaga, who, intj47, a, l... 
2  [thematic, budget, kasi, smooth, sweep] 
3  [budget, 2, intj69, most, people, think, of, e...

The result I want is (when the given range is intj1 to intj75):

  comment 
0  [intj74] 
1  [intj47]  
2  [nan] 
3  [intj69]

My code is:

df.comment = df.comment.apply(lambda x: [t for t in x if t=='intj74']) 
df.ix[df.comment.apply(len) == 0, 'comment'] = [[np.nan]]

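I don't know how to use a regular expression to match a range in place of t=='intj74'. Or is there any other way to do this?

Thanks in advance,

Pandas Python newbie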

Source

2016-09-15 Suhairi Suhaimin

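'intj\d+' matches 'intj' followed by one or more digits. – Maroun

@Maroun Thank you for your reply. Unfortunately it does not work: it returns [nan] for every row. Or how should I apply your suggestion? –

You can replace [t for t in x if t=='intj74'] with, for example,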

[t for t in x if re.match('intj[0-9]+$', t)]

or even

[t for t in x if re.match('intj[0-9]+$', t)] or [np.nan]

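This also handles the case where nothing matches (so there is no need to check for it explicitly with df.ix[df.comment.apply(len) == 0, 'comment'] = [[np.nan]]). The "trick" here is that an empty list evaluates to False, so or returns its right operand in that case.

Source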
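Note that 'intj[0-9]+$' accepts any number after intj, so on its own it does not enforce the intj1 to intj75 range from the question. Below is a minimal sketch of one way to add that check, assuming the tokens are plain strings. The helper names in_range and filter_tokens, the lo/hi parameters, and the sample rows are made up for illustration and are not from the thread.

```python
import re

# Hypothetical helper: True when token is 'intj<N>' with lo <= N <= hi.
def in_range(token, lo=1, hi=75):
    m = re.fullmatch(r'intj([0-9]+)', token)
    return m is not None and lo <= int(m.group(1)) <= hi

# Hypothetical helper: keep the in-range tokens, or a single NaN if none match
# (same "or" trick as above; float('nan') plays the role of np.nan).
def filter_tokens(tokens, lo=1, hi=75):
    return [t for t in tokens if in_range(t, lo, hi)] or [float('nan')]

# Made-up rows for illustration; 'intj99' is deliberately out of range.
rows = [
    ['intj74', "you're", 'whipping'],
    ['home', 'near', 'kcil2', 'intj47'],
    ['thematic', 'budget'],
    ['budget', '2', 'intj99'],
]
result = [filter_tokens(r) for r in rows]
print(result)
```

With a DataFrame like the one in the question, the same helper could presumably be applied as df.comment = df.comment.apply(filter_tokens).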

2016-09-15 08:49:26 ewcz

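Yess!!! After importing re, the solution re.match('intj[0-9]+$', t) works well. Thank you very much @ewcz –

Thanks again @ewcz for sharing the "trick". I tried it and it works, and it even shortens my code. –

I am new to pandas too. You have probably already initialized your DataFrame. Anyway, this is what I have: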

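import pandas as pd

data = {
    'comment': [
        "intj74, you're, whipping, people, is, a",
        "home, near, kcil2, meniaga, who, intj47, a",
        "thematic, budget, kasi, smooth, sweep",
        "budget, 2, intj69, most, people, think, of"
    ]
}
# Build the DataFrame from the sample data (each comment is a single string here)
df = pd.DataFrame(data)
# Extract the first 'intj' + digits token from each row (NaN where there is none)
print(df.comment.str.extract(r'(intj\d+)'))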

Source

2016-09-15 08:55:00

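Thanks for suggesting .str.extract, that is another approach. However, I get FutureWarning: currently extract(expand=None) means expand=False (return Index/Series/DataFrame) but in a future version of pandas this will be changed to expand=True (return DataFrame) if __name__ == '__main__':. All my results are NaN. –

You can pass the expand argument explicitly: df.comment.str.extract(r'(intj\d+)', expand=True). True returns a DataFrame. False returns a Series. Use whichever suits you. –

Oh, I see. Thanks for the explanation @arvindpdmn –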
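To make the difference concrete, here is a small sketch of the two expand settings, assuming a column of plain strings like the one in the answer above (the two sample rows are copied from that answer; the variable names as_series and as_frame are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'comment': [
    "intj74, you're, whipping, people, is, a",
    "thematic, budget, kasi, smooth, sweep",
]})

# expand=False with a single capture group gives a Series
as_series = df.comment.str.extract(r'(intj\d+)', expand=False)
# expand=True gives a DataFrame with one column per capture group
as_frame = df.comment.str.extract(r'(intj\d+)', expand=True)

print(type(as_series).__name__)
print(type(as_frame).__name__)
```

Rows without a match come back as NaN in either form.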
