Remove non-matching words in Python

I have a text string and want to keep only specific words.

sample = "This is a test text. Test text should pass the test" 
approved_list = ["test", "text"]

Expected output:

"test text Test text test"

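I have read through a lot of regex-based answers, but unfortunately none of them address this specific problem.

Can the solution also be extended to a pandas Series?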

Asked 2017-07-01 by Drj

You don't need pandas for this. If you do have a pd.Series:

import re
import pandas as pd

sample = pd.Series(["This is a test text. Test text should pass the test"] * 5)
approved_list = ["test", "text"]

Use the str string accessor:

sample.str.findall('|'.join(approved_list), re.IGNORECASE) 

0 [test, text, Test, text, test] 
1 [test, text, Test, text, test] 
2 [test, text, Test, text, test] 
3 [test, text, Test, text, test] 
4 [test, text, Test, text, test] 
dtype: object
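
Each row above is a list of matches, while the question asks for a string. One possible follow-up, not part of the original answer, is to chain `.str.join` so that each list collapses into one space-separated string. This is a minimal sketch that assumes single spaces are the wanted separator; the names `matches` and `joined` are only illustrative.

```python
import re
import pandas as pd

sample = pd.Series(["This is a test text. Test text should pass the test"] * 5)
approved_list = ["test", "text"]

# findall yields a list of matches per row; str.join collapses each list
# into one space-separated string.
matches = sample.str.findall("|".join(approved_list), flags=re.IGNORECASE)
joined = matches.str.join(" ")

print(joined.iloc[0])  # test text Test text test
```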

Answered 2017-07-01 22:34:16 by piRSquared

For a single string, use the regular expression module re:

import re
re.findall('|'.join(approved_list), sample, re.IGNORECASE)

['test', 'text', 'Test', 'text', 'test']
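
The `re.findall` call returns a list, whereas the question's expected output is a single string. A rough sketch of one way to close that gap, assuming the matches should be joined with single spaces. The helper name `keep_approved` is hypothetical, and the `\b` word boundaries and `re.escape` are additions that the original answer does not use; they are there so that a word such as "contest" is not picked up as "test" and so that approved words containing regex metacharacters are matched literally.

```python
import re

def keep_approved(text, approved):
    # Build an alternation of the approved words; re.escape guards against
    # regex metacharacters, and \b restricts matches to whole words.
    pattern = r"\b(?:" + "|".join(map(re.escape, approved)) + r")\b"
    # Join the matches, in order of appearance, into one string.
    return " ".join(re.findall(pattern, text, flags=re.IGNORECASE))

sample = "This is a test text. Test text should pass the test"
approved_list = ["test", "text"]

result = keep_approved(sample, approved_list)
print(result)  # test text Test text test
```

Without the word boundaries, the plain `'|'.join(approved_list)` pattern would also match "test" inside longer words, which may or may not be wanted depending on the data.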

This is helpful. I mentioned pandas because 'approved_list' needs to be applied to every value of a 'pd.Series'. Do you have any suggestions? – Drj

@Drj I have updated my post. – piRSquared
