Negating string.contains in Python, pandas

df2 = df[df['A'].str.contains("Hello|World")]

However, I want all the rows that do not contain either Hello or World. How can I invert this most efficiently?

2014-01-10 Xodarap777

You can use the tilde ~ to flip the boolean values:

>>> df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"]}) 
>>> df.A.str.contains("Hello|World") 
0  True 
1 False 
2  True 
3 False 
Name: A, dtype: bool 
>>> ~df.A.str.contains("Hello|World") 
0 False 
1  True 
2 False 
3  True 
Name: A, dtype: bool 
>>> df[~df.A.str.contains("Hello|World")] 
     A 
1 this 
3 apple 

[2 rows x 1 columns]
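For readers who have not met the operator before: in Python, `~` is the unary bitwise-inversion operator, and pandas overloads it so that on a boolean Series it acts as an element-wise logical NOT. Here is a minimal sketch of the same idea on a made-up column that also contains a missing value (that extra row is an assumption, not part of the question's data). Passing `na=False` to `str.contains` is one way to keep the mask strictly boolean so that `~` can be applied to it:

```python
import pandas as pd

# Hypothetical data: the values from the demo above plus one missing entry.
df = pd.DataFrame({"A": ["Hello", "this", "World", "apple", None]})

# na=False treats missing values as "does not contain the pattern",
# so the resulting mask holds only True/False.
mask = df["A"].str.contains("Hello|World", na=False)

# ~ flips every element of the boolean mask.
inverted = ~mask

# Rows that do NOT contain "Hello" or "World"; the missing row is kept.
result = df[inverted]
print(result)
```

Whether missing values should be kept or dropped depends on the data; `na=True` would make the inverted mask exclude them instead.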

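Whether this is the most efficient way, I don't know; you'd have to time it against the other options. Sometimes using a regular expression is slower than something like df[~(df.A.str.contains("Hello") | (df.A.str.contains("World")))], but I'm bad at guessing where the crossover is.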
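A rough way to time the two options is sketched below. The data is made up, the function names are hypothetical, and the second variant passes `regex=False` (plain substring search), which is a small change from the expression quoted above. Treat the numbers it prints as indicative only, since they depend on the data and the pandas version:

```python
import timeit

import pandas as pd

# Hypothetical test data; replace with a sample of the real column.
df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"] * 10_000})

def with_regex():
    # One regex alternation, then invert.
    return ~df["A"].str.contains("Hello|World")

def with_two_literals():
    # Two literal substring searches combined with |, then invert.
    has_hello = df["A"].str.contains("Hello", regex=False)
    has_world = df["A"].str.contains("World", regex=False)
    return ~(has_hello | has_world)

# Both approaches should select the same rows on this data.
assert with_regex().equals(with_two_literals())

for fn in (with_regex, with_two_literals):
    seconds = timeit.timeit(fn, number=5)
    print(f"{fn.__name__}: {seconds:.3f}s for 5 runs")
```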


2014-01-10 21:57:30 DSM

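Much better than a complicated negative lookahead test. I have no experience with pandas myself, though, so I don't know which approach will be faster. –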

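The regex lookaround test took noticeably longer (about 30s vs 20s), and the two methods apparently give slightly different results (3663K results vs 3504K, out of roughly 3G original rows; I haven't looked at the specifics). – Xodarap777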

@DSM I've seen this '~' symbol many times, especially in JavaScript. I haven't seen it in Python. What exactly does it mean? – estebanpdl

The .contains() method uses regular expressions, so you can use a negative lookahead test to determine that a word is not contained:

df['A'].str.contains(r'^(?:(?!Hello|World).)*$')

This expression matches any string in which the words Hello and World are not found anywhere.

Demo:

>>> df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"]}) 
>>> df['A'].str.contains(r'^(?:(?!Hello|World).)*$') 
0 False 
1  True 
2 False 
3  True 
Name: A, dtype: bool 
>>> df[df['A'].str.contains(r'^(?:(?!Hello|World).)*$')] 
     A 
1 this 
3 apple
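To see why the pattern works, it can help to try it with Python's `re` module directly: `(?!Hello|World).` consumes one character only if neither word starts at that position, and `^...$` requires that to hold for the whole string. The sketch below uses hypothetical helper names (`lacks_words`, `lacks_words_negated`). It also shows one case where the lookahead pattern and a plain negated search disagree: `.` does not match a newline by default, so a multi-line string is rejected even though it contains neither word. Whether that has anything to do with the differing counts mentioned in the comments is only a guess.

```python
import re

# Pattern from the answer above, and the plain alternation for comparison.
LOOKAHEAD = re.compile(r'^(?:(?!Hello|World).)*$')
PLAIN = re.compile(r'Hello|World')

def lacks_words(text):
    """Hypothetical helper: True if the lookahead pattern matches."""
    return LOOKAHEAD.search(text) is not None

def lacks_words_negated(text):
    """Hypothetical helper: True if the plain pattern does NOT match."""
    return PLAIN.search(text) is None

for text in ["Hello", "this", "say World", "apple"]:
    print(repr(text), lacks_words(text), lacks_words_negated(text))

# The two approaches disagree on a string with an embedded newline.
print(lacks_words("two\nlines"), lacks_words_negated("two\nlines"))
```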


2014-01-10 21:56:27

I got 'C:\Python27\lib\site-packages\pandas\core\strings.py:176: UserWarning: This pattern has match groups. To actually get the groups, use str.extract.'. – Xodarap777

Made the group non-capturing. –
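The difference between the two group styles can be checked with a small sketch. The capturing variant below is an assumption about what a pattern triggering the warning might look like; it is not quoted from the thread. `user_warnings` is a hypothetical helper, and the exact warning text may differ between pandas versions:

```python
import re
import warnings

import pandas as pd

s = pd.Series(["Hello", "this", "World", "apple"])

# Assumed capturing variant: (...) creates a match group.
capturing = r'^((?!Hello|World).)*$'
# Non-capturing variant from the answer: (?:...) creates no group.
non_capturing = r'^(?:(?!Hello|World).)*$'

def user_warnings(pattern):
    """Return the UserWarnings emitted by str.contains for this pattern."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        s.str.contains(pattern)
    return [w for w in caught if issubclass(w.category, UserWarning)]

# Number of match groups in each pattern, and how many warnings it triggers.
print(re.compile(capturing).groups, len(user_warnings(capturing)))
print(re.compile(non_capturing).groups, len(user_warnings(non_capturing)))
```

Both patterns produce the same boolean mask; only the group count, and therefore the warning, differs.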
