Exact word match using the index or find method - python

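I have a string "the then there" and I want to search for an exact/whole word. In this case "the" appears only once. However, index() and find() treat it as occurring three times, because "the" also partially matches "then" and "there". I'd prefer to use one of these methods. Is there any way to adjust how they work?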

>>> s = "the then there" 
>>> s.index("the") 
0 
>>> s.index("the",1) 
4 
>>> s.index("the",5) 
9 
>>> s.find("the") 
0 
>>> s.find("the",1) 
4 
>>> s.find("the",5) 
9
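If you want to keep using str.find(), one possible adjustment is to loop over its matches and keep only those whose neighbouring characters are not part of a word. The sketch below uses a hypothetical helper, find_word (not a built-in), and assumes a "word character" is anything for which str.isalnum() is true, plus the underscore, which is roughly what the regex \b boundary uses.

```python
def _is_word_char(ch: str) -> bool:
    # Letters, digits and underscore count as part of a word (similar to regex \w).
    return ch.isalnum() or ch == "_"


def find_word(s: str, word: str, start: int = 0) -> int:
    """Hypothetical helper: index of the first whole-word occurrence of `word`
    in `s` at or after `start`, or -1 if there is none (mirrors str.find)."""
    pos = s.find(word, start)
    while pos != -1:
        end = pos + len(word)
        before_ok = pos == 0 or not _is_word_char(s[pos - 1])
        after_ok = end == len(s) or not _is_word_char(s[end])
        if before_ok and after_ok:
            return pos
        # Partial match such as "the" inside "then": keep scanning.
        pos = s.find(word, pos + 1)
    return -1


s = "the then there"
print(find_word(s, "the"))     # 0
print(find_word(s, "the", 1))  # -1, "then" and "there" are rejected
print(find_word(s, "then"))    # 4
```

An index()-style variant would raise ValueError instead of returning -1.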


2016-11-15 user3806770

Use the regular expression '\bthe\b' –
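A minimal sketch of that suggestion, assuming the search word is a fixed literal such as "the" (for arbitrary input, build the pattern with re.escape()). \b matches only at a word boundary, so "the" inside "then" or "there" is not counted.

```python
import re

s = "the then there"
# \b on both sides restricts matches to the whole word "the".
pattern = re.compile(r"\bthe\b")

positions = [m.start() for m in pattern.finditer(s)]
print(positions)                  # [0]
print(len(pattern.findall(s)))    # 1
```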

First convert the string into a list of words with str.split(), then search for the word.

>>> s = "the then there" 
>>> s_list = s.split() # list of words having content: ['the', 'then', 'there'] 
>>> s_list.index("the") 
0 
>>> s_list.index("then") 
1 
>>> s_list.index("there") 
2
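Two properties of the split() approach are worth keeping in mind, shown in the short sketch below: list.index() returns the position in the word list rather than a character offset, and str.split() with no arguments splits only on whitespace, so punctuation stays attached to the neighbouring word.

```python
s = "the then there"
words = s.split()

# Position in the word list, not a character offset into s.
print(words.index("there"))   # 2
# Whole-word count: only one element equals "the".
print(words.count("the"))     # 1

# Punctuation is not stripped, so "here," will not equal "here".
print("here, there".split())  # ['here,', 'there']
```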


2016-11-15 06:48:30

Performance is a concern for my use case, since it could be a very large file, so I'm trying to avoid building a huge list... – user3806770
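One way to address the memory concern, assuming the file can be processed line by line (a word never spans a line break), is to stream it and run the word-boundary regex on each line while tracking a running character offset. The helper name first_word_offset is hypothetical, and io.StringIO stands in for a real file opened with open(). Offsets are counted in decoded characters, after any newline translation done by text-mode reading.

```python
import io
import re
from typing import Iterable


def first_word_offset(lines: Iterable[str], word: str) -> int:
    """Hypothetical helper: character offset of the first whole-word match of
    `word` across `lines`, or -1. Only one line is held in memory at a time."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    offset = 0
    for line in lines:
        m = pattern.search(line)
        if m:
            return offset + m.start()
        offset += len(line)  # includes the trailing newline, if any
    return -1


# io.StringIO stands in for a large file object here.
f = io.StringIO("then there\nwhat the\n")
print(first_word_offset(f, "the"))  # 16
```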

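It's a huge file either way. You need to store it as either a 'str' or a 'list', but you have to store it somewhere, right? Read the content as a string and build a list from it. If you care more about saving space, once you have the list, convert it to a dictionary with the words as keys and the index of each word's first occurrence as values. Then explicitly delete the variables you no longer use, such as the ones holding the string and the list. –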
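A sketch of the dictionary the comment describes, assuming "index" means the word's position in the split() list (a character offset would need a different approach, such as the regex one). dict.setdefault() stores a value only the first time a key is seen, so later repeats do not overwrite it.

```python
s = "the then there the"

first_index = {}
for i, word in enumerate(s.split()):
    # Keep only the first position of each word.
    first_index.setdefault(word, i)

print(first_index)  # {'the': 0, 'then': 1, 'there': 2}

# As suggested, drop the reference to the original string once it is no longer needed.
del s
```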

To find the first position of an exact/whole word in a large text, try the following approach using re.search() and match.start():

import re 

test_str = "when we came here, what we saw that the then there the" 
search_str = 'the' 
m = re.search(r'\b'+ re.escape(search_str) +r'\b', test_str, re.IGNORECASE) 
if m: 
    pos = m.start() 
    print(pos)

Output:

36

https://docs.python.org/3/library/re.html#re.match.start
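The same idea can be wrapped in a str.find()-style function that also accepts a start position, which is closest to what the question asks for. This is a sketch: the name find_word_re is hypothetical, it relies on Pattern.search(string, pos), and it assumes the word begins and ends with a word character (otherwise \b behaves differently). re.finditer() lists every whole-word occurrence.

```python
import re


def find_word_re(text: str, word: str, start: int = 0, ignore_case: bool = False) -> int:
    """Hypothetical str.find-style helper: index of the first whole-word
    match of `word` at or after `start`, or -1 if there is none."""
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", flags)
    m = pattern.search(text, start)  # Pattern.search accepts a start position
    return m.start() if m else -1


test_str = "when we came here, what we saw that the then there the"
print(find_word_re(test_str, "the"))      # 36
print(find_word_re(test_str, "the", 37))  # 51
print([m.start() for m in re.finditer(r"\bthe\b", test_str)])  # [36, 51]
```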


2016-11-15 08:28:16 RomanPerekhrest
