Capture repeated groups in a Python regular expression

I have a mail log file that looks like this:

Aug 15 00:01:06 **** sm-mta*** to=<[email protected]>,<[email protected]>,[email protected], some_more_stuff 
Aug 16 13:16:09 **** sendmail*** to=<[email protected]>, some_more_stuff 
Aug 17 11:14:48 **** sm-mta*** to=<[email protected]>,<[email protected]>, some_more_stuff

What I want is a list of all the mail hosts on the lines that contain "sm-mta". In this case, that would be: ['gmail.com', 'yahoo.com', 'aol.com', 'gmail.com', 'gmail.com']

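re.findall(r'sm-mta.*[email protected](.*?)[>, ]') returns only the first host of each matching line (['gmail.com', 'gmail.com']).

re.findall(r'[email protected](.*?)[>, ]') returns the hosts from every line, but I still need to filter on sm-mta. Is there a way to solve this?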

Source

2017-10-06 Daqol

You can try this: https://eval.in/875159

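If you cannot use the PyPI regex library, you will have to do it in two steps: 1) grab the lines containing sm-mta and 2) grab the values you need, with something like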

import re

txt="""Aug 15 00:01:06 **** sm-mta*** to=<[email protected]>,<[email protected]>,[email protected], some_more_stuff 
Aug 16 13:16:09 **** sendmail*** to=<[email protected]>, some_more_stuff 
Aug 17 11:14:48 **** sm-mta*** to=<[email protected]>,<[email protected]>, some_more_stuff""" 
rx = r'@([^\s>,]+)' 
filtered_lines = [x for x in txt.split('\n') if 'sm-mta' in x] 
print(re.findall(rx, " ".join(filtered_lines)))

See the Python demo online. The @([^\s>,]+) pattern matches @, then captures and returns one or more characters other than whitespace, > and ,.
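Because the addresses in the sample above are redacted, the two steps can also be sketched on made-up data. The user names, host name and process ids in `LOG`, and the helper name `hosts_for`, are assumptions for illustration only; the pattern is the one described above.

```python
import re

# Hypothetical log lines: user names, host and pids are invented for the example.
LOG = """\
Aug 15 00:01:06 host sm-mta[1]: to=<alice@gmail.com>,<bob@yahoo.com>,carol@aol.com, some_more_stuff
Aug 16 13:16:09 host sendmail[2]: to=<dave@gmail.com>, some_more_stuff
Aug 17 11:14:48 host sm-mta[3]: to=<erin@gmail.com>,<frank@gmail.com>, some_more_stuff"""

# Capture everything after "@" up to whitespace, ">" or ",".
HOST_RX = re.compile(r'@([^\s>,]+)')

def hosts_for(text, marker='sm-mta'):
    """Return the mail hosts found on lines that contain `marker`."""
    hosts = []
    for line in text.splitlines():
        if marker in line:                       # step 1: keep only matching lines
            hosts.extend(HOST_RX.findall(line))  # step 2: capture the host after each @
    return hosts

print(hosts_for(LOG))
# ['gmail.com', 'yahoo.com', 'aol.com', 'gmail.com', 'gmail.com']
```

Working line by line avoids joining the filtered lines into one string and keeps the filter easy to change.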

If you can use the PyPI regex library, you can get the list of strings you need with

>>> import regex 
>>> x="""Aug 15 00:01:06 **** sm-mta*** to=<[email protected]>,<[email protected]>,[email protected], some_more_stuff 
Aug 16 13:16:09 **** sendmail*** to=<[email protected]>, some_more_stuff 
Aug 17 11:14:48 **** sm-mta*** to=<[email protected]>,<[email protected]>, some_more_stuff""" 
>>> rx = r'(?:^(?=.*sm-mta)|\G(?!^)).*?@\K[^\s>,]+' 
>>> print(regex.findall(rx, x, regex.M)) 
['gmail.com', 'yahoo.com', 'aol.com', 'gmail.com', 'gmail.com']

See the Python online demo and the regex demo.

Pattern details

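(?:^(?=.*sm-mta)|\G(?!^)) - the start of a line that has an sm-mta substring after any 0+ chars other than line break chars, or the place where the previous match ended
.*?@ - any 0+ chars other than line break chars, as few as possible, up to the @ and the @ itself
\K - the match reset operator, which discards all text matched so far in the current iteration
[^\s>,]+ - 1 or more chars other than whitespace, , and >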
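The standard-library re module supports neither \G nor \K, so the single pattern above cannot be ported to it directly. As a rough sketch only, the ^(?=.*sm-mta) lookahead can still be reused on its own with re.M to select whole lines, with a second pattern pulling out the hosts. The sample data and the names `LINE_RX` and `HOST_RX` are assumptions for illustration.

```python
import re

# Hypothetical log lines: user names, host and pids are invented for the example.
LOG = """\
Aug 15 00:01:06 host sm-mta[1]: to=<alice@gmail.com>,<bob@yahoo.com>,carol@aol.com, some_more_stuff
Aug 16 13:16:09 host sendmail[2]: to=<dave@gmail.com>, some_more_stuff
Aug 17 11:14:48 host sm-mta[3]: to=<erin@gmail.com>,<frank@gmail.com>, some_more_stuff"""

# With re.M, ^ and $ anchor at every line; the lookahead requires
# "sm-mta" somewhere on that same line before the line is consumed.
LINE_RX = re.compile(r'^(?=.*sm-mta).*$', re.M)
HOST_RX = re.compile(r'@([^\s>,]+)')

hosts = [host
         for line in LINE_RX.findall(LOG)    # lines that contain sm-mta
         for host in HOST_RX.findall(line)]  # hosts on each of those lines
print(hosts)
# ['gmail.com', 'yahoo.com', 'aol.com', 'gmail.com', 'gmail.com']
```

This is still two passes, not an equivalent of the \G-based pattern; it only shows what the lookahead part contributes.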

Source

2017-10-06 11:19:45

Try the regex module.

x="""Aug 15 00:01:06 **** sm-mta*** to=<[email protected]>,<[email protected]>,[email protected], some_more_stuff 
Aug 16 13:16:09 **** sendmail*** to=<[email protected]>, some_more_stuff 
Aug 17 11:14:48 **** sm-mta*** to=<[email protected]>,<[email protected]>, some_more_stuff""" 
import regex 
print(regex.findall(r"sm-mta.*to=\K|\G(?!^)[email protected](.*?)[>, ]", x, version=regex.V1))

Output: ['', 'gmail.com', 'yahoo.com', 'aol.com', '', 'gmail.com', 'gmail.com']

Just ignore the empty match that comes first for each line.

https://regex101.com/r/7zPc6j/1
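The empty strings appear because findall reports the capture group, and the first alternative of the pattern has no group that takes part in the match. A minimal sketch of dropping them, using the output list quoted above as input; the toy pattern a|b(c) is only there to show the same effect with the standard re module.

```python
import re

# Same effect in miniature: when the "a" branch matches, group 1 did not
# participate, so findall reports an empty string for that match.
demo = re.findall(r'a|b(c)', 'abc')
print(demo)   # ['', 'c']

# Result list copied from the output shown above.
raw = ['', 'gmail.com', 'yahoo.com', 'aol.com', '', 'gmail.com', 'gmail.com']

# Keep only the non-empty entries.
hosts = [h for h in raw if h]
print(hosts)  # ['gmail.com', 'yahoo.com', 'aol.com', 'gmail.com', 'gmail.com']
```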

Source

2017-10-06 10:49:57 vks
