How to extract a list of information from a regex group?

I have multiple texts that are all structured as follows:

> Record:  24G3KL 
> Source:  Whatever 
> System Time:Oct 10, 2017 19:01:00 (MST) 
> Result:  finalText

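There is more text before and after this, but it is not important.

The goal is to extract the 6-character alphanumeric value (here "24G3KL") each time a line > Result: finalText is encountered. The word "finalText" can differ (for example, it could be abcdefText or anything else). I am only interested in records whose Result value is "finalText".

I am using the following regex: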

([A-Z0-9]{6})(?:.|\n)*(?:\s*finalText)

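It works fine, and the 6-character alphanumeric value is extracted into regex group 1.

In Notepad++, I open the Find window, put my regex in the "Find what:" field, select Regular expression, then click the Find All in Current Document button.

The result is a list that looks like this: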

Line 85186: > Result:  finalText 
Line 86200: > Result:  finalText 
Line 87258: > Result:  finalText 
Line 87721: > Result:  finalText 
Line 87761: > Result:  finalText

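I find this really odd, because "finalText" is not supposed to be captured by the regex (the group starts with "?:"). I expected to see my group 1 (all my 6-character alphanumeric values) instead.

[EDIT] This is what I get:

In the Find result window at the bottom, I would like to see the 6-character alphanumeric values instead of the "finalText" values...

Is there a way to do this?

Source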

2017-10-11 DotNetMatt

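At first glance, the result you get with Notepad++ matches your regex perfectly. So I don't understand how this regex could give you the "Record:" reference in another context (which one?). – cFreed

I'm not sure your regex is correct. It looks like '(?:.|\n)*(?:\s*finalText)' will keep searching until it finds a 'finalText', skipping over any other 'Result:' lines that do not match 'finalText'. So you will capture 'Record' strings you don't want. – Blorgbeard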

See https://regex101.com/r/L7DQlv/1 for what I mean. – Blorgbeard
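The over-matching described in the comment above can be reproduced outside Notepad++. The following is a minimal Python sketch, not taken from the thread: the sample text and the record ID "ZZ9XY1" are made up for illustration, and Python's `re` engine stands in for the one used by Notepad++.

```python
import re

# Hypothetical sample: the first record ends in "otherText",
# only the second one ends in "finalText".
text = (
    "> Record:  ZZ9XY1\n"
    "> Source:  Whatever\n"
    "> System Time:Oct 10, 2017 19:01:00 (MST)\n"
    "> Result:  otherText\n"
    "> Record:  24G3KL\n"
    "> Source:  Whatever\n"
    "> System Time:Oct 10, 2017 19:01:00 (MST)\n"
    "> Result:  finalText\n"
)

# The regex from the question, unchanged.
pattern = re.compile(r"([A-Z0-9]{6})(?:.|\n)*(?:\s*finalText)")

match = pattern.search(text)
# (?:.|\n)* runs across record boundaries, so group 1 is the ID of the
# first record even though that record's Result is "otherText".
wrong_id = match.group(1)
print(wrong_id)
```

Here the unwanted ID "ZZ9XY1" ends up in group 1, because nothing in the pattern stops the match from crossing into the next record.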

From what I can see it matches perfectly; use $1 to get the contents of the first pair of parentheses.
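To illustrate the difference between the whole match and group 1 (what $1 refers to), here is a small Python sketch. It is only an illustration under the assumption that the text has no trailing whitespace; `match.group(1)` plays the role that $1 plays in a replacement string.

```python
import re

# The record from the question, without trailing whitespace (assumption).
text = (
    "> Record:  24G3KL\n"
    "> Source:  Whatever\n"
    "> System Time:Oct 10, 2017 19:01:00 (MST)\n"
    "> Result:  finalText\n"
)

pattern = re.compile(r"([A-Z0-9]{6})(?:.|\n)*(?:\s*finalText)")
match = pattern.search(text)

# The whole match runs from the ID through "finalText": a non-capturing
# group (?:...) still consumes text, it just does not get a group number.
whole_match = match.group(0)

# Group 1 holds only the 6-character ID.
record_id = match.group(1)
print(record_id)
```

A search tool that reports whole matches will therefore show text containing "finalText", while the ID is only available through the group.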

Source

2017-10-11 00:13:02 Samantha

I tried

([A-Z0-9]{6})\n.*\n.*\n> Result:\W*finalText

and it seems to work.

This assumes that there are always exactly two lines between Record and Result, though.
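As a rough check of the pattern above, here is a minimal Python sketch. The second record ("ZZ9XY1", ending in "otherText") is invented for the test, and the sample assumes "\n" line endings with no trailing spaces after the ID, since the pattern requires a newline directly after the six characters.

```python
import re

# Hypothetical sample: one record ends in "otherText", one in "finalText".
text = (
    "> Record:  ZZ9XY1\n"
    "> Source:  Whatever\n"
    "> System Time:Oct 10, 2017 19:01:00 (MST)\n"
    "> Result:  otherText\n"
    "> Record:  24G3KL\n"
    "> Source:  Whatever\n"
    "> System Time:Oct 10, 2017 19:01:00 (MST)\n"
    "> Result:  finalText\n"
)

# Exactly two full lines must sit between the ID line and the Result line.
# "." does not match a newline here, so the match cannot spill over
# into a neighbouring record.
pattern = re.compile(r"([A-Z0-9]{6})\n.*\n.*\n> Result:\W*finalText")

# With a single capture group, findall returns the group contents.
ids = pattern.findall(text)
print(ids)
```

Only the record whose Result is "finalText" is returned; the "otherText" record is skipped.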

Source

2017-10-11 00:21:06 Blorgbeard

Ctrl+F
Find what: > Record:\h*[A-Z0-9]{6}(?:\R.+){2}\R> Result:\h*finalText
check Match case
check Wrap around
check Regular expression
DO NOT check . matches newline
Search

Explanation:

> Record:\h* : literally "> Record:" followed by 0 or more horizontal spaces 
[A-Z0-9]{6}  : 6 uppercase letters or digits 
(?:    : non capture group 
    \R   : a line break 
    .+   : 1 or more any character 
){2}   : must be present twice 
\R    : a line break 
> Result:\h* : literally "> Result:" followed by 0 or more horizontal spaces 
finalText  : literally "finalText"

Result for the given example:

Search "> Record:\h*[A-Z0-9]{6}(?:\R.+){2}\R> Result:\h*finalText" (2 hits in 1 file) 
    new 2 (2 hits) 
    Line 1: > Record:  24G3KL 
    Line 9: > Record:  RNG3VS
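For anyone who wants the IDs as an actual list rather than a list of matching lines, the same idea can be sketched in Python. This is an adaptation, not the pattern from the answer: Python's `re` module does not support \h or \R, so `[ \t]` and `\r?\n` are substituted, and a capture group is added around the ID. The sample text, including the "ZZ9XY1" record, is assumed for illustration.

```python
import re

# Hypothetical sample with three records; only two end in "finalText".
text = (
    "> Record:  24G3KL\n"
    "> Source:  Whatever\n"
    "> System Time:Oct 10, 2017 19:01:00 (MST)\n"
    "> Result:  finalText\n"
    "> Record:  ZZ9XY1\n"
    "> Source:  Whatever\n"
    "> System Time:Oct 10, 2017 19:01:00 (MST)\n"
    "> Result:  otherText\n"
    "> Record:  RNG3VS\n"
    "> Source:  Whatever\n"
    "> System Time:Oct 10, 2017 19:01:00 (MST)\n"
    "> Result:  finalText\n"
)

pattern = re.compile(
    r"> Record:[ \t]*([A-Z0-9]{6})"  # literal prefix, then the captured ID
    r"(?:\r?\n.+){2}"                # exactly two non-empty lines in between
    r"\r?\n> Result:[ \t]*finalText" # the Result line must say finalText
)

# findall returns the contents of the single capture group for each hit.
ids = pattern.findall(text)
print(ids)
```

Anchoring on the literal "> Record:" and "> Result:" prefixes keeps each match inside one record, which is what makes the hit list line up with the Record lines.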

Source

2017-10-11 08:42:34 Toto
