Full match for findall

I have a regular expression that finds a URL in some text:

import re

my_urlfinder = re.compile(r'\shttp:\/\/(\S+.|)blah.com/users/(\d+)(\/|)')
text = "blah blah http://blah.com/users/123 blah blah http://blah.com/users/353"

for match in my_urlfinder.findall(text):
    print(match)  # prints a tuple of the captured groups, not the full URL

How do I get the full URL? Currently match only prints the captured parts (which I need for other things)... but I also want the full URL.

2013-03-06 9-bits

The simplest would be to add an extra pair of parentheses enclosing the entire regex. Then you get it along with the parts! – alexis 2013-03-06 14:51:58

An alternative to not using any capturing groups is to add another one around everything:

my_urlfinder = re.compile(r'\s(http:\/\/(\S+.|)blah.com/users/(\d+)(\/|))')

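This lets you keep the inner capturing groups while also getting the whole result.

For the demo text, this yields the following results: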

('http://blah.com/users/123', '', '123', '') 
('http://blah.com/users/353', '', '353', '')
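
If adding an outer group is undesirable, another option (not from the answer above, just a sketch assuming Python 3) is `finditer()`, which yields match objects: `group(0)` is the whole match and the numbered groups stay available. The variable names `results`, `full_url` and `user_id` are only for illustration.

```python
import re

# Same pattern as in the question, without an extra outer group.
my_urlfinder = re.compile(r'\shttp:\/\/(\S+.|)blah.com/users/(\d+)(\/|)')
text = "blah blah http://blah.com/users/123 blah blah http://blah.com/users/353"

results = []
for m in my_urlfinder.finditer(text):
    # group(0) is the whole match; strip() drops the whitespace matched by \s.
    full_url = m.group(0).strip()
    # group(2) is the second capturing group, the numeric user id.
    user_id = m.group(2)
    results.append((full_url, user_id))
    print(full_url, user_id)
```

This keeps the pattern unchanged, at the cost of calling `group()` explicitly instead of unpacking a tuple.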

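As a side note, be careful: the current expression requires whitespace in front of the URL, so if the text starts with a URL, that one will not match.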
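
The caveat can be seen with a text that begins with a URL. The `relaxed` pattern below is a hypothetical variant, not part of the original answer: it swaps the leading `\s` for the lookbehind `(?<!\S)`, which is one possible way to accept a URL at the very start of the text as well.

```python
import re

# Pattern from the answer: requires a whitespace character before the URL.
original = re.compile(r'\s(http:\/\/(\S+.|)blah.com/users/(\d+)(\/|))')

# Hypothetical variant: (?<!\S) asserts that no non-whitespace character
# precedes the URL, which holds at the start of the text and after whitespace.
relaxed = re.compile(r'(?<!\S)(http:\/\/(\S+.|)blah.com/users/(\d+)(\/|))')

# Note that this text starts directly with a URL.
text = "http://blah.com/users/123 blah blah http://blah.com/users/353"

# The first tuple item is the outer group, i.e. the full URL.
original_hits = [groups[0] for groups in original.findall(text)]
relaxed_hits = [groups[0] for groups in relaxed.findall(text)]

print(original_hits)  # only the second URL is found
print(relaxed_hits)   # both URLs are found
```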

2013-03-06 14:39:26 poke

This is exactly what I needed - thanks! – 2013-03-06 15:41:19

You should make your groups non-capturing:

my_urlfinder = re.compile(r'\shttp:\/\/(?:\S+.|)blah.com/users/(?:\d+)(?:\/|)')

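findall() changes its behaviour when the pattern contains capturing groups. With groups, it returns only the groups; without capturing groups, it returns the whole matched text instead.

Demo: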

>>> import re
>>> text = "blah blah http://blah.com/users/123 blah blah http://blah.com/users/353"
>>> my_urlfinder = re.compile(r'\shttp:\/\/(?:\S+.|)blah.com/users/(?:\d+)(?:\/|)')
>>> for match in my_urlfinder.findall(text):
...     print(match)
...
 http://blah.com/users/123
 http://blah.com/users/353
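
To make the rule concrete, here is a small sketch of the three shapes `re.findall()` can return. The patterns are simplified versions of the one in the question (an assumption made for readability, with the dots escaped). Note that a whole match includes the whitespace consumed by the leading `\s`.

```python
import re

text = "blah blah http://blah.com/users/123 blah blah http://blah.com/users/353"

# No capturing groups: a list of whole matches (leading whitespace included).
no_groups = re.findall(r'\shttp://blah\.com/users/\d+', text)

# Exactly one capturing group: a list of that group's strings.
one_group = re.findall(r'\shttp://blah\.com/users/(\d+)', text)

# Two or more capturing groups: a list of tuples, one item per group.
two_groups = re.findall(r'\s(http://blah\.com/users/(\d+))', text)

print(no_groups)
print(one_group)
print(two_groups)
```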

2013-03-06 14:36:57
