Regex to separate URLs in text that has no delimiters

I have text containing URLs with no delimiter between them:

 
https://00e9e64bac25fa94607-apidata.googleusercontent.com/download/redacted?qk=AD5uMEnaGx-JIkLyJmEF7IjjU8bQfv_hZTkH_KOeaGZySsQCmdSPZEPHHAzUaUkcDAOZghttps://console.developers.google.com/project/reducted/?authuser=1\n

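This example contains only two URLs, but there could be more. The input text consists of multiple URLs (and nothing but URLs), all on the same line.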

I'm trying to do this in Python.

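I looked for solutions that split the URLs into a list and tried a few, but couldn't get any of them to work fully, because they greedily consume all the URLs that follow. https://stackoverflow.com/a/6883094/659346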

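I realize this is probably because https://... may legitimately be allowed in the query part of a URL, but in my case I'm willing to assume that it can't be, and that whenever it occurs it marks the start of the next URL.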

I also tried (http[s]://.*?), but with or without the ? it matches either the whole chunk of text or just https://
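To make the failure concrete, here is a minimal sketch of both attempts. The input string and the names `text`, `lazy`, and `greedy` are made up for illustration and are not from the question; the pattern is the one quoted above. A lazy `.*?` at the very end of a pattern is never forced to grow, so it matches nothing, while a greedy `.*` runs to the end of the line.

```python
import re

# Hypothetical input: two URLs glued together, as in the question.
text = "https://a.example/x?q=1https://b.example/y"

# Lazy quantifier at the end of the pattern: it is satisfied with zero characters.
lazy = re.findall(r'(http[s]://.*?)', text)

# Greedy quantifier: it swallows everything, including the second URL.
greedy = re.findall(r'(http[s]://.*)', text)

print(lazy)    # ['https://', 'https://']
print(greedy)  # ['https://a.example/x?q=1https://b.example/y']
```

Neither quantifier alone knows where a URL should stop; the pattern needs an explicit stopping condition.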


2015-01-15 GP89

You need to use a positive lookahead assertion.

>>> import re
>>> s = "https://00e9e64bac25fa94607-apidata.googleusercontent.com/download/redacted?qk=AD5uMEnaGx-JIkLyJmEF7IjjU8bQfv_hZTkH_KOeaGZySsQCmdSPZEPHHAzUaUkcDAOZghttps://console.developers.google.com/project/reducted/?authuser=1\n"
>>> re.findall(r'https?://.*?(?=https?://|$|\s)', s)
['https://00e9e64bac25fa94607-apidata.googleusercontent.com/download/redacted?qk=AD5uMEnaGx-JIkLyJmEF7IjjU8bQfv_hZTkH_KOeaGZySsQCmdSPZEPHHAzUaUkcDAOZg', 'https://console.developers.google.com/project/reducted/?authuser=1']
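The same lookahead idea can be wrapped in a small helper and checked against more than two URLs. This is only a sketch: the names `URL_BOUNDARY` and `split_urls` and the example hosts are hypothetical, and it relies on the question's assumption that a scheme never appears inside a URL.

```python
import re

# Lazy match that stops right before the next scheme, whitespace, or end of input.
URL_BOUNDARY = re.compile(r'https?://.*?(?=https?://|\s|$)')

def split_urls(text):
    """Return the URLs found in text, assuming each scheme starts a new URL."""
    return URL_BOUNDARY.findall(text)

# Three made-up URLs with no delimiter, followed by a trailing newline.
urls = split_urls("http://a.example/1https://b.example/2?x=yhttp://c.example/3\n")
print(urls)  # ['http://a.example/1', 'https://b.example/2?x=y', 'http://c.example/3']
```

The lookahead does not consume anything, so the scheme that ends one match is still available to start the next one.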


2015-01-15 15:26:47

(https?:\/\/(?:(?!https?:\/\/).)*)

Try this. See the demo.

https://regex101.com/r/tX2bH4/15

import re

# Consume characters only while no new URL starts at the current position.
p = re.compile(r'(https?:\/\/(?:(?!https?:\/\/).)*)')
test_str = "https://00e9e64bac25fa94607-apidata.googleusercontent.com/download/redacted?qk=AD5uMEnaGx-JIkLyJmEF7IjjU8bQfv_hZTkH_KOeaGZySsQCmdSPZEPHHAzUaUkcDAOZghttps://console.developers.google.com/project/reducted/?authuser=1\n"

# '.' does not match the trailing newline, so it is left out of the last URL.
print(p.findall(test_str))
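For readers unfamiliar with this construct, the pattern can be spelled out with `re.VERBOSE`, which allows whitespace and comments inside the regex. The sketch below drops the unnecessary `\/` escapes and the outer capture group; the name `TEMPERED` and the sample string are illustrative assumptions, not part of the answer.

```python
import re

TEMPERED = re.compile(r'''
    https?://            # the scheme that starts a URL
    (?:                  # then, repeatedly:
        (?!https?://)    #   check that no new URL starts at this position
        .                #   and consume one character (never a newline)
    )*
''', re.VERBOSE)

# Made-up input: two URLs with no delimiter and a trailing newline.
sample = "https://a.example/x?q=1https://b.example/y\n"
parts = TEMPERED.findall(sample)
print(parts)  # ['https://a.example/x?q=1', 'https://b.example/y']
```

The negative lookahead is checked before every single character, which is what stops the repetition exactly where the next URL begins.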


2015-01-15 15:22:23 vks

This won't work if the URL string contains "http" in the middle. – mbomb007 2015-01-15 15:24:01

An example where it won't work: http://golang.org/pkg/net/http/ – mbomb007 2015-01-15 15:24:48

Yes, I'd rather have the lookahead test for 'http[s]?://' to make it a bit more robust. I can't seem to work out how to add the '://' to your answer, though :S – GP89 2015-01-15 15:26:44
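The difference the comments describe can be reproduced with a short comparison. This sketch assumes a looser variant whose lookahead tests only for a bare `http` (a guess at what the comments refer to, not a quote from the answer); the variable names and the second URL are hypothetical.

```python
import re

# mbomb007's example URL followed directly by a second, made-up URL.
text = "http://golang.org/pkg/net/http/https://example.com/"

# Assumed looser variant: stops at any "http", even inside a path.
loose = re.findall(r'https?://(?:(?!http).)*', text)

# Lookahead on the full scheme: stops only where "http://" or "https://" begins.
strict = re.findall(r'https?://(?:(?!https?://).)*', text)

print(loose)   # ['http://golang.org/pkg/net/', 'https://example.com/']
print(strict)  # ['http://golang.org/pkg/net/http/', 'https://example.com/']
```

With the bare `http` test, the trailing path segment `http/` is cut off the first URL and lost; requiring `://` in the lookahead keeps it.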
