Trying to get domain.zz, domain.zzz, or domain.zz.zz, and /something. regexp for (domain.zzz domain.zz.zz domain.zz) and /something/
import re
the_string = """lalalla?url=http2F%2Fdomain.zz%slgkfgs0s"""
the_string = """lalalla?url=http2F%2Fdomain.zz.zz/something%slgkfgs0sf"""
the_string = """lalalla?url=randomh564domain.zzz/something%slgkfgs0sf"""
the_string = """lalalla?url=randomeefsdlk876%domain.zz/something%slgkfgs0sf"""
the_string = """p%3A%2F%2Fdummy_test.com/ratata%2F&"""
the_string = """p%3A%2F%2Fdum2test.co.uk/something%2F&-kj"""
This is what I have now:
>>> print(re.findall(r'(?:www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4})(?:/[a-z0-9]+)',the_string))
domain.zzz/something
domain.zz/something
domain.zz.zz/something
>>> print(re.findall(r'www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}',the_string))
domain.zzz
domain.zz
domain.zz.zz
I would like to get both groups with a single regex.
Edit: This one is almost perfect: '([a-z0-9.-]+[.][a-z]{2,4})|(?:/[a-z0-9]+)', but it grabs some garbage from the beginning of the string.
The strings are more random than in these examples. I'm focusing on these three cases:
domain.co.uk/something
^^^
domain.com/something
^^
domain.com
^
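One possible way to get both parts from a single pattern is to make the path an optional second capturing group instead of a separate alternative. This is only a minimal sketch: it assumes the domain is directly preceded by a character outside [a-z0-9-] (such as the F of %2F), and the names DOMAIN_RE and extract_domain are made up for illustration.

```python
import re

# Group 1: a host label followed by one or two dot-separated suffixes
#          of 2-4 letters (domain.zz, domain.zzz, domain.zz.zz).
# Group 2: an optional "/something" path segment.
DOMAIN_RE = re.compile(r'([a-z0-9-]+(?:\.[a-z]{2,4}){1,2})(/[a-z0-9]+)?')

def extract_domain(text):
    """Return (domain, path) for the first match; path is None if absent."""
    match = DOMAIN_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)

print(extract_domain("lalalla?url=http2F%2Fdomain.zz%slgkfgs0s"))
# ('domain.zz', None)
print(extract_domain("lalalla?url=http2F%2Fdomain.zz.zz/something%slgkfgs0sf"))
# ('domain.zz.zz', '/something')
print(extract_domain("p%3A%2F%2Fdum2test.co.uk/something%2F&-kj"))
# ('dum2test.co.uk', '/something')
```

When random lowercase letters or digits sit directly in front of the domain, as in randomh564domain.zzz, this sketch cannot tell where the domain starts and returns randomh564domain.zzz as the domain.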
Is the domain constant? – VladL 2013-03-07 10:52:49
No, it isn't. It changes. – okobaka 2013-03-07 10:57:32