I don't think urlparse on its own gives you what you want. You can use this instead:
import re
# s is the URL string defined in the urlparse example below
m = re.search(r'(?<=www\.)[a-zA-Z\-]+\.[a-zA-Z]+', s)
print m.group(0)
Result:
some-super-domain.de
Try it HERE!
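The regex above raises an AttributeError on m.group(0) when the string contains no "www." part. A minimal sketch of a guarded version follows; the names DOMAIN_RE and domain_after_www are hypothetical, introduced only for illustration. Note that the pattern accepts only letters and hyphens, so a host with digits or more than two labels (for example www.example.co.uk) would be matched only partially.

```python
import re

# Same pattern as in the answer: the "name.tld" that follows a literal "www."
DOMAIN_RE = re.compile(r'(?<=www\.)[a-zA-Z\-]+\.[a-zA-Z]+')

def domain_after_www(s):
    """Return the first "name.tld" found after "www.", or None if there is no match."""
    m = DOMAIN_RE.search(s)
    return m.group(0) if m else None
```

Calling domain_after_www(s) on the URL from the question should give the same string as m.group(0), and it returns None instead of failing when nothing matches.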
If you use urlparse instead, the result looks like this:
s='/cgi-bin/ivw/CP/dbb_ug_sp;?r=http%3A//www.some-super-domain.de/forum/viewtopic.php%3Ff%3D2%26t%3D18564%26start%3D75&d=76756.76050130278'
from urlparse import urlparse
o = urlparse(s)
print o
Result:
ParseResult(scheme='', netloc='', path='/cgi-bin/ivw/CP/dbb_ug_sp', params='', query='r=http%3A//www.some-super-domain.de/forum/viewtopic.php%3Ff%3D2%26t%3D18564%26start%3D75&d=76756.76050130278', fragment='')
From this result you can get at the domain through o.query,
but that is not what you want, because it contains extra characters:
>>> print o.query
r=http%3A//www.some-super-domain.de/forum/viewtopic.php%3Ff%3D2%26t%3D18564%26start%3D75&d=76756.76050130278
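If you would rather stay with the standard library than use a regex, one possible approach is to decode the query string with parse_qs and run urlparse a second time on the embedded URL. This is only a sketch: the function name extract_domain is hypothetical, and it assumes the target URL always sits in the r parameter, as it does in the example string. The try/except import is there so the same code works on Python 2 (urlparse module) and Python 3 (urllib.parse).

```python
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs      # Python 2

def extract_domain(s, param='r'):
    """Return the host of the URL embedded in query parameter `param`, without "www."."""
    # parse_qs splits the query on "&" and percent-decodes each value
    params = parse_qs(urlparse(s).query)
    values = params.get(param)
    if not values:
        return None
    # The decoded value is a full URL, so urlparse can now fill in netloc
    host = urlparse(values[0]).netloc
    if host.startswith('www.'):
        host = host[len('www.'):]
    return host or None
```

For the s defined above, extract_domain(s) returns 'some-super-domain.de'. Unlike the regex, this does not depend on the host containing only letters and hyphens, but it does depend on knowing which parameter carries the URL.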
What is your expected output? – 2014-09-03 09:07:34
some-super-domain.de – nottinhill 2014-09-03 09:22:22