2017-08-13 75 views
2

Expanding short URLs in Python

I have a problem expanding short URLs, because not all of the ones I work with use the same kind of redirect.

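The idea is to expand short URLs. Here is an example of a short URL -> final URL. I need a function that takes a shortened URL and returns the expanded URL.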

http://chollo.to/675za -> http://www.elcorteingles.es/limite-48-horas/equipaje/?sorting=priceAsc&aff_id=2118094&dclid=COvjy8Xrz9UCFeMi0wod4ZULuw

So far I have something that half works; it fails on some examples, such as the one above.

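import http.client
from urllib.parse import urlparse

import requests


def unshorten_url(url):
    try:
        parsed = urlparse(url)
        # Send a HEAD request to see whether the short URL answers with a redirect.
        h = http.client.HTTPConnection(parsed.netloc)
        h.request('HEAD', parsed.path)
        response = h.getresponse()

        if response.status // 100 == 3 and response.getheader('Location'):
            # Let requests follow the redirect target and report where it ends up.
            url = requests.get(response.getheader('Location')).url
            print(url)
            return url
        else:
            # No redirect header: let requests resolve the original URL directly.
            url = requests.get(url).url
            print(url)
            return url
    except Exception as e:
        print(e)
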
+1

What error do you get with the example above? –

+0

https://murphy.rs/nikola/flask-url-shortener – pregmatch

+0

I get an intermediate site: http://web.epartner.es/click3.aspx?ref=754218&site=14010&type=text&tnb=39&diurl=https%3A%2F%2Fad.doubleclick.net%2Fddm%2Fclk%2F302111021%3B129203261%3BY%3Fhttp%3A%2F%2Fwww.elcorteingles.es%2Flimite-48-horas%2Fequipaje%2F%3Fsorting%3DpriceAsc%26aff_id%3D2118094 – user8459020

Answers

0

According to requests, the expected redirect does not appear to be well-formed:

import requests 

response = requests.get('http://chollo.to/675za') 
for resp in response.history: 
    print(resp.status_code, resp.url) 
print(response.url) 
print(response.is_redirect) 

Output:

301 http://chollo.to/675za 
http://web.epartner.es/click.asp?ref=754218&site=14010&type=text&tnb=39&diurl=https%3A%2F%2Fad.doubleclick.net%2Fddm%2Fclk%2F302111021%3B129203261%3By%3Fhttp%3A%2F%2Fwww.elcorteingles.es%2Flimite-48-horas%2Fequipaje%2F%3Fsorting%3DpriceAsc%26aff_id%3D2118094 
False 

This is probably intentional on the part of epartner / doubleclick. For nested URLs of this kind, you need an extra step like:

from urllib.parse import unquote 
# from urllib import unquote # python2 

# Only needed when another URL is embedded in the final one:
# if response.url.count('http') > 1:
url = 'http' + response.url.split('http')[-1]
print(unquote(url))

# http://www.elcorteingles.es/limite-48-horas/equipaje/?sorting=priceAsc&aff_id=2118094 

Note: by doing this, you may be bypassing the intended ad revenue.
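If you want the extra step above as a reusable function, a minimal sketch could look like the following. The name `extract_nested_url` is hypothetical, and the approach is only a heuristic: it assumes the target is the last `http://` or `https://` URL embedded in the tracking link, and that a single round of percent-decoding is enough to expose it.

```python
from urllib.parse import unquote


def extract_nested_url(url):
    """Return the last URL embedded in `url`, or `url` itself if none is nested.

    Heuristic only: assumes the real target is the last http(s) URL
    found after one round of percent-decoding.
    """
    decoded = unquote(url)
    # Position of the last scheme marker; 0 or -1 means nothing is nested.
    start = max(decoded.rfind('http://'), decoded.rfind('https://'))
    if start <= 0:
        return url
    return decoded[start:]


tracking_url = (
    'http://web.epartner.es/click.asp?ref=754218&site=14010&type=text&tnb=39'
    '&diurl=https%3A%2F%2Fad.doubleclick.net%2Fddm%2Fclk%2F302111021'
    '%3B129203261%3By%3Fhttp%3A%2F%2Fwww.elcorteingles.es'
    '%2Flimite-48-horas%2Fequipaje%2F%3Fsorting%3DpriceAsc%26aff_id%3D2118094'
)
print(extract_nested_url(tracking_url))
# http://www.elcorteingles.es/limite-48-horas/equipaje/?sorting=priceAsc&aff_id=2118094
```

In practice you would pass it `response.url` from the `requests.get` call. URLs without anything nested come back unchanged, so it should be safe to apply to every result.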