2017-04-05

Merging a relative path based on a URL template

I want to merge a user-supplied URL with a relative file path. For example, given the following:

url_base = 'http://myserver.com/my/path/to/files' 
path = 'path/to/files/foo.txt' 

the desired output would be

http://myserver.com/my/path/to/files/foo.txt 

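where the path elements shared by the URL and the file path have been merged: my/path/to/files and path/to/files/foo.txt are merged into my/path/to/files/foo.txt, which is then appended back onto the base of the URL.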

The closest I have been able to get is this:

# python 2.7 
import os 
import urlparse 
from collections import OrderedDict 

url_base = 'http://myserver.com/my/path/to/files' 
path = 'path/to/files/foo.txt' 

url = urlparse.urlparse(url_base) 
print(url) 
# ParseResult(scheme='http', netloc='myserver.com', path='/my/path/to/files', params='', query='', fragment='') 

merge_path = os.path.join(url.path, path) 
print(merge_path) 
# /my/path/to/files/path/to/files/foo.txt 

# take an ordered set of the path components 
# this is not good because it assumes '/' is the split key 
merge_path_set = list(OrderedDict.fromkeys(merge_path.split('/'))) 
print(merge_path_set) 
# ['', 'my', 'path', 'to', 'files', 'foo.txt'] 

path_joined = os.path.join(*merge_path_set) 
print(path_joined) 
# my/path/to/files/foo.txt 

# THIS DOESN'T WORK: 
url_joined = urlparse.urljoin(url.netloc, path_joined) 
print(url_joined) 
# my/path/to/files/foo.txt 

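It seems like there should be a better way to do this with the built-in libraries, rather than manually splitting on '/' and taking an ordered set as I did here. I also haven't figured out how to turn the result back into a URL. Any ideas?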

Answers


urljoin() works fine if you make the second argument consistent with the path component of url_base.

For Python 2.7:

from urlparse import urljoin 

url_base = 'http://myserver.com/my/path/to/files' 
path = 'path/to/files/foo.txt' 

final_url = urljoin(url_base, '/my/' + path) 

# http://myserver.com/my/path/to/files/foo.txt 

For Python 3:

from urllib.parse import urljoin 

url_base = 'http://myserver.com/my/path/to/files' 
path = 'path/to/files/foo.txt' 

final_url = urljoin(url_base, '/my/' + path) 

# http://myserver.com/my/path/to/files/foo.txt 
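For reference, the short sketch below (Python 3) shows how urljoin() resolves its second argument in three cases. It suggests why the examples above pass a reference that starts with '/': such a reference replaces the whole path of the base URL, so the '/my/' prefix has to be supplied by hand. The variable name base is only for illustration.

```python
from urllib.parse import urljoin

base = 'http://myserver.com/my/path/to/files'

# No trailing slash on the base: the last segment ('files') is replaced.
print(urljoin(base, 'foo.txt'))
# http://myserver.com/my/path/to/foo.txt

# Trailing slash on the base: the reference is resolved under 'files/'.
print(urljoin(base + '/', 'foo.txt'))
# http://myserver.com/my/path/to/files/foo.txt

# A reference starting with '/' replaces the whole path of the base.
print(urljoin(base, '/my/path/to/files/foo.txt'))
# http://myserver.com/my/path/to/files/foo.txt
```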

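Assuming the path/to/files part of path will always match the path/to/files component of url_base, and that you can append a '/' to url_base, you could do the following, although it does use a variant of split: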

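# Python 2.7 (on Python 3, use: from urllib.parse import urljoin)
import os
from urlparse import urljoin

# note the trailing '/' appended to url_base
url_base = 'http://myserver.com/my/path/to/files/'
path = 'path/to/files/foo.txt'

# os.path.split(path)[-1] is the file name, 'foo.txt'
final_url = urljoin(url_base, os.path.split(path)[-1])

print(final_url)
# http://myserver.com/my/path/to/files/foo.txt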

In this case '/my/' is hard-coded, so you can't use it in a program with variable input – user5359531


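Could you add to your question to clarify what your constraints are? As written, it seems to imply that 'http://myserver.com/my/' is a constant component of 'url_base'. I'll adjust my answer once there is more information. Cheers! –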


Updated the question – user5359531
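One possible way to avoid hard-coding '/my/' is to look for the longest run of segments that both ends the URL's path and starts the relative path, and to append only what follows it. The following is a minimal sketch under that assumption, written for Python 3. The helper name merge_url_path is hypothetical and not part of any library, and the relative path is assumed to use '/' as its separator.

```python
from urllib.parse import urlsplit, urlunsplit


def merge_url_path(url_base, path):
    """Join path onto url_base, collapsing the segments they share.

    Hypothetical helper: the overlap is the longest run of segments that
    ends the path of url_base and also starts path.
    """
    parts = urlsplit(url_base)
    base_segs = [s for s in parts.path.split('/') if s]
    path_segs = [s for s in path.split('/') if s]

    overlap = 0
    # Try the longest possible overlap first, then shrink.
    for n in range(min(len(base_segs), len(path_segs)), 0, -1):
        if base_segs[-n:] == path_segs[:n]:
            overlap = n
            break

    merged = '/' + '/'.join(base_segs + path_segs[overlap:])
    return urlunsplit(
        (parts.scheme, parts.netloc, merged, parts.query, parts.fragment)
    )


print(merge_url_path('http://myserver.com/my/path/to/files',
                     'path/to/files/foo.txt'))
# http://myserver.com/my/path/to/files/foo.txt
```

Unlike the ordered-set approach in the question, this only collapses segments at the boundary between the two paths, so a directory name that legitimately repeats elsewhere in the path is kept. If there is no overlap at all, the relative path is simply appended.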