Unable to edit a URL with Python

-7

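I'm new to Python and just want to know whether this is possible: I have scraped a URL using urllib and want to edit it to reach different pages.

Example: http://test.com/All/0.html

I want 0.html to become 50.html, then 100.html, and so on.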

2016-04-15 Simon Brown

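Yes, it is possible to change the last part of a URL. – vaultah

Thanks, but how do I go about it? I've tried url.split but can't seem to get it to change the right part of the URL. Or can I grab them all at once rather than one by one? –

I just searched for "python modify url" and the first link was [Modifying URL components in Python 2](http://stackoverflow.com/q/24200988/2301450). You could even use `str.rpartition` or `str.split`. If you have a *specific* problem with your code, please include the code you have written so far, example input (if any), the expected output, and the output you actually get (console output, traceback, etc.) – vaultah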

found_url = 'http://test.com/All/0.html'
base_url = 'http://test.com/All/'

# Page numbers 0, 50, 100, ..., 1000
for page_number in range(0, 1050, 50):
    url_to_fetch = "{0}{1}.html".format(base_url, page_number)
    print(url_to_fetch)

This should give you the URLs from 0.html to 1000.html.

If you want to use urlparse (as suggested in the comments on your question):
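The base_url above is typed in by hand. As a minimal sketch of the `str.rpartition` idea mentioned in the comments, it could instead be derived from the scraped URL. The helper name `page_urls` and its default arguments are assumptions made for this illustration, not part of the original answer.

```python
found_url = 'http://test.com/All/0.html'


def page_urls(url, start=0, stop=1050, step=50):
    """Build numbered page URLs by swapping the part after the last '/'."""
    # rpartition splits on the LAST '/' and returns (head, separator, tail),
    # e.g. ('http://test.com/All', '/', '0.html').
    head, sep, _tail = url.rpartition('/')
    return ["{0}{1}{2}.html".format(head, sep, n)
            for n in range(start, stop, step)]


urls = page_urls(found_url)
print(urls[0])
print(urls[-1])
```

This only does string manipulation, so it assumes the page name is always the final path segment and that the URL carries no query string or fragment.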

# Python 2 module; in Python 3 use: from urllib import parse as urlparse
import urlparse

found_url = 'http://test.com/All/0.html'
parsed_url = urlparse.urlparse(found_url)
path_parts = parsed_url.path.split("/")

for page_number in range(0, 1050, 50):
    # Keep every path segment except the last and append the new page name
    new_path = "{0}/{1}.html".format("/".join(path_parts[:-1]), page_number)
    parsed_url = parsed_url._replace(path=new_path)
    print(parsed_url.geturl())

Running this script gives the following output:

http://test.com/All/0.html 
http://test.com/All/50.html 
http://test.com/All/100.html 
http://test.com/All/150.html 
http://test.com/All/200.html 
http://test.com/All/250.html 
http://test.com/All/300.html 
http://test.com/All/350.html 
http://test.com/All/400.html 
http://test.com/All/450.html 
http://test.com/All/500.html 
http://test.com/All/550.html 
http://test.com/All/600.html 
http://test.com/All/650.html 
http://test.com/All/700.html 
http://test.com/All/750.html 
http://test.com/All/800.html 
http://test.com/All/850.html 
http://test.com/All/900.html 
http://test.com/All/950.html 
http://test.com/All/1000.html
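For readers on Python 3, where the urlparse module became urllib.parse, the same approach might look like the sketch below. The function name `build_page_urls` is an assumption for illustration. It collects the URLs into a list so that every one of them is still available after the loop, rather than only the final value of the loop variable.

```python
from urllib.parse import urlparse


def build_page_urls(found_url, start=0, stop=1050, step=50):
    """Return one URL per page number, replacing the last path segment."""
    parsed = urlparse(found_url)
    # '/All/0.html' -> '/All' (everything before the last '/')
    directory = parsed.path.rsplit("/", 1)[0]
    urls = []
    for page_number in range(start, stop, step):
        new_path = "{0}/{1}.html".format(directory, page_number)
        # _replace returns a new ParseResult; the original stays unchanged
        urls.append(parsed._replace(path=new_path).geturl())
    return urls


all_urls = build_page_urls('http://test.com/All/0.html')
for url in all_urls:
    print(url)
```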

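Instead of printing inside the for loop, you can use the value of parsed_url.geturl() however you need. As mentioned, if you want to fetch the content of the pages, you can use the Python requests module as follows: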

import requests
# Python 2 module; in Python 3 use: from urllib import parse as urlparse
import urlparse

found_url = 'http://test.com/All/0.html'
parsed_url = urlparse.urlparse(found_url)
path_parts = parsed_url.path.split("/")

for page_number in range(0, 1050, 50):
    new_path = "{0}/{1}.html".format("/".join(path_parts[:-1]), page_number)
    parsed_url = parsed_url._replace(path=new_path)
    url = parsed_url.geturl()
    try:
        r = requests.get(url)
        if r.status_code == 200:
            # r.content is raw bytes, so open the file in binary mode
            with open(str(page_number) + '.html', 'wb') as f:
                f.write(r.content)
    except Exception as e:
        print("Error scraping - " + url)
        print(e)

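This fetches the content of every page from http://test.com/All/0.html through http://test.com/All/1000.html and saves each URL's content to its own file. The file names on disk match the file names in the URLs: 0.html through 1000.html.

Depending on the performance of the site you are scraping, the script may take quite a long time to run. If performance matters, consider using grequests.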
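To illustrate the idea of issuing the requests concurrently, here is a rough Python 3 sketch. Note that it does not use grequests: it swaps in the standard library's thread pool (concurrent.futures) and urllib.request so that it runs without extra packages. The names `fetch` and `fetch_all`, the timeout, and the worker count are all assumptions chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL; return (url, body bytes), or (url, None) on failure."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return url, response.read()
    except Exception as exc:
        print("Error scraping - " + url)
        print(exc)
        return url, None


def fetch_all(urls, fetcher=fetch, workers=5):
    """Run fetcher over urls in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetcher, urls))


# Hypothetical usage (performs real network requests):
# results = fetch_all(["http://test.com/All/{0}.html".format(n)
#                      for n in range(0, 1050, 50)])
```

Keeping the worker count small is a deliberate choice here, since many simultaneous requests can put noticeable load on the target site.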


2016-04-15 20:30:05 LearnerEarner

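Thanks LearnerEarner, it works well, but it counts down from 1000 instead of counting forward –

Silly me, it isn't counting down, it's picking the last element in the range. I need it to step through them –

@SimonBrown I didn't quite get your second comment... – LearnerEarner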
