2017-02-19 104 views
0

tfp = open(filename, 'wb')
OSError: [Errno 22] 无效参数: 'downloaded/misc/jquery.js?v=1.4.4'

OSError: [Errno 22] Invalid argument: 'downloaded/misc/jquery.js?v=1.4.4'

任何人都可以帮我解决这个错误吗?我认为它与jquery.js?v=1.4.4无效。我是python的新手;我很抱歉,如果我失去了一些明显的东西。

下面是代码:

import os 
from urllib.request import urlretrieve 
from urllib.request import urlopen 
from bs4 import BeautifulSoup 

downloadDirectory = "downloaded" 
baseUrl = "http://pythonscraping.com" 

def getAbsoluteURL(baseUrl, source):
    """Resolve a ``src`` attribute value into an absolute URL on baseUrl's site.

    Returns the absolute URL, or None when the source points off-site
    (baseUrl does not appear in the resolved URL).
    """
    if source.startswith("http://www."):
        # Drop the "www." so the result matches baseUrl's host form.
        url = "http://" + source[11:]
    elif source.startswith("http://"):
        url = source
    elif source.startswith("www."):
        # Bug fix: the original assigned source[4:] and then overwrote it
        # with "http://" + source, so the "www." prefix was kept and the
        # baseUrl containment check below always failed for these links.
        url = "http://" + source[4:]
    else:
        # Relative path: anchor it to the site root.
        url = baseUrl + "/" + source
    if baseUrl not in url:
        return None
    return url

def getDownloadPath(baseUrl, absoluteUrl, downloadDirectory):
    """Map an absolute URL to a local file path under downloadDirectory,
    creating any missing intermediate directories.

    Returns the local path string.
    """
    path = absoluteUrl.replace("www.", "")
    path = path.replace(baseUrl, "")
    # Bug fix: strip the query string ("?v=1.4.4") — "?" is not valid in a
    # Windows filename and caused OSError: [Errno 22] Invalid argument.
    path = path.partition("?")[0]
    path = downloadDirectory + path
    directory = os.path.dirname(path)

    if not os.path.exists(directory):
        os.makedirs(directory)

    return path

# Fetch the page, collect every tag carrying a src attribute, and download
# each referenced asset that lives on the target site.
page = urlopen("http://www.pythonscraping.com")
soup = BeautifulSoup(page, "html.parser")
tagsWithSrc = soup.findAll(src=True)

for tag in tagsWithSrc:
    fileUrl = getAbsoluteURL(baseUrl, tag["src"])
    if fileUrl is None:
        # Off-site resource — skip it.
        continue
    print(fileUrl)
    urlretrieve(fileUrl, getDownloadPath(baseUrl, fileUrl, downloadDirectory))
+0

它不是下载一个有效的文件,也许这是不正确的链接,下载文件。 – Arman

+0

是的,这是有道理的。谢谢。 –

回答

1

对于函数 urlretrieve(url, filename, reporthook, data),你传给 filename 参数的值必须是对你的操作系统而言有效的文件名。

在这种情况下,当您运行

urlretrieve(fileUrl, getDownloadPath(baseUrl, fileUrl, downloadDirectory)) 

你传给 url 参数的值是 "http://pythonscraping.com/misc/jquery.js?v=1.4.4",而你传给 filename 参数的值是 "downloaded/misc/jquery.js?v=1.4.4"。

"jquery.js?v=1.4.4" 我认为这不是一个有效的文件名。

解决方法:在getDownloadPath功能,改变return path

return path.partition('?')[0] 
0

"downloaded/misc/jquery.jsv=1.4.4" 难道不就是一个有效的文件名了吗?我觉得下面是更好的解决方案:

import requests 
from bs4 import BeautifulSoup 

download_directory = "downloaded" 
base_url = "http://www.pythonscraping.com/" 
# Use Requests instead urllib 
def get_files_url(base_url):
    """Fetch *base_url* and return every tag that carries a ``src`` attribute."""
    response = requests.get(base_url)
    parsed = BeautifulSoup(response.text, "lxml")
    return parsed.find_all(src=True)

def get_file_name(url, directory=None):
    """Return the local download path for *url*.

    Takes the last "/"-separated segment of the URL as the file name,
    strips characters that are invalid in (Windows) file names, and joins
    it under *directory* (defaults to the module-level download_directory).
    Eg: url=http://pythonscraping.com/a.png -> downloaded/a.png
    """
    if directory is None:
        directory = download_directory
    file_name = url.split("/")[-1]
    # Bug fix: the original literal contained "\/", which is just "/" (and a
    # SyntaxWarning on modern Python) — the backslash itself was never
    # removed. This set really strips both slashes plus ? > < : " * |.
    remove_list = '?><\\/:"*|'
    for ch in remove_list:
        # Unconditional replace — a no-op when ch is absent, so the
        # original "if ch in file_name" guard was redundant.
        file_name = file_name.replace(ch, "")
    return directory + "/" + file_name

def get_formatted_url(url, base=None):
    """Return an absolute on-site URL for *url*, or None for off-site links.

    *base* defaults to the module-level base_url. Relative paths are joined
    onto *base*; absolute URLs that do not contain *base* return None.
    """
    if base is None:
        base = base_url
    # Bug fix: the original only recognized "http://" as absolute, so
    # "https://..." links were treated as relative and got base_url
    # prepended, producing garbage URLs. Accept both schemes.
    if not url.startswith(("http://", "https://")):
        return base + url
    if base not in url:
        return None
    return url

# Download every on-site asset referenced by a src attribute, streaming
# each response body to a local file.
links = get_files_url(base_url)

for link in links:
    src = get_formatted_url(link["src"])
    if src is None:
        # Off-site resource — skip it.
        continue
    print(src)
    result = requests.get(src, stream=True)
    file_name = get_file_name(src)
    print(file_name)
    with open(file_name, 'wb') as f:
        for chunk in result.iter_content(10):
            f.write(chunk)