Downloading a file over HTTP with urllib.urlretrieve does not work properly

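I'm still working on my mp3 downloader, but now I'm having a problem with the files being downloaded. I have two versions of the part that is tripping me up. The first gives me a proper file but raises an error. The second gives me a file that is far too small, but no error. I've tried opening the file in binary mode, but that didn't help. I'm new to doing any work with HTML, so any help would be appreciated.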

import urllib 
import urllib2 

def milk(): 
    SongList = [] 
    SongStrings = [] 
    SongNames = [] 
    earmilk = urllib.urlopen("http://www.earmilk.com/category/pop") 
    reader = earmilk.read() 
    #gets the position of the playlist 
    PlaylistPos = reader.find("var newPlaylistTracks = ") 
    #finds the number of songs in the playlist 
    NumberSongs = reader[reader.find("var newPlaylistIds = "): PlaylistPos].count(",") + 1 
    initPos = PlaylistPos 

    #goes through the playlist and records the URL and name of each song

    for song in range(0, NumberSongs):
        songPos = reader[initPos:].find("http:") + initPos
        namePos = reader[songPos:].find("name") + songPos
        namePos += reader[namePos:].find(">")
        nameEndPos = reader[namePos:].find("<") + namePos
        SongStrings.append(reader[songPos: reader[songPos:].find('"') + songPos])
        SongNames.append(reader[namePos + 1: nameEndPos])
        initPos = nameEndPos

    #the URLs are embedded in JavaScript, so un-escape the "\/" sequences
    for correction in range(0, NumberSongs):
        SongStrings[correction] = SongStrings[correction].replace('\\/', "/")

    #downloading songs 

    fileName = ''.join([a.isalnum() and a or '_' for a in SongNames[0]]) 
    fileName = fileName.replace("_", " ") + ".mp3" 


#   This version writes a file that can be played but gives an error saying: "TypeError: expected a character buffer object" 
## songDL = open(fileName, "wb") 
## songDL.write(urllib.urlretrieve(SongStrings[0], fileName)) 


#   This version creates the file but it cannot be played (file size is much smaller than it should be) 
## url = urllib.urlretrieve(SongStrings[0], fileName) 
## url = str(url) 
## songDL = open(fileName, "wb") 
## songDL.write(url) 


    songDL.close() 

    earmilk.close()

Source

2013-12-15 johnsona

Re-read the documentation for urllib.urlretrieve:

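Return a tuple (filename, headers) where filename is the local file name under which the object can be found, and headers is whatever the info() method of the object returned by urlopen() returned (for a remote object, possibly cached).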
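To make that return value concrete, here is a minimal sketch. The helper name `inspect_urlretrieve` is made up for illustration and is not part of the question's code. The fallback import is an assumption so that the snippet also runs on Python 3, where the same function lives in `urllib.request`.

```python
try:
    from urllib import urlretrieve          # Python 2, as used in the question
except ImportError:
    from urllib.request import urlretrieve  # Python 3 location of the same function


def inspect_urlretrieve(url, file_name):
    """Download url to file_name and report what urlretrieve handed back.

    Hypothetical helper, for illustration only.
    """
    result = urlretrieve(url, file_name)  # the file is already on disk here
    saved_path, headers = result          # a (filename, headers) tuple
    # str(result) is only the printed form of that tuple, not the audio data,
    # so writing it to the file yields a tiny, unplayable file.
    return type(result), saved_path, len(str(result))
```

This lines up with both symptoms in the question: passing the tuple straight to `write()` raises the TypeError (after `urlretrieve` has already saved a playable file), while writing `str(url)` overwrites that file with a short piece of text.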

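You seem to be expecting it to return the bytes of the file itself. The point of urlretrieve is that it handles writing to a file for you, and returns the name of the file it wrote to (which will generally be the same as the second argument you passed to the function, if you provided one).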
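A rough sketch of the two ways this could look in practice follows. The function names `download_with_urlretrieve` and `download_manually` are invented for this example, and the fallback import is again an assumption so that the code also runs on Python 3. The first function lets `urlretrieve` do the writing. The second is an alternative for when you really do want the bytes in hand.

```python
try:
    from urllib import urlretrieve, urlopen          # Python 2, as in the question
except ImportError:
    from urllib.request import urlretrieve, urlopen  # Python 3 equivalents


def download_with_urlretrieve(url, file_name):
    """Let urlretrieve write the file; no open()/write() of our own is needed."""
    saved_path, _headers = urlretrieve(url, file_name)
    return saved_path


def download_manually(url, file_name):
    """Alternative: read the bytes ourselves and write them in binary mode."""
    response = urlopen(url)
    try:
        data = response.read()  # the raw bytes of the remote file
    finally:
        response.close()
    with open(file_name, "wb") as out:
        out.write(data)
    return file_name
```

In the question's `milk()` function, either call (for example `download_with_urlretrieve(SongStrings[0], fileName)`) would stand in for the commented-out blocks, and the trailing `songDL.close()` would no longer be needed.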

Source

2013-12-15 21:10:08 Iguananaut

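By the way, this sort of thing is a great reason to learn to use [pdb](http://docs.python.org/2/library/pdb.html). Run your function in the Python REPL, and when it crashes, type `import pdb; pdb.pm()` to get a debugger prompt at the point where the code crashed. From there you can see directly what a function like `urlretrieve` actually returns. That should give you an idea of why the various things you are trying to do with the return value fail. – Iguananaut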
