Python: getting the download link from a JavaScript button

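I'm trying to make my script download subtitles from www.subscene.com. The problem is that the download button on the page is built with JavaScript, and for some reason I can't download the subtitle even after extracting the URL.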

I think this is the code for the download button:

<a id="s_lc_bcr_downloadLink" class="downloadLink rating0" href="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(&quot;s$lc$bcr$downloadLink&quot;, &quot;&quot;, true, &quot;&quot;, &quot;/english/How-I-Met-Your-Mother-Seventh-Season/subtitle-482407-dlpath-90698/zip.zipx&quot;, false, true))">Download English Subtitle</a><a id="s_lc_bcr_previewLink" href="javascript:togglePreview(482407, 'zip');">(See preview)</a>

So I extracted the URL and told my script to download it:

urllib.urlretrieve('http://subscene.com/english/How-I-Met-Your-Mother-Seventh-Season/subtitle-482407-dlpath-90698/zip.zipx','c:\\sub.zip')

(I added the 'http://subscene.com' prefix myself.)
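For reference, the extraction step described above could look roughly like the following. This is only a sketch in Python 3 (the question itself uses Python 2.7), and the helper name `postback_strings` is made up for illustration. It pulls the quoted string arguments out of the `href`, which shows that the path is one argument of a form postback call rather than a plain link.

```python
import html
import re

# The href attribute as it appears in the page source (HTML-escaped).
HREF = ('javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions('
        '&quot;s$lc$bcr$downloadLink&quot;, &quot;&quot;, true, &quot;&quot;, '
        '&quot;/english/How-I-Met-Your-Mother-Seventh-Season/'
        'subtitle-482407-dlpath-90698/zip.zipx&quot;, false, true))')


def postback_strings(href):
    """Return the quoted string arguments of the postback call, in order."""
    decoded = html.unescape(href)  # turns &quot; back into "
    return re.findall(r'"([^"]*)"', decoded)


args = postback_strings(HREF)
event_target = args[0]  # the control that triggers the postback
action_url = args[3]    # the path the form is submitted to
print(event_target)
print(action_url)
```

The fourth quoted string is the path from the question; the first one is the control name that the page sends back to the server along with the form.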

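But for some reason it doesn't download the correct file. What should I do?

Edit:

Thanks a lot! Unfortunately I can't get it to work :( It says the following: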

from selenium import webdriver 

browser = webdriver.Firefox() 
browser.execute_script('WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("s$lc$bcr$downloadLink", "", true, "", "/english/How-I-Met-Your-Mother-Seventh-Season/subtitle-482407-dlpath-90698/zip.zipx", false, true))')

Traceback (most recent call last): 
File "<pyshell#2>", line 1, in <module> 
browser.execute_script('WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("s$lc$bcr$downloadLink", "", true, "", "/english/How-I-Met-Your-Mother-Seventh-Season/subtitle-482407-dlpath-90698/zip.zipx", false, true))') 
File "C:\Users\User\AppData\Roaming\Python\Python27\site-packages\selenium\webdriver\remote\webdriver.py", line 385, in execute_script
{'script': script, 'args':converted_args})['value']
File "C:\Users\User\AppData\Roaming\Python\Python27\site-packages\selenium\webdriver\remote\webdriver.py", line 153, in execute 
self.error_handler.check_response(response) 
File "C:\Users\User\AppData\Roaming\Python\Python27\site-packages\selenium\webdriver\remote\errorhandler.py", line 126, in check_response 
raise exception_class(message, screen, stacktrace) 
WebDriverException: Message: ''

Source

2011-11-27 user1067911

What you're trying to download (zip.zipx) is not a file; it's some JavaScript. I'm looking into how to get the download URL.

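It will be hard to find the actual URL of each file. Everything seems to be retrieved from the server through JavaScript. I don't think this is a URL; it may be a local directory, and you'll have to take a good look at the site's JavaScript and how it handles these files. I noticed that the line 'http://subscene.com/downloadissue.aspx?subtitleId=482407&contentType=zip' shows up a lot, which suggests that it looks up the 'subtitleId', checks for a 'contentType' of 'zip', and fetches the file from there. This is probably organized in some form of SQL.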
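To make the structure of that URL concrete, here is a small Python 3 sketch that splits it into its query parameters. It only shows how `subtitleId` and `contentType` are carried in the query string; whether this endpoint actually serves the subtitle file is not confirmed anywhere in this thread, and the helper name `query_params` is made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit

ISSUE_URL = ('http://subscene.com/downloadissue.aspx'
             '?subtitleId=482407&contentType=zip')


def query_params(url):
    """Return the query string of url as a dict of single values."""
    query = urlsplit(url).query
    # parse_qs maps each name to a list of values; keep the first one.
    return {name: values[0] for name, values in parse_qs(query).items()}


params = query_params(ISSUE_URL)
print(params)
```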

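As John said, that's not a file but JavaScript code. So instead of using urllib.urlretrieve to fetch the file, you can execute the JavaScript, which in turn downloads the file. This can be done with the selenium module: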

from selenium import webdriver 
browser = webdriver.Firefox() 
browser.get('http://subscene.com/english/How-I-Met-Your-Mother-Seventh-Season/subtitle-482407.aspx')   
browser.execute_script('WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("s$lc$bcr$downloadLink", "", true, "", "/english/How-I-Met-Your-Mother-Seventh-Season/subtitle-482407-dlpath-90698/zip.zipx", false, true))') 
raw_input()

I got this JavaScript snippet using Firebug.
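As a rough illustration of what that JavaScript call does: in ASP.NET WebForms, a postback fills in the hidden `__EVENTTARGET` and `__EVENTARGUMENT` fields and submits the page's form together with its other hidden fields. The Python 3 sketch below builds such a form body without a browser. The sample HTML, the `abc123` value and the helper names are invented for illustration, and it is an assumption that the real site would accept a request built this way (it may also need cookies or other headers).

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect the name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'input' and (attrs.get('type') or '').lower() == 'hidden':
            name = attrs.get('name')
            if name:
                self.fields[name] = attrs.get('value') or ''


def build_postback_body(page_html, event_target, event_argument=''):
    """Return a URL-encoded form body that mimics a WebForms postback."""
    parser = HiddenFieldParser()
    parser.feed(page_html)
    fields = dict(parser.fields)
    fields['__EVENTTARGET'] = event_target
    fields['__EVENTARGUMENT'] = event_argument
    return urlencode(fields)


# Made-up page fragment standing in for the real subtitle page.
SAMPLE_PAGE = '''
<form method="post" action="subtitle-482407.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTTARGET" value="" />
  <input type="text" name="q" value="ignored" />
</form>
'''

body = build_postback_body(SAMPLE_PAGE, 's$lc$bcr$downloadLink')
print(body)
```

The resulting string would be sent as the body of a POST request to the action path from the button's `href`. Driving a real browser with selenium, as in the answer above, avoids having to reproduce these details by hand.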

Source

2011-11-27 21:11:43 theharshest

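Very nice, @theharshest. I think you could also achieve a similar result with the mechanize Python library, but this is elegant enough. Doesn't it also require installing Selenium's Java server and so on, though? – alonisser

@alonisser Thanks. Yes, you need to install the selenium module for Python. Downloading the module with pip is very simple. – theharshest

Glad someone was able to help him, +1.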
