2017-06-21 73 views

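Unable to download an xls file through Docker

Using RSelenium, I have been trying to download an Excel file (.xls) from a particular website. I am pasting my entire R code below (run after setting up the Docker container).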

ePrefs = makeFirefoxProfile(
  list(
    browser.download.dir = "/home/seluser/Downloads",
    "browser.download.folderList" = 2L,
    "browser.download.manager.showWhenStarting" = FALSE,
    "browser.helperApps.neverAsk.saveToDisk" = "application/vnd.ms-excel, 
    application/xls, application/x-xls, application/vnd-xls, 
    application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
  ))

remDr = remoteDriver(extraCapabilities = ePrefs, port = 4445)
remDr$open()
remDr$navigate("https://www.aeaweb.org/joe/listings?")

webelem1 = remDr$findElement(using = 'id', "published-date")
webelem1$clickElement()

webelem2 = remDr$findElement("css", "[value = 'week']")
webelem2$clickElement()

webelem3 = remDr$findElement("css", "[value = 'Apply Filter']")
webelem3$clickElement()
Sys.sleep(10)

webelem4 = remDr$findElement("css", "[feature = 'download']")
webelem4$clickElement()

webelem5 = remDr$findElement("xpath",
  "/html/body/main/div/section/div/div[2]/div[2]/div/ul/li[3]/a")
webelem5$clickElement()

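Everything works, but at the last step (the final click) the Selenium browser still opens the usual dialog window asking whether I want to save the file or open it, even though the ePrefs part of the code is supposed to override that.

I downloaded the file manually, confirmed that the last click should download the file directly, and verified that the content type is application/vnd.ms-excel. Is there anything I am doing wrong? Any help is appreciated.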


When you started the Docker container, did you map the download location between the HOST and the container? See https://stackoverflow.com/questions/42293193/rselenium-on-docker-where-are-files-downloaded and https://stackoverflow.com/questions/42607389/download-file-with-rselenium-docker-toolbox – jdharrison

Answer


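The MIME type the server returns is application/force-download. Add it to your list, and make sure the download location is mapped between the host and the container. The following works for me: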

# initiate docker container mapping download locations 
# here HOST is linux 
# docker run -d -p 4445:4444 -p 5901:5900 -v /home/john/test:/home/seluser/Downloads selenium/standalone-firefox-debug:2.53.1 

library(RSelenium) 
ePrefs <- makeFirefoxProfile(
    list(
    browser.download.dir = "/home/seluser/Downloads", 
    "browser.download.folderList" = 2L, 
    "browser.download.manager.showWhenStarting" = FALSE, 
    "browser.helperApps.neverAsk.saveToDisk" = "application/vnd.ms-excel, 
    application/xls, application/x-xls, application/vnd-xls, 
    application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, 
    application/force-download" 
)) 

remDr <- remoteDriver(extraCapabilities = ePrefs, port = 4445) 
remDr$open() 
remDr$navigate("https://www.aeaweb.org/joe/listings?") 

webelem1 <- remDr$findElement(using = 'id', "published-date") 
webelem1$clickElement() 

webelem2 <- remDr$findElement("css", "[value = 'week']") 
webelem2$clickElement() 

webelem3 <- remDr$findElement("css", "[value = 'Apply Filter']") 
webelem3$clickElement() 
Sys.sleep(10) 

webelem4 <- remDr$findElement("css", "[feature = 'download']") 
webelem4$clickElement() 

webelem5 <- remDr$findElement("xpath",
  "/html/body/main/div/section/div/div[2]/div[2]/div/ul/li[3]/a")
webelem5$clickElement()

list.files("/home/john/test/") 

> list.files("/home/john/test/") 
[1] "joe_resultset.xls"
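
As a small optional addition to the code above, here is a minimal sketch in base R of two helpers. Both names, `never_ask_string` and `wait_for_download`, are hypothetical and are not part of RSelenium. The first builds the comma-separated MIME list from a character vector, so the preference value contains no embedded line breaks or stray spaces. The second polls the mapped host directory until the file shows up, rather than calling `list.files` once straight after the click. The 30-second timeout and the `\\.xls$` pattern are assumptions; adjust them to your setup.

```r
# Hypothetical helper (not part of RSelenium): join MIME types into one
# comma-separated string, trimming whitespace and dropping duplicates.
never_ask_string <- function(mime_types) {
  paste(unique(trimws(mime_types)), collapse = ",")
}

# Hypothetical helper: poll `dir` until a file matching `pattern` appears
# or `timeout` seconds have passed. Returns the matching file names,
# or character(0) if nothing appeared in time.
wait_for_download <- function(dir, pattern = "\\.xls$", timeout = 30, interval = 1) {
  deadline <- Sys.time() + timeout
  repeat {
    # The pattern is anchored at the end of the name, so an in-progress
    # "*.xls.part" file is not counted as a finished download.
    found <- list.files(dir, pattern = pattern)
    if (length(found) > 0 || Sys.time() >= deadline) {
      return(found)
    }
    Sys.sleep(interval)
  }
}

# The same MIME types as in the answer, including application/force-download.
mime_types <- c(
  "application/vnd.ms-excel",
  "application/xls",
  "application/x-xls",
  "application/vnd-xls",
  "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
  "application/force-download"
)
never_ask <- never_ask_string(mime_types)
```

Under these assumptions, `never_ask` would be passed as the value of "browser.helperApps.neverAsk.saveToDisk" in `makeFirefoxProfile`, and `wait_for_download("/home/john/test")` would be called after the final `clickElement()`.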