Web scraping in R through a JavaScript-driven POST search

On this website, I want to enter the code "539300" in the search box at the top and get the result from the page: either just the new URL, or some of its content (using XPath).

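library(rvest)
library(httr)
library(RCurl)

url <- "http://www.moneycontrol.com"

# Submit the search code as a form POST to the site's home page
res <- POST(url, body = list(search_str = "539300"), encode = "form")

# Parse the HTML that came back
pg <- read_html(content(res, as = "text", encoding = "UTF-8"))

# Look for the company-name heading
html_node(pg, xpath = '//*[@id="nChrtPrc"]/div[3]/h1')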

This fails. Instead of the heading, the result is:

{xml_missing} 
<NA>
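
For context, `{xml_missing}` is what `html_node()` returns when the XPath matches nothing in the parsed document, which suggests the POST response above is not the stock page. The sketch below reproduces both outcomes offline. The inline HTML is made up for illustration and is only assumed to resemble the target page's structure; it is not the site's real markup.

```r
library(rvest)

# Hypothetical markup, invented for this example (not moneycontrol's real page)
html <- '<html><body><div id="nChrtPrc"><div></div><div></div><div><h1>AK Spintex</h1></div></div></body></html>'
pg <- read_html(html)

# XPath matches: html_node() returns the node and html_text() its contents
hit <- html_node(pg, xpath = '//*[@id="nChrtPrc"]/div[3]/h1')
html_text(hit)

# XPath matches nothing: html_node() returns an xml_missing object,
# and html_text() on it gives NA
miss <- html_node(pg, xpath = '//*[@id="doesNotExist"]/h1')
class(miss)
is.na(html_text(miss))
```

Checking `is.na(html_text(...))` is one way to detect that the page you received is not the page you expected before trying to use the value.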


2017-08-17 Vasim

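Could you provide a snapshot of the table you are trying to scrape?

Hi, on the new URL (http://www.moneycontrol.com/india/stockpricequote/miscellaneous/akspintex/AKS01) I want the XPath //*[@id="nChrtPrc"]/div[3]/h1, which is "AK Spintex". – Vasim

"Reproduction of news articles, photos, videos or any other content in any form or medium without express written permission is prohibited." – hrbrmstr

Or just use the RCurl and XML libraries.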

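library(RCurl)
library(XML)

url <- "http://www.moneycontrol.com/india/stockpricequote/miscellaneous/akspintex/AKS01"

# Fetch the page, following redirects (SSL peer verification is disabled here)
curl <- getCurlHandle()
html <- getURL(url, curl = curl, .opts = list(ssl.verifypeer = FALSE), followlocation = TRUE)

# Parse the HTML string and extract the heading text as a character vector
doc <- htmlParse(html, asText = TRUE, encoding = "UTF-8")
h1 <- xpathSApply(doc, "//*[@id='nChrtPrc']/div[3]/h1//text()", xmlValue)
print(h1)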


2017-08-17 12:00:10
