2017-07-19

Scraping data from an "angular.callbacks" web page

I want to use R to scrape news from a URL (http://www.foxnews.com/search-results/search?q="AlphaGo"&ss=fn&start=0). Here is my code:

library(stringr)  # provides str_c()

url <- "http://api.foxnews.com/v1/content/search?q=%22AlphaGo%22&fields=date,description,title,url,image,type,taxonomy&section.path=fnc&start=0&callback=angular.callbacks._0&cb=2017719162"
html <- str_c(readLines(url, encoding = "UTF-8"), collapse = "")
content_fox <- RJSONIO::fromJSON(html)

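However, the JSON cannot be read, because this error appears:

Error in file(con, "r") : cannot open the connection

I noticed that the response starts with angular.callbacks._0 rather than with the JSON itself, and I think this may be the problem.

Any ideas on how to fix this?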

Answer

Following the answer to "Parse JSONP with R", I adjusted my code by adding two new lines, and it worked:

library(stringr)  # provides str_c()

url <- "http://api.foxnews.com/v1/content/search?q=%22AlphaGo%22&fields=date,description,title,url,image,type,taxonomy&section.path=fnc&start=0&callback=angular.callbacks._0&cb=2017719162"
html <- str_c(readLines(url, encoding = "UTF-8"), collapse = "")
html <- sub('[^\\{]*', '', html)  # remove the callback name and the opening parenthesis
html <- sub('\\)$', '', html)     # remove the closing parenthesis
content_fox <- RJSONIO::fromJSON(html)
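
The two sub() calls above can also be wrapped in a small helper. The following is only a sketch under some assumptions: the name strip_jsonp is hypothetical, the sample string is an invented stand-in for the real response, and the helper strips up to the first "(" rather than the first "{", so it should also cope with a payload that is a JSON array or a response that ends in ");".

```r
# Hypothetical helper (not part of any package): remove a JSONP wrapper such as
# angular.callbacks._0( ... ) and return the bare JSON text.
strip_jsonp <- function(txt) {
  txt <- trimws(txt)
  # Drop the callback name and the opening parenthesis.
  txt <- sub("^[^(]*\\(", "", txt)
  # Drop the closing parenthesis and an optional trailing semicolon.
  txt <- sub("\\)[[:space:]]*;?[[:space:]]*$", "", txt)
  txt
}

# Invented sample payload; the real API response will contain different fields.
sample_jsonp <- 'angular.callbacks._0({"title": "AlphaGo", "type": "article"});'
json_text <- strip_jsonp(sample_jsonp)

# Parse the result only if RJSONIO is installed.
if (requireNamespace("RJSONIO", quietly = TRUE)) {
  parsed <- RJSONIO::fromJSON(json_text)
}
```

With the real response, html <- strip_jsonp(html) would take the place of the two sub() lines before calling RJSONIO::fromJSON(html).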