
quantmod getFinancials() not pulling financials

I'm looking to download fundamental data for publicly traded companies. Using the quantmod package, I've tried getFinancials() to pull the data. It works for some companies, but with varying results (I've read and understand the disclaimer about free data), so I'd like to confirm that I'm pulling this data correctly.

Take JPM, for example: on the Yahoo Finance page I can see populated financials, but the call below appears to pull from the "google" src rather than "yahoo", and the Google side has sparsely populated financials for it.

Google - https://www.google.com/finance?q=NYSE%3AJPM&fstype=ii&ei=9kh-WejLE5e_etbzmpgP

Yahoo - https://finance.yahoo.com/quote/JPM/financials?p=JPM

library(quantmod) 
JPM <- getFinancials("JPM", src = "yahoo", auto.assign = FALSE) 
viewFin(JPM, type = "IS", period = "A") 

Is there a correct way to specify src? And is there a way to use getFinancials() but switch sources (Google vs. Yahoo) if an indicative column (e.g. revenue) comes back as NA?

Answers


The top of the getFinancials help page says (emphasis added):

Download Income Statement, Balance Sheet, and Cash Flow Statement from Google Finance.

There is currently no way to specify Yahoo Finance as the source. Doing so would require someone to write a method to scrape and parse the HTML from Yahoo Finance, since it isn't possible to download the financials to a file the way you can with price data.
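For anyone who wants to experiment, a rough scraping sketch using rvest might look like the following. This is only a sketch under assumptions: it assumes the financials page serves static HTML tables, and much of Yahoo's page is rendered client-side, so it may well return nothing.

library(rvest) 

# rough sketch: grab whatever HTML tables the Yahoo financials page exposes 
url <- "https://finance.yahoo.com/quote/JPM/financials?p=JPM" 
page <- read_html(url) 
tables <- html_table(html_nodes(page, "table"), fill = TRUE) 
length(tables) # likely 0 if the statements are rendered client-side 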


Thanks! Guess I should read more carefully next time, ha – steich


I think Yahoo changed its API recently. Download the file linked under the heading "Get the Excel spreadsheet to download bulk historical stock data from Google Finance":

http://investexcel.net/multiple-stock-quote-downloader-for-excel/


It's in Excel, which you can easily load into R.
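For instance, a minimal sketch with the readxl package (the file name here is hypothetical; substitute whatever the downloader saves):

library(readxl) 

# hypothetical file name -- substitute the workbook the downloader produced 
quotes <- read_excel("bulk-stock-quotes.xlsx", sheet = 1) 
head(quotes) 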

You could also try something like this.

# assumes codes are known beforehand 
codes <- c("MSFT", "SBUX", "S", "AAPL", "ADT") 
urls <- paste0("https://www.google.com/finance/historical?q=", codes, "&output=csv") 
paths <- paste0(codes, ".csv") 
# only download files that are not already on disk 
missing <- !file.exists(paths) 
missing 

# simple error handling in case the file doesn't exist 
downloadFile <- function(url, path, ...) { 
  # remove the file if it already exists 
  if (file.exists(path)) file.remove(path) 
  # download the file 
  tryCatch( 
    download.file(url, path, ...), 
    error = function(c) { 
      # remove the partial file on error 
      if (file.exists(path)) file.remove(path) 
      # create an informative error message 
      c$message <- paste(substr(path, 1, 4), "failed") 
      message(c$message) 
    } 
  ) 
} 
# Map is a wrapper around mapply 
Map(downloadFile, urls[missing], paths[missing]) 

Or this one.

## downloads historic prices for all constituents of SP500 
library(zoo) 
library(tseries)       

## read in list of constituents, with company name in first column and 
## ticker symbol in second column 

## CREATE A FILE TO READ DATA FROM!!! 
spComp <- read.csv("C:/Users/Excel/Desktop/stocks.csv") 

## specify time period 
dateStart <- "2013-01-01"    
dateEnd <- "2015-05-08" 

## extract symbols and number of iterations 
symbols <- spComp[, 2] # ticker symbols are in the second column 
nAss <- length(symbols) 

## download data on first stock as zoo object 
z <- get.hist.quote(instrument = symbols[1], start = dateStart, 
        end = dateEnd, quote = "AdjClose", 
        retclass = "zoo", quiet = TRUE) 

## use ticker symbol as column name 
dimnames(z)[[2]] <- as.character(symbols[1]) 

## download remaining assets in a for loop 
for (i in 2:nAss) { 
    ## display progress by showing the current iteration step 
    cat("Downloading ", i, " out of ", nAss, "\n") 

    result <- try(x <- get.hist.quote(instrument = symbols[i], 
            start = dateStart, 
            end = dateEnd, quote = "AdjClose", 
            retclass = "zoo", quiet = TRUE)) 
    if (inherits(result, "try-error")) { 
     next 
    } else { 
     ## use ticker symbol as column name 
     dimnames(x)[[2]] <- as.character(symbols[i]) 

     ## merge with already downloaded data to get assets on same dates 
     z <- merge(z, x) 
    } 
} 

## save data 
# CREATE A FILE TO WRITE DATA TO!!! 
write.zoo(z, file = "C:/Users/Excel/Desktop/all_sp500_price_data.csv", index.name = "time") 
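To load the saved prices back later you can use read.zoo; note that write.zoo writes space-separated values by default (despite the .csv extension), so a minimal read-back sketch, assuming the same path, is:

## read the saved data back in as a zoo object 
z2 <- read.zoo("C:/Users/Excel/Desktop/all_sp500_price_data.csv", 
        header = TRUE, format = "%Y-%m-%d") 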

Here is another option for you to consider.

Method #1: 

This article illustrates how to download stock price data files from Google, save them to a local drive, and merge them into a single data frame. The script is slightly modified from one that downloads RStudio package download log data. The original source can be found [here](https://github.com/hadley/cran-logs-dplyr/blob/master/1-download.r). 

First of all, the following packages are used. 


{% highlight r %} 
library(knitr) 
library(lubridate) 
library(stringr) 
library(plyr) 
library(dplyr) 
{% endhighlight %} 

The script begins by creating a folder to save the data files. 


{% highlight r %} 
# create data folder 
dataDir <- paste0("data","_","2014-11-20-Download-Stock-Data-1") 
if(file.exists(dataDir)) { 
     unlink(dataDir, recursive = TRUE) 
     dir.create(dataDir) 
} else { 
     dir.create(dataDir) 
} 
{% endhighlight %} 

After creating the URLs and file paths, the files are downloaded using the `Map` function, which is a wrapper around `mapply`. Note that, in case the call breaks on an error (e.g. when a file doesn't exist), `download.file` is wrapped in another function that includes an error handler (`tryCatch`). 


{% highlight r %} 
# assumes codes are known beforehand 
codes <- c("MSFT", "TCHC") # codes <- c("MSFT", "1234") for testing 
urls <- paste0("http://www.google.com/finance/historical?q=NASDAQ:", 
       codes,"&output=csv") 
paths <- paste0(dataDir,"/",codes,".csv") # forward slashes also work on Windows 

# simple error handling in case the file doesn't exist 
downloadFile <- function(url, path, ...) { 
     # remove file if exists already 
     if(file.exists(path)) file.remove(path) 
     # download file 
     tryCatch(   
      download.file(url, path, ...), error = function(c) { 
        # remove file if error 
        if(file.exists(path)) file.remove(path) 
        # create error message 
        c$message <- paste(substr(path, 1, 4),"failed") 
        message(c$message) 
      } 
    ) 
} 
# wrapper of mapply 
Map(downloadFile, urls, paths) 
{% endhighlight %} 


Finally, the files are read back using `llply` and combined using `rbind_all`. Note that, since the merged data holds multiple stocks' records, a `Code` column is created. 



{% highlight r %} 
# read all csv files and merge 
files <- dir(dataDir, full.names = TRUE) 
dataList <- llply(files, function(file){ 
     data <- read.csv(file, stringsAsFactors = FALSE) 
     # get code from file path 
     pattern <- "/[A-Z][A-Z][A-Z][A-Z]" 
     code <- substr(str_extract(file, pattern), 2, nchar(str_extract(file, pattern))) 
     # first column's name is funny 
     names(data) <- c("Date","Open","High","Low","Close","Volume") 
     data$Date <- dmy(data$Date) 
     data$Open <- as.numeric(data$Open) 
     data$High <- as.numeric(data$High) 
     data$Low <- as.numeric(data$Low) 
     data$Close <- as.numeric(data$Close) 
     data$Volume <- as.integer(data$Volume) 
     data$Code <- code 
     data 
}, .progress = "text") 

data <- rbind_all(dataList) 
{% endhighlight %} 

Some of the values are shown below. 


|Date  | Open| High| Low| Close| Volume|Code | 
|:----------|-----:|-----:|-----:|-----:|--------:|:----| 
|2014-11-26 | 47.49| 47.99| 47.28| 47.75| 27164877|MSFT | 
|2014-11-25 | 47.66| 47.97| 47.45| 47.47| 28007993|MSFT | 
|2014-11-24 | 47.99| 48.00| 47.39| 47.59| 35434245|MSFT | 
|2014-11-21 | 49.02| 49.05| 47.57| 47.98| 42884795|MSFT | 
|2014-11-20 | 48.00| 48.70| 47.87| 48.70| 21510587|MSFT | 
|2014-11-19 | 48.66| 48.75| 47.93| 48.22| 26177450|MSFT | 

This approach isn't as efficient as reading the files directly without saving them to a local drive. It may be useful, however, if the files are large and the API server breaks the connection abruptly. 
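If you want to gauge the difference yourself, here is a rough timing sketch (the URL is the same illustrative Google Finance endpoint used above):

{% highlight r %} 
# compare reading directly from the URL with downloading first 
u <- "http://www.google.com/finance/historical?q=NASDAQ:MSFT&output=csv" 
system.time(direct <- read.csv(u, stringsAsFactors = FALSE)) 
system.time({ 
     download.file(u, "MSFT.csv") 
     saved <- read.csv("MSFT.csv", stringsAsFactors = FALSE) 
}) 
{% endhighlight %} 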

I hope this article is useful and I'm going to write an article to show the second way. 

Method #2: 

An [earlier article](http://jaehyeon-kim.github.io/r/2014/11/20/Download-Stock-Data-1/) showed a way to download stock price data files from Google, save them to a local drive and merge them into a single data frame. If the files are not large, however, that isn't very effective, so in this article the files are downloaded and merged in memory. 

The following packages are used. 


{% highlight r %} 
library(knitr) 
library(lubridate) 
library(stringr) 
library(plyr) 
library(dplyr) 
{% endhighlight %} 

Taking the URLs as file locations, the files are read directly using `llply` and combined using `rbind_all`. As the merged data has multiple stocks' records, a `Code` column is created. Note that, when an error occurs, the function returns a dummy data frame so as not to break the loop - the dummy data frame's values are filtered out at the end. 


{% highlight r %} 
# assumes codes are known beforehand 
codes <- c("MSFT", "TCHC") # codes <- c("MSFT", "1234") for testing 
files <- paste0("http://www.google.com/finance/historical?q=NASDAQ:", 
       codes,"&output=csv") 

dataList <- llply(files, function(file, ...) { 
     # get code from file url 
     pattern <- "Q:[0-9a-zA-Z][0-9a-zA-Z][0-9a-zA-Z][0-9a-zA-Z]" 
     code <- substr(str_extract(file, pattern), 3, nchar(str_extract(file, pattern))) 

     # read data directly from a URL with only simple error handling 
     # for further error handling: http://adv-r.had.co.nz/Exceptions-Debugging.html 
     tryCatch({ 
      data <- read.csv(file, stringsAsFactors = FALSE) 
      # first column's name is funny 
      names(data) <- c("Date","Open","High","Low","Close","Volume") 
      data$Date <- dmy(data$Date) 
      data$Open <- as.numeric(data$Open) 
      data$High <- as.numeric(data$High) 
      data$Low <- as.numeric(data$Low) 
      data$Close <- as.numeric(data$Close) 
      data$Volume <- as.integer(data$Volume) 
      data$Code <- code 
      data    
     }, 
     error = function(c) { 
      c$message <- paste(code,"failed") 
      message(c$message) 
      # return a dummy data frame 
      data <- data.frame(Date=dmy(format(Sys.Date(),"%d%m%Y")), Open=0, High=0, 
           Low=0, Close=0, Volume=0, Code="NA") 
      data 
     }) 
}) 

# dummy data frame values are filtered out 
data <- filter(rbind_all(dataList), Code != "NA") 
{% endhighlight %} 

Some of the values are shown below. 


|Date  | Open| High| Low| Close| Volume|Code | 
|:----------|-----:|-----:|-----:|-----:|--------:|:----| 
|2014-11-26 | 47.49| 47.99| 47.28| 47.75| 27164877|MSFT | 
|2014-11-25 | 47.66| 47.97| 47.45| 47.47| 28007993|MSFT | 
|2014-11-24 | 47.99| 48.00| 47.39| 47.59| 35434245|MSFT | 
|2014-11-21 | 49.02| 49.05| 47.57| 47.98| 42884795|MSFT | 
|2014-11-20 | 48.00| 48.70| 47.87| 48.70| 21510587|MSFT | 
|2014-11-19 | 48.66| 48.75| 47.93| 48.22| 26177450|MSFT | 

It took a bit longer to complete the script as I had to teach myself how to handle errors in R - and this is why I started writing articles on this blog. 

I hope this article is useful. 


Summarise Stock Returns from Multiple Files: 

This is a slight extension of the previous two articles ([2014-11-20-Download-Stock-Data-1](http://jaehyeon-kim.github.io/r/2014/11/20/Download-Stock-Data-1/), [2014-11-20-Download-Stock-Data-2](http://jaehyeon-kim.github.io/r/2014/11/20/Download-Stock-Data-2/)) and it aims to produce gross returns, standard deviation and correlation of multiple shares. 

The following packages are used. 


{% highlight r %} 
library(knitr) 
library(lubridate) 
library(stringr) 
library(reshape2) 
library(plyr) 
library(dplyr) 
{% endhighlight %} 

The script begins with creating a data folder in the format of *data_YYYY-MM-DD*. 


{% highlight r %} 
# create data folder 
dataDir <- paste0("data","_",format(Sys.Date(),"%Y-%m-%d")) 
if(file.exists(dataDir)) { 
    unlink(dataDir, recursive = TRUE) 
    dir.create(dataDir) 
} else { 
    dir.create(dataDir) 
} 
{% endhighlight %} 

Given company codes, URLs and file paths are created. Then data files are downloaded by `Map`, which is a wrapper of `mapply`. Note that R's `download.file` function is wrapped by `downloadFile` so that the function does not break when an error occurs. 


{% highlight r %} 
# assumes codes are known beforehand 
codes <- c("MSFT", "TCHC") 
urls <- paste0("http://www.google.com/finance/historical?q=NASDAQ:", 
       codes,"&output=csv") 
paths <- paste0(dataDir,"/",codes,".csv") # forward slashes also work on Windows 

# simple error handling in case the file doesn't exist 
downloadFile <- function(url, path, ...) { 
    # remove the file if it already exists 
    if(file.exists(path)) file.remove(path) 
    # download the file 
    tryCatch( 
     download.file(url, path, ...), 
     error = function(c) { 
      # remove the partial file on error 
      if(file.exists(path)) file.remove(path) 
      # create an informative error message 
      c$message <- paste(substr(path, 1, 4), "failed") 
      message(c$message) 
     } 
    ) 
} 
# wrapper of mapply 
Map(downloadFile, urls, paths) 
{% endhighlight %} 

Once the files are downloaded, they are read back and combined using `rbind_all`. Some more details about this step are listed below: 

* only the Date, Close and Code columns are kept 
* codes are extracted from the file paths by matching a regular expression 
* data is arranged by date, as the raw files are sorted in descending order 
* errors are handled by returning a dummy data frame whose code value is "NA" 
* the individual data files are merged in long format 
    * rows with code "NA" are filtered out 


{% highlight r %} 
# read all csv files and merge 
files <- dir(dataDir, full.names = TRUE) 
dataList <- llply(files, function(file){ 
    # get code from file path 
    pattern <- "/[A-Z][A-Z][A-Z][A-Z]" 
    code <- substr(str_extract(file, pattern), 2, nchar(str_extract(file, pattern))) 
    tryCatch({ 
    data <- read.csv(file, stringsAsFactors = FALSE) 
    # first column's name is funny 
    names(data) <- c("Date","Open","High","Low","Close","Volume") 
    data$Date <- dmy(data$Date) 
    data$Close <- as.numeric(data$Close) 
    data$Code <- code 
    # optional 
    data$Open <- as.numeric(data$Open) 
    data$High <- as.numeric(data$High) 
    data$Low <- as.numeric(data$Low) 
    data$Volume <- as.integer(data$Volume) 
    # select only 'Date', 'Close' and 'Code' 
    # raw data should be arranged in an ascending order 
    arrange(subset(data, select = c(Date, Close, Code)), Date) 
    }, 
    error = function(c){ 
    c$message <- paste(code,"failed") 
    message(c$message) 
    # return a dummy data frame not to break function 
    data <- data.frame(Date=dmy(format(Sys.Date(),"%d%m%Y")), Close=0, Code="NA") 
    data 
    }) 
}, .progress = "text") 

# data is combined to create a long format 
# dummy data frame values are filtered out 
data <- filter(rbind_all(dataList), Code != "NA") 
{% endhighlight %} 

Some values of this long-format data are shown below. 


|Date  | Close|Code | 
|:----------|-----:|:----| 
|2013-11-29 | 38.13|MSFT | 
|2013-12-02 | 38.45|MSFT | 
|2013-12-03 | 38.31|MSFT | 
|2013-12-04 | 38.94|MSFT | 
|2013-12-05 | 38.00|MSFT | 
|2013-12-06 | 38.36|MSFT | 

The data is converted into a wide format where the row and column variables are Date and Code respectively (`Date ~ Code`), and the value variable is Close (`value.var="Close"`). Some values of the wide-format data are shown below. 


{% highlight r %} 
# data is converted into a wide format 
data <- dcast(data, Date ~ Code, value.var="Close") 
kable(head(data)) 
{% endhighlight %} 



|Date  | MSFT| TCHC| 
|:----------|-----:|-----:| 
|2013-11-29 | 38.13| 13.52| 
|2013-12-02 | 38.45| 13.81| 
|2013-12-03 | 38.31| 13.48| 
|2013-12-04 | 38.94| 13.71| 
|2013-12-05 | 38.00| 13.55| 
|2013-12-06 | 38.36| 13.95| 

The remaining steps just take the log of the close prices, difference them, and apply `sum`, `sd`, and `cor`. 


{% highlight r %} 
# select all columns except Date 
data <- select(data, -Date) 

# apply log difference column wise 
dailyRet <- apply(log(data), 2, diff, lag=1) 

# obtain gross return, standard deviation and correlation 
returns <- apply(dailyRet, 2, sum, na.rm = TRUE) 
std <- apply(dailyRet, 2, sd, na.rm = TRUE) 
correlation <- cor(dailyRet) 

returns 
{% endhighlight %} 



{% highlight text %} 
##  MSFT  TCHC 
## 0.2249777 0.6293973 
{% endhighlight %} 



{% highlight r %} 
std 
{% endhighlight %} 



{% highlight text %} 
##  MSFT  TCHC 
## 0.01167381 0.03203031 
{% endhighlight %} 



{% highlight r %} 
correlation 
{% endhighlight %} 



{% highlight text %} 
##   MSFT  TCHC 
## MSFT 1.0000000 0.1481043 
## TCHC 0.1481043 1.0000000 
{% endhighlight %} 

Finally the data folder is deleted. 


{% highlight r %} 
# delete data folder 
if(file.exists(dataDir)) { unlink(dataDir, recursive = TRUE) } 
{% endhighlight %}