2013-03-08 86 views
0

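Slow data.frame filling

I have length(Dates_List) days for which I have information on length(ISIN_Table$ID) items. For each day (the loop over j), I create a data frame of zeros that holds all items (length(ISIN_Table$ID) rows) and a few columns (4).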

Each item gets one row in every matrix, but the row is filled differently depending on the date.

#create list that will hold matrices 
df.list<-vector("list", length(Dates_List)) 
for (j in 1:(length(Dates_List))){ 
    df.list[[j]] <- data.frame(matrix(0, nrow = length(ISIN_Table$ID),ncol=4)) 
} 

#Loop over number of days 
for (j in 1:(length(Dates_List))){ 
    date<-Dates_List[j] 
    #create empty dataframe 
    df.list[[j]] <- data.frame(matrix(0, nrow=length(ISIN_Table$ID), ncol=4)) 

    #loop over every item
    for (i in 1:(length(ISIN_Table$ID))){
        #check whether item is known at date
        if (nrow(data.raw[data.raw$ID==i & data.raw$Date==date,]) < 1){
            ID<-i
            df.list[[j]][i,1]<-date
            df.list[[j]][i,2]<-ID  #fill up the row
        } else {
            #fill up the row
            df.list[[j]][i,]<-c(
                as.character(data.raw[data.raw$ID==i & data.raw$Date==date,"Date"]),
                (data.raw[data.raw$ID==i & data.raw$Date==date,"ID"]),
                (data.raw[data.raw$ID==i & data.raw$Date==date,"Bid.Price"]),
                (data.raw[data.raw$ID==i & data.raw$Date==date,"Ask.Price"]))
        }
    }
} 

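This code gives me exactly the output I want, but it is incredibly slow. I would appreciate any suggestions on how to speed it up; the current version is not workable.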

UPDATE:

# create dummy data: 

Dates_List<-c("2007-01-02", "2007-01-03") 
ISIN_Table<-data.frame(c(1,2,3)) 
colnames(ISIN_Table)<-"ID" 
ID<-rep(1:2, len=2, each=1) 
Date<-c("2007-01-02","2007-01-02","2007-01-03", "2007-01-03") 
Bid.Price<-rep(100,4) 
Ask.Price<-rep(100,4) 
data.raw<-data.frame(ID, Date, Bid.Price, Ask.Price) 

Calling df.list[[1]] returns:

          X1 X2  X3  X4
1 2007-01-02  1 100 100
2 2007-01-02  2 100 100
3 2007-01-02  3   0   0
+0

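'for' loops in R are slow. You could try the 'apply' family of functions. Also, without reproducible data it is hard to answer a question like this. – 2013-03-08 14:46:01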

+0

It looks like you just want to split data.raw by date, and where a particular 'ID' has no data for a particular date, you are filling with 0. – 2013-03-08 14:52:52

+6

'for' loops are not slow. Creating and subsetting data frames is slow. – Roland 2013-03-08 14:53:22
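Roland's point can be illustrated with a minimal sketch. It keeps the double loop from the question, but does the lookup against a key built once up front and writes into plain preallocated vectors, creating each data frame only once per date. The names `ids`, `raw_key` and `result` are illustrative assumptions, not part of the original code, and the output is a named-column data frame rather than the X1..X4 layout above.

```r
# Dummy data matching the question (column types made explicit).
Dates_List <- c("2007-01-02", "2007-01-03")
ids <- c(1, 2, 3)  # stands in for ISIN_Table$ID (assumed name)
data.raw <- data.frame(ID = c(1, 2, 1, 2),
                       Date = c("2007-01-02", "2007-01-02",
                                "2007-01-03", "2007-01-03"),
                       Bid.Price = rep(100, 4),
                       Ask.Price = rep(100, 4),
                       stringsAsFactors = FALSE)

# Build one key per (Date, ID) pair once, outside the loops.
raw_key <- paste(data.raw$Date, data.raw$ID)

result <- vector("list", length(Dates_List))
for (j in seq_along(Dates_List)) {
  bid <- numeric(length(ids))  # preallocated, zero-filled
  ask <- numeric(length(ids))
  for (i in seq_along(ids)) {
    # Position of this (date, item) pair in data.raw, NA when absent.
    hit <- match(paste(Dates_List[j], ids[i]), raw_key)
    if (!is.na(hit)) {
      bid[i] <- data.raw$Bid.Price[hit]
      ask[i] <- data.raw$Ask.Price[hit]
    }
  }
  # Create the data frame once per date, after the vectors are filled.
  result[[j]] <- data.frame(Date = Dates_List[j], ID = ids,
                            Bid.Price = bid, Ask.Price = ask,
                            stringsAsFactors = FALSE)
}
```

The loops are unchanged in shape; what disappears is the repeated `data.raw[...]` subsetting and the row-by-row assignment into a data frame.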

Answers

1

UPDATE: As suggested by @Arun, you can add the missing rows before splitting, which avoids the per-date merge step entirely.

Dates_List <- c("2007-01-02", "2007-01-03") 
ISIN_Table <- data.frame(c(1, 2, 3)) 
colnames(ISIN_Table) <- "ID" 
ID <- rep(1:2, len = 2, each = 1) 
Date <- c("2007-01-02", "2007-01-02", "2007-01-03", "2007-01-03") 
Bid.Price <- rep(100, 4) 
Ask.Price <- rep(100, 4) 
data.raw <- data.frame(ID, Date, Bid.Price, Ask.Price) 

temp <- expand.grid(Dates_List, ISIN_Table$ID) 
names(temp) <- c("Date", "ID") 

data.raw <- merge(temp, data.raw, all.x = TRUE) 
data.raw[is.na(data.raw)] <- 0 
data.raw 
##         Date ID Bid.Price Ask.Price
## 1 2007-01-02  1       100       100
## 2 2007-01-02  2       100       100
## 3 2007-01-02  3         0         0
## 4 2007-01-03  1       100       100
## 5 2007-01-03  2       100       100
## 6 2007-01-03  3         0         0


splitdata <- split(data.raw, data.raw$Date) 

splitdata 
## $`2007-01-02`
##         Date ID Bid.Price Ask.Price
## 1 2007-01-02  1       100       100
## 2 2007-01-02  2       100       100
## 3 2007-01-02  3         0         0
##
## $`2007-01-03`
##         Date ID Bid.Price Ask.Price
## 4 2007-01-03  1       100       100
## 5 2007-01-03  2       100       100
## 6 2007-01-03  3         0         0
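If the result should be indexed by position like the question's df.list (element j belonging to Dates_List[j]), the split output can be reordered by name. The following is a minimal sketch under some assumptions: dates are kept as character rather than factor, only the two price columns are zero-filled, and the names `grid`, `full` and `price_cols` are illustrative.

```r
Dates_List <- c("2007-01-02", "2007-01-03")
ISIN_Table <- data.frame(ID = c(1, 2, 3))
data.raw <- data.frame(ID = c(1, 2, 1, 2),
                       Date = c("2007-01-02", "2007-01-02",
                                "2007-01-03", "2007-01-03"),
                       Bid.Price = rep(100, 4),
                       Ask.Price = rep(100, 4),
                       stringsAsFactors = FALSE)

# Every (Date, ID) combination, with character dates.
grid <- expand.grid(Date = Dates_List, ID = ISIN_Table$ID,
                    stringsAsFactors = FALSE)

# Left join: combinations missing from data.raw get NA prices.
full <- merge(grid, data.raw, all.x = TRUE)
full <- full[order(full$Date, full$ID), ]  # make the row order explicit

# Zero-fill only the price columns, leaving Date and ID untouched.
price_cols <- c("Bid.Price", "Ask.Price")
for (col in price_cols) {
  full[[col]][is.na(full[[col]])] <- 0
}

# One data frame per date, in the order given by Dates_List.
df.list <- unname(split(full, full$Date)[Dates_List])
```

Restricting the zero-fill to the price columns avoids assigning 0 into a date or ID column if one of those ever contains NA.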

OLD ANSWER

You can use split to split the data by date, and then use mapply with merge to get rows even for the IDs that have no data on a given date.

Dates_List <- c("2007-01-02", "2007-01-03") 
ISIN_Table <- data.frame(c(1, 2, 3)) 
colnames(ISIN_Table) <- "ID" 
ID <- rep(1:2, len = 2, each = 1) 
Date <- c("2007-01-02", "2007-01-02", "2007-01-03", "2007-01-03") 
Bid.Price <- rep(100, 4) 
Ask.Price <- rep(100, 4) 
data.raw <- data.frame(ID, Date, Bid.Price, Ask.Price) 

splitdata <- split(data.raw, data.raw$Date) 

mapply(FUN = function(x, date) {
           merge(x,
                 data.frame(ID = ISIN_Table$ID,
                            Date = rep(date, length(ISIN_Table$ID))),
                 all.y = TRUE)
       },
       splitdata, t(names(splitdata)), SIMPLIFY = FALSE)

## $`2007-01-02`
##   ID       Date Bid.Price Ask.Price
## 1  1 2007-01-02       100       100
## 2  2 2007-01-02       100       100
## 3  3 2007-01-02        NA        NA
##
## $`2007-01-03`
##   ID       Date Bid.Price Ask.Price
## 1  1 2007-01-03       100       100
## 2  2 2007-01-03       100       100
## 3  3 2007-01-03        NA        NA
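The output above leaves NA where the question asked for 0. A possible variant, sketched here with Map (which is mapply with SIMPLIFY = FALSE), fills the two price columns inside the merging function. The names `all_ids`, `skeleton` and `merged` are illustrative assumptions.

```r
data.raw <- data.frame(ID = c(1, 2, 1, 2),
                       Date = c("2007-01-02", "2007-01-02",
                                "2007-01-03", "2007-01-03"),
                       Bid.Price = rep(100, 4),
                       Ask.Price = rep(100, 4),
                       stringsAsFactors = FALSE)
all_ids <- c(1, 2, 3)  # stands in for ISIN_Table$ID

splitdata <- split(data.raw, data.raw$Date)

merged <- Map(function(x, date) {
  # One row per ID for this date; IDs without data get NA prices.
  skeleton <- data.frame(ID = all_ids, Date = date,
                         stringsAsFactors = FALSE)
  out <- merge(x, skeleton, all.y = TRUE)
  out <- out[order(out$ID), ]  # make the row order explicit
  # Replace the NA prices with 0, as in the question's expected output.
  out$Bid.Price[is.na(out$Bid.Price)] <- 0
  out$Ask.Price[is.na(out$Ask.Price)] <- 0
  out
}, splitdata, names(splitdata))
```

The result keeps the date names from splitdata, so a single day can be pulled out with merged[["2007-01-02"]].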
+0

(+1) Very nice use of 'expand.grid' and 'merge'! – Arun 2013-03-08 17:10:46