2014-09-06 37 views
1

R loop based on start/end year: I have leases with start and end years, like this:

region=c("a","b","c","d") 
lease=c("x","y","z","k") 
startyr=c(2000,2001,2003,2002) 
endyr=c(2004,2004,2006,2005) 
annualAmt=c(7000,8500,6000,5500) 
df=data.frame(region,lease,startyr,endyr,annualAmt) 

I want to reshape this data frame so that the years are spread out into new columns, giving this desired output:

region lease 2000 2001 2002 ... 2006 
a x 7000 7000 7000 7000 7000 0 0 
b y 0 8500 8500 8500 8500 0 0 

The logic is that if a lease covers 2000-2004, its annual amount is counted in each of the 2000, 2001, ..., 2004 columns.

What is the best way to do this? If I write a loop, how should I name the newly created year columns 2000-2006? Or should I use apply?

+0

Can you also post your desired output? – A5C1D2H2I1M1N2O1R2T1 2014-09-06 04:22:44

+0

Added. Thanks. @AnandaMahto – santoku 2014-09-06 04:30:06

Answers

1

Here is an alternative that mainly involves basic addition and subtraction:

Rows <- df$endyr - df$startyr + 1   # Years per lease, counting the end year
df <- df[rep(rownames(df), Rows), ]   # Repeat each row once per year
df$year <- df$startyr + sequence(Rows) - 1 # Add a new "year" variable
reshape(df, direction = "wide",    # Reshape, long to wide
     idvar = c("region", "lease"),  # idvars are the first two cols
     timevar = "year",     # timevar is the new year col
     drop = c("startyr", "endyr"))  # and drop the start/endyr cols
# Years outside a lease come back as NA rather than 0
#   region lease annualAmt.2000 annualAmt.2001 annualAmt.2002 annualAmt.2003
# 1      a     x           7000           7000           7000           7000
# 2      b     y             NA           8500           8500           8500
# 3      c     z             NA             NA             NA           6000
# 4      d     k             NA             NA           5500           5500
#   annualAmt.2004 annualAmt.2005 annualAmt.2006
# 1           7000             NA             NA
# 2           8500             NA             NA
# 3           6000           6000           6000
# 4           5500           5500             NA
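The reshape() result differs from the requested layout in two ways: years outside a lease are NA, and the column names carry an "annualAmt." prefix. A minimal sketch of one way to tidy that up, assuming the end year should be counted (as in the desired output) and rebuilding the question's data so it runs on its own. The names `long` and `wide` are only illustrative:

```r
# Rebuild the question's data so the sketch is self-contained
df <- data.frame(region = c("a", "b", "c", "d"),
                 lease = c("x", "y", "z", "k"),
                 startyr = c(2000, 2001, 2003, 2002),
                 endyr = c(2004, 2004, 2006, 2005),
                 annualAmt = c(7000, 8500, 6000, 5500))

Rows <- df$endyr - df$startyr + 1               # years per lease, end year included
long <- df[rep(seq_len(nrow(df)), Rows), ]      # one row per lease-year
long$year <- long$startyr + sequence(Rows) - 1  # the year each row stands for

wide <- reshape(long, direction = "wide",
                idvar = c("region", "lease"),
                timevar = "year",
                drop = c("startyr", "endyr"))

wide[is.na(wide)] <- 0                                # years outside a lease become 0
names(wide) <- sub("^annualAmt\\.", "", names(wide))  # "annualAmt.2000" -> "2000"
rownames(wide) <- NULL
wide
```

Because the columns end up named "2000", "2001", and so on, they need backticks or string indexing to access, for example wide[["2004"]].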

Alternatively, you can use "data.table", like this:

library(data.table) 
## Start with your original df 
DT <- data.table(df)   # must be named DT, which the call below refers to
dcast.data.table(
    DT[, list(year = seq(startyr, endyr), 
      annualAmt), 
    by = list(region, lease)], 
    region + lease ~ year, 
    value.var = "annualAmt", fill = 0) 
# region lease 2000 2001 2002 2003 2004 2005 2006 
# 1:  a  x 7000 7000 7000 7000 7000 0 0 
# 2:  b  y 0 8500 8500 8500 8500 0 0 
# 3:  c  z 0 0 0 6000 6000 6000 6000 
# 4:  d  k 0 0 5500 5500 5500 5500 0 
0

How about

years <- seq(min(df$startyr), max(df$endyr)) 

dd <- data.frame(region, lease, t(mapply(function(a,b, v) { 
    v* !is.na(match(years, seq(a, b))) 
}, startyr, endyr, annualAmt))) 

names(dd)[-(1:2)]<-years 
dd 

which returns

region lease 2000 2001 2002 2003 2004 2005 2006 
1  a  x 7000 7000 7000 7000 7000 0 0 
2  b  y 0 8500 8500 8500 8500 0 0 
3  c  z 0 0 0 6000 6000 6000 6000 
4  d  k 0 0 5500 5500 5500 5500 0 
+0

It's very neat. However, I don't understand why the output is transposed – santoku 2014-09-06 07:12:15
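On the transpose question: mapply() simplifies its results into a matrix with one column per call, so here each lease becomes a column and each year a row. t() flips that so each lease is a row, which is what data.frame() needs to line up with region and lease. A small sketch with two made-up leases (the values are hypothetical, not from the question):

```r
years <- 2000:2002

# Two hypothetical leases: start year, end year, annual amount
m <- mapply(function(a, b, v) v * (years %in% seq(a, b)),
            c(2000, 2001), c(2001, 2002), c(10, 20))

dim(m)     # one row per year, one column per lease
dim(t(m))  # after transposing: one row per lease, one column per year
t(m)
```

Here `years %in% seq(a, b)` plays the same role as `!is.na(match(years, seq(a, b)))` in the answer above.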