2017-04-26

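R - Replace values row by row, starting from a selected column

I want to replace the monthly values from a given month onward with zero. I tried adapting "Replace NA values in dataframe starting in varying columns", but without success. Given the data: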

df <- structure(list(Mth1 = c(1L, 3L, 4L, 1L, 2L), 
         Mth2 = c(2L, 3L, 2L, 2L, 2L), 
         Mth3 = c(1L, 2L, 1L, 2L, 3L), 
         Mth4 = c(3L, 1L, 3L, 4L, 2L), 
         ZeroMth = c(1L, 3L, 2L, 4L, 3L)), 
       .Names = c("Mth1", "Mth2", "Mth3", "Mth4", "ZeroMth"), class = "data.frame", 
       row.names = c("1", "2", "3", "4", "5")) 


> df
  Mth1 Mth2 Mth3 Mth4 ZeroMth
1    1    2    1    3       1
2    3    3    2    1       3
3    4    2    1    3       2
4    1    2    2    4       4
5    2    2    3    2       3

I want to use the values in the ZeroMth column to specify the month at which the replacement starts. The desired output is:

> df1
  Mth1 Mth2 Mth3 Mth4
1    0    0    0    0
2    3    3    0    0
3    4    0    0    0
4    1    2    2    0
5    2    2    0    0

Answers

Use apply over each row (MARGIN = 1) together with replace to set the values to zero from the index given in the last column onward:

t(apply(X = df, MARGIN = 1, function(x) 
    replace(x = x, list = x[NCOL(df)]:(NCOL(df)-1), values = 0))) 
#  Mth1 Mth2 Mth3 Mth4 ZeroMth
#1    0    0    0    0       1
#2    3    3    0    0       3
#3    4    0    0    0       2
#4    1    2    2    0       4
#5    2    2    0    0       3
+1. Thanks @d.b for making this so compact. I spent hours trying to reach the solution you provided in seconds!
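One caveat with the `:` range in the answer above: if ZeroMth were larger than the number of month columns (say 5 with four months), `5:4` counts downward and would overwrite Mth4 and ZeroMth itself. The sample data never triggers this. As a minimal sketch, assuming such rows should be left unchanged, the replacement indices can be built with a comparison instead. The helper name `zero_from` and the extra row in `df_edge` are hypothetical and not part of the original answer.

```r
df <- data.frame(
  Mth1 = c(1L, 3L, 4L, 1L, 2L), Mth2 = c(2L, 3L, 2L, 2L, 2L),
  Mth3 = c(1L, 2L, 1L, 2L, 3L), Mth4 = c(3L, 1L, 3L, 4L, 2L),
  ZeroMth = c(1L, 3L, 2L, 4L, 3L)
)

# Hypothetical helper: x is one row, with the start index in its last position.
zero_from <- function(x) {
  n <- length(x) - 1L                 # number of month columns
  start <- x[[length(x)]]             # value of ZeroMth for this row
  idx <- which(seq_len(n) >= start)   # empty when start > n, so nothing is replaced
  replace(x, idx, 0L)
}

res <- t(apply(df, 1, zero_from))

# Hypothetical extra row whose ZeroMth lies beyond the last month column.
df_edge <- rbind(df, data.frame(Mth1 = 7L, Mth2 = 8L, Mth3 = 9L,
                                Mth4 = 6L, ZeroMth = 5L))
res_edge <- t(apply(df_edge, 1, zero_from))
```

On the original five rows this gives the same values as the answer; the sixth row of `res_edge` is left untouched.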

You can also use lapply like this:

setNames(data.frame(lapply(head(seq_along(df), -1), function(i) df[, i] * (i < df$ZeroMth))), 
     head(names(df), -1)) 

which returns

  Mth1 Mth2 Mth3 Mth4
1    0    0    0    0
2    3    3    0    0
3    4    0    0    0
4    1    2    2    0
5    2    2    0    0

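Here, you iterate over the positions of the month columns and check, row by row, whether the column position is less than the specified zero month. If it is, the original value is returned; otherwise 0. setNames is used to restore the variable names.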
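To make the multiplication trick concrete, here is a small sketch of what happens for a single column position (i = 2, the Mth2 column). The variable names `mask` and `mth2_out` are only illustrative.

```r
df <- data.frame(
  Mth1 = c(1L, 3L, 4L, 1L, 2L), Mth2 = c(2L, 3L, 2L, 2L, 2L),
  Mth3 = c(1L, 2L, 1L, 2L, 3L), Mth4 = c(3L, 1L, 3L, 4L, 2L),
  ZeroMth = c(1L, 3L, 2L, 4L, 3L)
)

i <- 2L
mask <- i < df$ZeroMth        # TRUE where column 2 comes before the zero month
mask
# FALSE  TRUE FALSE  TRUE  TRUE

mth2_out <- df[, i] * mask    # TRUE acts as 1 (keep), FALSE as 0 (zero out)
mth2_out
# 0 3 0 2 2
```

The lapply call simply repeats this for every month position and collects the resulting columns.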


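Some benchmarks

After testing, changing lapply to sapply results in a speed-up of more than 2X. The major slowdown is due to the conversion to a data.frame.

This made me check a bit further. Here are the microbenchmark results.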

library(microbenchmark)

microbenchmark(
  # apply-based answer, returned as a matrix and as a data.frame
  db.mat = t(apply(X = df, MARGIN = 1, function(x)
    replace(x = x, list = x[NCOL(df)]:(NCOL(df)-1), values = 0))),
  db.df = data.frame(t(apply(X = df, MARGIN = 1, function(x)
    replace(x = x, list = x[NCOL(df)]:(NCOL(df)-1), values = 0)))),
  # multiplication by a logical mask: list, data.frame via lapply, data.frame via sapply
  lmo.list = setNames(lapply(head(seq_along(df), -1),
                             function(i) df[, i] * (i < df$ZeroMth)),
                      head(names(df), -1)),
  lmo.dfl = setNames(data.frame(lapply(head(seq_along(df), -1),
                                       function(i) df[, i] * (i < df$ZeroMth))),
                     head(names(df), -1)),
  lmo.dfs = setNames(data.frame(sapply(head(seq_along(df), -1),
                                       function(i) df[, i] * (i < df$ZeroMth))),
                     head(names(df), -1)),
  # Alt variants: assign zeros by logical index (from the zero month onward)
  lmo.listAlt = setNames(lapply(head(seq_along(df), -1),
                                function(i) {temp <- df[, i]; temp[i >= df$ZeroMth] <- 0; temp}),
                         head(names(df), -1)),
  lmo.dflAlt = setNames(data.frame(lapply(head(seq_along(df), -1),
                                          function(i) {temp <- df[, i]; temp[i >= df$ZeroMth] <- 0; temp})),
                        head(names(df), -1)),
  lmo.dfsAlt = setNames(data.frame(sapply(head(seq_along(df), -1),
                                          function(i) {temp <- df[, i]; temp[i >= df$ZeroMth] <- 0; temp})),
                        head(names(df), -1)))

Unit: microseconds
        expr     min       lq     mean   median      uq      max neval cld
      db.mat 135.994 155.2380 161.2480 159.6570 166.785  196.436   100 b
       db.df 225.231 236.9190 248.3295 246.0430 256.164  340.411   100 c
    lmo.list  84.960  99.5005 105.8299 104.9175 110.905  156.806   100 a
     lmo.dfl 439.057 459.1565 480.3425 476.5475 492.656  647.751   100 d
     lmo.dfs 173.057 187.3120 217.2876 195.8650 202.850 2257.151   100 c
 lmo.listAlt  91.803 108.0535 114.6253 113.1860 118.602  185.602   100 ab
  lmo.dflAlt 458.158 481.2520 521.6052 498.2155 516.462 2584.163   100 d
  lmo.dfsAlt 181.610 198.4310 221.5613 204.2755 212.686 1611.395   100 c

Wow, lapply followed by data.frame is super slow.
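Before reading much into timings like these, it can be worth confirming that the candidates actually return the same values. The following is a rough sketch of such a check on the sample data; the helper `same` and the `via_*` names are hypothetical, and the comparison ignores dimnames and the integer/double distinction.

```r
df <- data.frame(
  Mth1 = c(1L, 3L, 4L, 1L, 2L), Mth2 = c(2L, 3L, 2L, 2L, 2L),
  Mth3 = c(1L, 2L, 1L, 2L, 3L), Mth4 = c(3L, 1L, 3L, 4L, 2L),
  ZeroMth = c(1L, 3L, 2L, 4L, 3L)
)
n <- ncol(df) - 1L   # number of month columns

# apply-based answer, with the ZeroMth column dropped afterwards
via_apply <- t(apply(df, 1, function(x) replace(x, x[ncol(df)]:n, 0)))[, seq_len(n)]

# lapply and sapply variants of the mask multiplication
via_lapply <- data.frame(lapply(seq_len(n), function(i) df[, i] * (i < df$ZeroMth)))
via_sapply <- sapply(seq_len(n), function(i) df[, i] * (i < df$ZeroMth))

# Hypothetical helper: compare values only
same <- function(a, b) {
  isTRUE(all.equal(unname(as.matrix(a)), unname(as.matrix(b))))
}

agree <- same(via_apply, via_lapply) && same(via_lapply, via_sapply)
agree
```

If `agree` is FALSE, the benchmark is comparing expressions that do different work, and the timings are not directly comparable.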

We can also do this with row and col:

(col(df[-5]) < df$ZeroMth[row(df[-5])]) * df[-5]
#  Mth1 Mth2 Mth3 Mth4
#1    0    0    0    0
#2    3    3    0    0
#3    4    0    0    0
#4    1    2    2    0
#5    2    2    0    0
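The one-liner above is terse, so here is a sketch that splits it into its intermediate pieces. The names `months`, `col_idx`, `start`, `keep` and `res` are only illustrative.

```r
df <- data.frame(
  Mth1 = c(1L, 3L, 4L, 1L, 2L), Mth2 = c(2L, 3L, 2L, 2L, 2L),
  Mth3 = c(1L, 2L, 1L, 2L, 3L), Mth4 = c(3L, 1L, 3L, 4L, 2L),
  ZeroMth = c(1L, 3L, 2L, 4L, 3L)
)

months  <- df[-5]                    # drop the ZeroMth column
col_idx <- col(months)               # 5 x 4 matrix; every cell holds its column number
start   <- df$ZeroMth[row(months)]   # each row's ZeroMth, repeated once per cell
keep    <- col_idx < start           # TRUE for cells before the zero month

keep[2, ]                            # row 2 has ZeroMth = 3
# TRUE  TRUE FALSE FALSE

res <- keep * months                 # FALSE cells become 0, TRUE cells keep their value
```

Because `keep` has the same shape as `months`, the whole table is masked in a single vectorised step, with no explicit loop over rows or columns.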