2015-11-08 84 views

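Create multiple columns from different functions of different variables

I am having trouble transitioning to data.table. I am trying to group by some categorical variables and apply a list of functions, each targeting different variables, to create new columns. It seems like this should be easy with `mapply` or `Map`, but I can't figure out how to assemble the right subsets to pass to the functions.

Here is what it looks like: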

library(data.table)

set.seed(2015)
dat <- data.table(cat1 = factor('Total'),
                  cat2 = factor(rep(letters[1:4], 5)),
                  cat3 = factor(rep(1:4, each=5)),
                  var1 = sample(20),
                  var2 = sample(20),
                  var3 = sample(20))

## I have list of factor columns to group by 
groups <- c(paste0("cat", 1:3)) 
setkeyv(dat, groups) 

## List of functions, and corresponding list of column names that 
## they are to be applied to. So, in this example I should get 
## two new columns: V1=sum(var1) and V2=mean(var2, var3) 
thing <- function(...) mean(c(...), na.rm=TRUE)  # arbitrary function
funs <- list("sum", "thing")                     # named functions
targets <- list("var1", c("var2", "var3"))       # variables
outnames <- funs                                 # names of result columns

## Can't get this part 
f <- function(fn, vars) do.call(fn, vars) 
dat[, outnames := Map(f, funs, targets), by=groups] 

The result for this example should look like this:

dat[, `:=`(sum=sum(var1), thing=thing(var2, var3)), by=groups] 

Answer

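We need to subset the dataset columns based on the column names in the 'targets' `list`. One way is to loop through the `list` elements of 'targets', subset the data.table for each one (`.SD[, x, with=FALSE]`), and then apply the function.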

dat[, unlist(outnames) := Map(f, funs, lapply(targets, function(x)
    .SD[, x, with=FALSE])), by = groups]
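
For anyone who wants to check this end to end, here is a minimal self-contained sketch that rebuilds the question's data, applies the approach above, and compares it with the hand-written `:=` call from the question. The names `expected` and `result` are mine, and I store `outnames` as a plain character vector (`unlist(funs)`) rather than a list, so treat it as an illustration rather than the only way to write it.

```r
library(data.table)

set.seed(2015)
dat <- data.table(cat1 = factor("Total"),
                  cat2 = factor(rep(letters[1:4], 5)),
                  cat3 = factor(rep(1:4, each = 5)),
                  var1 = sample(20),
                  var2 = sample(20),
                  var3 = sample(20))
groups <- paste0("cat", 1:3)

thing <- function(...) mean(c(...), na.rm = TRUE)  # arbitrary function
funs <- list("sum", "thing")                       # functions, by name
targets <- list("var1", c("var2", "var3"))         # columns each function gets
outnames <- unlist(funs)                           # assumption: character vector, not a list

# do.call() passes each column of the subset as a separate argument to fn
f <- function(fn, vars) do.call(fn, vars)

# Reference: the hand-written version, run on a copy so dat stays untouched
expected <- copy(dat)[, `:=`(sum = sum(var1), thing = thing(var2, var3)),
                      by = groups]

# Generic version: subset .SD once per entry in targets, then Map over pairs
result <- copy(dat)[, (outnames) := Map(f, funs,
                                        lapply(targets, function(x)
                                          .SD[, x, with = FALSE])),
                    by = groups]

# Stops with an error if the two results differ
stopifnot(isTRUE(all.equal(expected, result)))
```

The key point is that each element handed to `f` is a data.table (so a list of columns), which is what `do.call` needs in order to spread the columns out as arguments.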
+1 Nice, that looks good. I may have a follow-up question, since I also need to subset by index at the same time. Thanks! – jenesaisquoi