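User-defined function based on multiple columns, grouped by category

This may be basic, but I have been trying to figure it out for several days without finding an answer.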

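I am trying to compute a new quantity from two columns, "concentration" and "area", grouped by "catchment". I have written a function that, for each row, takes the difference between that row's concentration and the concentration of the row with the largest area, then scales it by the ratio of the row's area to the largest area in that catchment. It does not work with dplyr or aggregate(), and with by() it returns a list.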

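Ideally, I would like to add the result as a column to the data frame, or replace the concentration column entirely. Here is the data frame "lev":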

area catchment concentration 
1 1  Yup  2.00000 
2 10  Yup  40.50000 
3 25  Yup  50.82031 
4 35  Yup  50.00000 
5 1  Nope  1.00000 
6 10  Nope  5.00000 
7 25  Nope  40.08333 
8 35  Nope  38.00000

Here is the function:

lever <- function(data=lev, x=data[,"concentration"], y=data[,"area"]){ 
N= which.max(y) 
L = (x - x[N]) * y/max(y) 
return(L)}

Here is the desired result:

area catchment concentration leverage 
1 1  Yup  2.00000 -1.3714286 
2 10  Yup  40.50000 -2.7142857 
3 25  Yup  50.82031 0.5859375 
4 35  Yup  50.00000 0.0000000 
5 1  Nope  1.00000 -1.0571429 
6 10  Nope  5.00000 -9.4285714 
7 25  Nope  40.08333 1.4880952 
8 35  Nope  38.00000 0.0000000

Using by, I can get the result as two lists, one for each catchment:

by(lev, lev$catchment, lever)

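but I would like to apply the function to multiple columns classified by several factors (e.g. date in addition to catchment), and with doBy and dplyr I get the error 'incorrect number of dimensions'.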

Source

2017-02-22 benjabiker

We can give a better answer if you provide a reproducible example (http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5965451#5965451). –

Thank you for editing it to make it reproducible. I will do better next time :) – benjabiker

Load the data:

lev <- read.table(text = "area catchment concentration 
    1  Yup  2.00000 
    10  Yup  40.50000 
    25  Yup  50.82031 
    35  Yup  50.00000 
    1  Nope  1.00000 
    10  Nope  5.00000 
    25  Nope  40.08333 
    35  Nope  38.00000", 
    header=TRUE)

Group by catchment

library(dplyr) 
lev %>% 
    group_by(catchment) %>% 
    mutate(N = which.max(area), 
      L = (concentration - concentration[N]) * area/max(area)) 

# 
# area catchment concentration  N   L 
# <int> <fctr>   <dbl> <int>  <dbl> 
# 1  1  Yup  2.00000  4 -1.3714286 
# 2 10  Yup  40.50000  4 -2.7142857 
# 3 25  Yup  50.82031  4 0.5859357 
# 4 35  Yup  50.00000  4 0.0000000 
# 5  1  Nope  1.00000  4 -1.0571429 
# 6 10  Nope  5.00000  4 -9.4285714 
# 7 25  Nope  40.08333  4 1.4880929 
# 8 35  Nope  38.00000  4 0.0000000
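
To see where a single value in this output comes from, one row can be checked by hand in base R. This is only a sanity check of the formula, with the numbers for the "Yup" catchment copied from the data above; the variable names `ref`, `L_row2` and `L_all` are made up for the illustration.

```r
# Values for the "Yup" catchment, copied from the data above
area          <- c(1, 10, 25, 35)
concentration <- c(2, 40.5, 50.82031, 50)

N   <- which.max(area)      # row with the largest area (row 4, area = 35)
ref <- concentration[N]     # its concentration, 50

# Row 2: (40.5 - 50) * 10 / 35
L_row2 <- (concentration[2] - ref) * area[2] / max(area)
L_row2
# [1] -2.714286

# All four rows at once; the reference row itself always comes out as 0
L_all <- (concentration - ref) * area / max(area)
```

Inside `group_by(catchment) %>% mutate(...)` the same expression is evaluated once per catchment, so `which.max(area)` and `max(area)` refer to that catchment's rows only.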

Using your function

I modified your function so that it returns a data frame.

lever2 <- function(data, 
        x = data[["concentration"]], 
        y = data[["area"]]){ 
    # [[ ]] extracts the column as a plain vector, 
    # for both data frames and tibbles 
    N <- which.max(y) 
    L <- (x - x[N]) * y/max(y) 
    # Put L back into the data frame 
    # so that we keep the concentration and area in the result 
    data$L <- L 
    return(data) 
    }
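
Why the extraction step matters: a grouped pipeline hands the function a tibble, and a tibble keeps `[ , "col"]` as a one-column table instead of dropping it to a vector. The sketch below imitates that behaviour in base R with `drop = FALSE` (an assumption made here to avoid loading extra packages; it is not a real tibble) and shows that `which.max()` then fails, while `[[ ]]` gives a vector in either case.

```r
lev_yup <- data.frame(area = c(1, 10, 25, 35),
                      concentration = c(2, 40.5, 50.82031, 50))

# Plain data frame: selecting a single column drops to a numeric vector
y_vec <- lev_yup[, "area"]

# drop = FALSE keeps a one-column data frame, imitating what a tibble
# returns by default for the same call
y_tbl <- lev_yup[, "area", drop = FALSE]

is.numeric(y_vec)       # TRUE
is.data.frame(y_tbl)    # TRUE

# which.max() needs a numeric vector, so the one-column table raises an error
res <- tryCatch(which.max(y_tbl), error = function(e) e)
inherits(res, "error")  # TRUE

# [[ ]] returns the plain vector for data frames and tibbles alike
y_safe <- lev_yup[["area"]]
which.max(y_safe)       # 4
```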

The function can then be used with dplyr::group_by %>% do:

lev %>% 
    group_by(catchment) %>% 
    do(lever2(.))
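
In more recent dplyr releases `do()` is marked as superseded. A minimal sketch of the same per-group call with `group_modify()`, assuming dplyr 0.8.1 or later, might look like this; `add_leverage` is a hypothetical helper written for the illustration, not part of the answer above.

```r
library(dplyr)

lev <- data.frame(
  area = rep(c(1, 10, 25, 35), times = 2),
  catchment = rep(c("Yup", "Nope"), each = 4),
  concentration = c(2, 40.5, 50.82031, 50, 1, 5, 40.08333, 38)
)

# Hypothetical helper: takes one group's rows, returns them with L added
add_leverage <- function(data) {
  area <- data[["area"]]
  conc <- data[["concentration"]]
  N <- which.max(area)
  data$L <- (conc - conc[N]) * area / max(area)
  data
}

res <- lev %>%
  group_by(catchment) %>%
  group_modify(~ add_leverage(.x)) %>%  # .x holds the rows of one group
  ungroup()
```

`group_modify()` passes each group without its grouping column and adds that column back to the result, so `res` has the columns catchment, area, concentration and L.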

Source

2017-02-22 15:36:50

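Yes, I was writing this up at the same time, but I was slower than you. I thought about adding an example that uses the OP's function 'lever' with the 'group_by %>% do' mechanism, but somehow that returns '(list) object cannot be coerced to type 'double''. I still need to figure out how to make that one work. –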

Works perfectly! If I have multiple columns (e.g. concentration1, concentration2), how can I add an L for each of them to the data frame? – benjabiker

Edit the 'mutate' instruction to 'L = (concentration2 - concentration2[N]) * area/max(area)'. However, if your data are in a wide format, you could consider reshaping the data frame to long format with tidyr::gather (ftp://cran.r-project.org/pub/R/web/packages/tidyr/vignettes/tidy-data.html) before running 'mutate'. –
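
A sketch of the reshaping route, under stated assumptions: the columns `concentration1` and `concentration2` follow the names in the comment above, the values of `concentration2` are made up (twice `concentration1`), and `lev_wide`, `lev_long` and `variable` are hypothetical names.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data; concentration2 is invented for the illustration
lev_wide <- data.frame(
  area = rep(c(1, 10, 25, 35), times = 2),
  catchment = rep(c("Yup", "Nope"), each = 4),
  concentration1 = c(2, 40.5, 50.82031, 50, 1, 5, 40.08333, 38)
)
lev_wide$concentration2 <- 2 * lev_wide$concentration1

lev_long <- lev_wide %>%
  # Stack the two concentration columns into one, labelled by "variable"
  gather(key = "variable", value = "concentration",
         concentration1, concentration2) %>%
  # One group per catchment and per original column
  group_by(catchment, variable) %>%
  mutate(L = (concentration - concentration[which.max(area)]) *
           area / max(area)) %>%
  ungroup()
```

In the long format a single `mutate` covers every concentration column, at the cost of having one row per catchment, area and variable.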

We can use tidyverse

library(tidyverse) 
# df1 is the data frame from the question (called lev there) 
df1 %>% 
    group_by(catchment) %>% 
    mutate(leverage = (concentration - concentration[which.max(area)]) * area/max(area))

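Based on the description, if there are several grouping variables, put all of them in group_by. The calculation can also be applied to multiple columns with mutate_each.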
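
`mutate_each` has since been deprecated in dplyr. A minimal sketch of the same idea with `across()`, assuming dplyr 1.0 or later, is given below. The `date` column, the `concentration2` values (twice `concentration1`) and the helper name `leverage_of` are all made up for the illustration.

```r
library(dplyr)

# Hypothetical data: two dates, two catchments, two concentration columns
lev2 <- data.frame(
  date = rep(c("2017-01-01", "2017-02-01"), each = 8),
  catchment = rep(rep(c("Yup", "Nope"), each = 4), times = 2),
  area = rep(c(1, 10, 25, 35), times = 4),
  concentration1 = rep(c(2, 40.5, 50.82031, 50, 1, 5, 40.08333, 38),
                       times = 2)
)
lev2$concentration2 <- 2 * lev2$concentration1

# Hypothetical helper: leverage of one column given the group's areas
leverage_of <- function(x, area) {
  (x - x[which.max(area)]) * area / max(area)
}

out <- lev2 %>%
  group_by(date, catchment) %>%          # several grouping variables
  mutate(across(c(concentration1, concentration2),
                ~ leverage_of(.x, area), # area is the current group's column
                .names = "{.col}_leverage")) %>%
  ungroup()
```

The original concentration columns are kept, and each gets a companion column with the suffix `_leverage`.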

Source

2017-02-22 15:32:03 akrun

You can also compute this with data.table:

library(data.table) 
# convert to data.table 
setDT(df) 

df[, leverage := (concentration - concentration[which.max(area)]) * (area/max(area)), 
    by=catchment] 
df 
    area catchment concentration leverage 
1: 1  Yup  2.00000 -1.3714286 
2: 10  Yup  40.50000 -2.7142857 
3: 25  Yup  50.82031 0.5859357 
4: 35  Yup  50.00000 0.0000000 
5: 1  Nope  1.00000 -1.0571429 
6: 10  Nope  5.00000 -9.4285714 
7: 25  Nope  40.08333 1.4880929 
8: 35  Nope  38.00000 0.0000000

Data

df <- 
structure(list(area = c(1L, 10L, 25L, 35L, 1L, 10L, 25L, 35L), 
    catchment = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("Nope", 
    "Yup"), class = "factor"), concentration = c(2, 40.5, 50.82031, 
    50, 1, 5, 40.08333, 38)), .Names = c("area", "catchment", 
"concentration"), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8"))
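
The data.table approach can plausibly be extended to several concentration columns with `.SDcols`. The following is a sketch under stated assumptions: `concentration1` and `concentration2` are hypothetical column names, the values of `concentration2` are made up (twice `concentration1`), and `dt`, `cols` and `new_cols` are names chosen for the illustration.

```r
library(data.table)

# Hypothetical data with two concentration columns
dt <- data.table(
  area = rep(c(1, 10, 25, 35), times = 2),
  catchment = rep(c("Yup", "Nope"), each = 4),
  concentration1 = c(2, 40.5, 50.82031, 50, 1, 5, 40.08333, 38)
)
dt[, concentration2 := 2 * concentration1]

cols     <- c("concentration1", "concentration2")
new_cols <- paste0(cols, "_leverage")

# .SD holds only the columns named in .SDcols; area is still visible
# in j and refers to the current group's rows
dt[, (new_cols) := lapply(.SD, function(x) {
       (x - x[which.max(area)]) * area / max(area)
     }),
   by = catchment, .SDcols = cols]
```

For more than one grouping factor, `by` accepts several column names, for example `by = .(catchment, date)` if a `date` column existed.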

Source

2017-02-22 15:39:54 lmo
