2016-08-04 106 views
3

Create a new column by subtracting values based on a key in R?

I have a data table like this:

ID  DAYS FREQUENCY 
"ads" 20  3 
"jwa" 45  2 
"mno" 4  1 
"ads" 13  3 
"jwa" 60  2 
"ads" 18  3 

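I want to add a column that subtracts values in the DAYS column within each ID, pairing the days that are closest together: each row gets its DAYS minus the next-smaller DAYS for the same ID, or NA if there is no smaller value. I would like my new table to look like this: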

ID  DAYS FREQUENCY DAYS DIFF 
"ads" 20  3   2 (because 20-18) 
"jwa" 45  2   NA (because no value smaller than 45 for that id) 
"mno" 4  1   NA 
"ads" 13  3   NA 
"jwa" 60  2   15 
"ads" 18  3   5 

Bonus: is there a way to do this using the merge function?
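
For illustration only, here is one way the merge idea could be sketched in base R: self-join the table on ID, keep the pairs where the other DAYS value is strictly smaller, and take the largest of those per row. The names df, others, OTHER, row_id and DAYS_DIFF are made up for this sketch, and it assumes DAYS values are not duplicated within an ID (ties are dropped by the strict comparison).

```r
df <- data.frame(
  ID = c("ads", "jwa", "mno", "ads", "jwa", "ads"),
  DAYS = c(20, 45, 4, 13, 60, 18),
  FREQUENCY = c(3, 2, 1, 3, 2, 3)
)
df$row_id <- seq_len(nrow(df))  # remember the original row order

# Self-join on ID: each row is paired with every DAYS value of the same ID
others <- data.frame(ID = df$ID, OTHER = df$DAYS)
pairs <- merge(df, others, by = "ID")

# Keep only pairs where the other value is strictly smaller
pairs <- pairs[pairs$OTHER < pairs$DAYS, ]

# The closest smaller value for each original row is the largest one left
prev <- aggregate(OTHER ~ row_id, data = pairs, FUN = max)

# Left join back, so rows with no smaller value get NA
out <- merge(df, prev, by = "row_id", all.x = TRUE)
out <- out[order(out$row_id), ]
out$DAYS_DIFF <- out$DAYS - out$OTHER
out[, c("ID", "DAYS", "FREQUENCY", "DAYS_DIFF")]
```

The self-join builds every within-ID pair, so this grows quadratically with group size; a sort-and-lag approach is likely the better fit for large tables.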

+0

Why do you want/expect to use merge here? Fwiw, if you're willing to install a package, you can use 'library(data.table); setDT(DF)[order(DAYS), dd := DAYS - shift(DAYS), by = ID]' – Frank
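
Spelled out as a small script, the one-liner in this comment might look like the sketch below. DF is assumed to hold the question's example data (typed in by hand here), and dd is the new column name used in the comment.

```r
library(data.table)

DF <- data.frame(
  ID = c("ads", "jwa", "mno", "ads", "jwa", "ads"),
  DAYS = c(20, 45, 4, 13, 60, 18),
  FREQUENCY = c(3, 2, 1, 3, 2, 3)
)

setDT(DF)  # convert to a data.table by reference

# Visit rows in increasing DAYS order; within each ID, subtract the
# previous DAYS value. shift() lags by one position by default.
DF[order(DAYS), dd := DAYS - shift(DAYS), by = ID]
DF
```

Because := assigns by reference, the rows of DF stay in their original order and only the new dd column is added.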

Answers

1

Here is an answer using dplyr:

library(dplyr) # mydata is assumed to hold the table from the question 
mydata %>% 
    mutate(row.order = row_number()) %>% # row numbers added to preserve original row order 
    group_by(ID) %>% 
    arrange(DAYS) %>% 
    mutate(lag = lag(DAYS)) %>% 
    mutate(days.diff = DAYS - lag) %>% 
    ungroup() %>% 
    arrange(row.order) %>% 
    select(ID, DAYS, FREQUENCY, days.diff) 

Output:

      ID  DAYS FREQUENCY days.diff
  <fctr> <int>     <int>     <int>
1    ads    20         3         2
2    jwa    45         2        NA
3    mno     4         1        NA
4    ads    13         3        NA
5    jwa    60         2        15
6    ads    18         3         5
+0

You don't need two mutate calls in a row. 'mutate(x = g(z), y = f(x))' works. – Frank

+1

Thanks @Frank, learned something new! –
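
As a sketch of Frank's suggestion applied to the answer above, the two mutate calls can be folded into one, since later arguments to mutate can refer to columns created earlier in the same call. mydata is assumed to hold the question's example data (typed in by hand here), and prev and result are names chosen for this sketch.

```r
library(dplyr)

mydata <- data.frame(
  ID = c("ads", "jwa", "mno", "ads", "jwa", "ads"),
  DAYS = c(20, 45, 4, 13, 60, 18),
  FREQUENCY = c(3, 2, 1, 3, 2, 3)
)

result <- mydata %>%
  mutate(row.order = row_number()) %>%  # remember the original row order
  arrange(DAYS) %>%                     # sort so lag() sees the next-smaller value
  group_by(ID) %>%
  mutate(prev = lag(DAYS),              # previous DAYS within the same ID
         days.diff = DAYS - prev) %>%   # uses prev from the same call
  ungroup() %>%
  arrange(row.order) %>%
  select(ID, DAYS, FREQUENCY, days.diff)

result
```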

0

You can do this using dplyr and a quick loop:

library(dplyr) 

# Rowwise data.frame creation because I'm too lazy not to copy-paste the example data 
# (tribble() replaces the deprecated frame_data()) 
df <- tibble::tribble(
    ~ID, ~DAYS, ~FREQUENCY, 
    "ads", 20,  3, 
    "jwa", 45,  2, 
    "mno", 4,  1, 
    "ads", 13,  3, 
    "jwa", 60,  2, 
    "ads", 18,  3 
) 

# Subtract from each element of a numeric vector the element that follows it 
rolling_subtraction <- function(x) { 
    out <- vector('numeric', length(x)) 
    for (i in seq_along(out)) { 
        out[[i]] <- x[i] - x[i + 1] # x[i + 1] is NA if the index is out of bounds 
    } 

    out 
} 

# Arrange data.frame in order of ID/Days and apply rolling subtraction 
df %>% 
    arrange(ID, desc(DAYS)) %>% 
    group_by(ID) %>% 
    mutate(days_diff = rolling_subtraction(DAYS))
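
The loop in rolling_subtraction can also be written without explicit iteration. The sketch below is a base R alternative under the same assumption (x is already sorted in descending order within its group); rolling_subtraction_vec is a name invented here.

```r
# Vectorized equivalent of the loop: subtract from each element the one
# after it. Dropping the first element and appending NA shifts x left by one.
rolling_subtraction_vec <- function(x) {
  x - c(x[-1], NA)
}

rolling_subtraction_vec(c(20, 18, 13))
# 2 5 NA
```

Inside the mutate call, DAYS - lead(DAYS) from dplyr should give the same result without a helper function.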