2017-09-01
0

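Create a new column that is a random subset of the other columns

I want to create a new column in which each value is a random subset of the other values in that row of my data.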

# Example data:
library(tidyverse)  # provides %>%, mutate() and str_c()

df <- data.frame(matrix(nrow = 57, ncol = 6)) %>%
  mutate(
    X1 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
    X2 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
    X3 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
    X4 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
    X5 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
    X6 = round(rnorm(n = 57, mean = 0, sd = 1), 1)
  )

# my failed attempt at a new column 
df %>% 
    rowwise() %>% 
    mutate(X7 = str_c(df[, sample(1:6, 3, replace = F)]), sep = ", ") 

Forget `rowwise` and use `sample(1:6, 1, replace = F)`. Just one column, not 3. By the way, why `str_c`? Don't you want to fill `X7` with numbers? Like this you will get characters.

@RuiBarradas I want each value of X7 to be a vector of 3 random values from its own row. – Joe
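
If an actual vector per row is wanted rather than a comma-separated string, a list-column is one way to hold it. Below is a minimal base R sketch under that assumption. The 5-row stand-in data and the seed are made up for illustration, and storing `X7` as a list is my reading of the comment above, not something the asker confirmed.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Small stand-in for the example data (5 rows instead of 57)
df <- as.data.frame(matrix(round(rnorm(5 * 6), 1), nrow = 5, ncol = 6))
names(df) <- paste0("X", 1:6)

# For each row, draw 3 of the 6 column positions without replacement
# and keep the corresponding values as a numeric vector
df$X7 <- lapply(seq_len(nrow(df)), function(i) {
  unlist(df[i, 1:6], use.names = FALSE)[sample(6, 3)]
})

str(df$X7[[1]])  # structure of the first row's sampled vector
```

Each element of `df$X7` stays numeric, so it can be used in later calculations without parsing a string.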

Answers

2

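A solution using the tidyverse. The key is to split the data by row and apply a function that samples values from each one-row subset. `map_df` does this and combines all the outputs into a single data frame. `df2` is the final output.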

# Load package 
library(tidyverse) 

# Set seed 
set.seed(123) 

# Create example data frame 
df <- data.frame(matrix(nrow = 57, ncol = 6)) %>%
  mutate(
    X1 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
    X2 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
    X3 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
    X4 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
    X5 = round(rnorm(n = 57, mean = 0, sd = 1), 1),
    X6 = round(rnorm(n = 57, mean = 0, sd = 1), 1)
  )

# Process the data: split into one-row data frames, pick 3 of the
# 6 columns at random in each, and paste the picked values into X7
df2 <- df %>%
  rowid_to_column() %>%
  split(f = .$rowid) %>%
  map_df(function(dt) {
    dt %>%
      select(-rowid) %>%
      select(sample(1:6, 3, replace = FALSE)) %>%
      unite(X7, everything(), sep = ", ")
  }) %>%
  bind_cols(df) %>%
  select(paste0("X", 1:7))

df2
    X1   X2   X3   X4   X5   X6              X7
1 -0.6  0.6  0.5  0.1  0.9  0.1   0.1, 0.5, 0.9
2 -0.2  0.1  0.3  0.0 -1.0  0.2   0.1, 0.3, 0.2
3  1.6  0.2  0.1  2.1  2.0  1.6   1.6, 2.1, 0.1
4  0.1  0.4 -0.6 -0.7 -0.1 -0.2  0.1, 0.4, -0.6
5  0.1 -0.5 -0.8 -1.1  0.2  0.2  0.1, 0.2, -0.5
6  1.7 -0.3 -1.0  0.0 -0.7  1.2  -1, -0.7, -0.3
7  0.5 -1.0  0.1  0.3 -0.6  1.1    0.5, -0.6, -1
...
1

I think the best way is to use the base R functions `replicate`, `sample` and `sapply`.

inx <- t(replicate(nrow(df), sample(1:6, 3, replace = F))) 
df$X7 <- sapply(seq_len(nrow(df)), function(i) 
      paste(df[i, inx[i, ]], collapse = ", ")) 
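
For larger data, the per-row data frame subsetting can be swapped for matrix indexing. The following is only a sketch of that variation, not part of the original answer. It assumes all six source columns are numeric, and the names `m`, `n` and `picked`, the seed and the 5-row stand-in data are introduced here for illustration.

```r
set.seed(2)  # arbitrary seed, only for reproducibility

# Small stand-in for the example data (5 rows instead of 57)
df <- as.data.frame(matrix(round(rnorm(5 * 6), 1), nrow = 5))
names(df) <- paste0("X", 1:6)

m <- as.matrix(df[, 1:6])  # numeric matrix of the six source columns
n <- nrow(m)

# One row of 3 distinct column positions per data row (n x 3 matrix)
inx <- t(replicate(n, sample(ncol(m), 3)))

# Look up m[i, inx[i, j]] for every i and j via (row, column) index pairs
picked <- matrix(m[cbind(rep(seq_len(n), times = 3), as.vector(inx))], nrow = n)

# Collapse each row of picked values into one comma-separated string
df$X7 <- apply(picked, 1, paste, collapse = ", ")
```

The `cbind(row, column)` index pulls all sampled cells in one vectorised lookup, so the only remaining per-row work is the final `paste`.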

@ycw Done. Error corrected.

1

Here is a dplyr solution:

library(dplyr) 

df %>%
  group_by(idx = seq(n())) %>%
  do({
    res <- select(., -idx)
    bind_cols(res, X7 = toString(sample(unlist(res), 3, replace = FALSE)))
  }) %>%
  ungroup() %>%
  select(-idx)

The result is:

# A tibble: 57 x 7
      X1    X2    X3    X4    X5    X6               X7
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>            <chr>
 1   0.4   0.4  -0.1   3.4   0.9  -0.4    0.4, 0.9, 0.4
 2   1.5   0.9  -0.7   1.5  -1.1  -0.3  -0.7, 1.5, -1.1
 3  -0.1  -0.5  -0.6  -0.8  -0.3   2.3  -0.3, 2.3, -0.8
 4   0.7  -1.0   0.3   0.2  -0.5  -0.3    -1, 0.3, -0.3
 5   0.6   0.9   0.4   1.9  -0.7  -2.0     0.4, -2, 0.9
 6   0.3   0.7   1.3   0.6   1.3  -0.2   0.7, -0.2, 1.3
 7   0.5   0.3   1.1  -0.2  -0.4  -0.8    0.5, 1.1, 0.3
 8   0.4  -1.9   0.8  -0.6  -1.1   0.4  0.4, -1.9, -0.6
 9   0.2  -1.5  -1.9   1.0   0.0   0.6        0, 1, 0.6
10  -0.2   0.7  -0.5   1.4   0.3  -0.1  -0.2, 0.3, -0.5
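
On newer dplyr versions the same row-by-row idea can probably be written with `rowwise()` and `c_across()`, which is closer to what the question originally attempted. This is a sketch assuming dplyr 1.0.0 or later. It is not from the original answer, and the name `df3`, the seed and the 5-row stand-in data are made up for illustration.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only for reproducibility

# Small stand-in for the example data (5 rows instead of 57)
df <- as.data.frame(matrix(round(rnorm(5 * 6), 1), nrow = 5))
names(df) <- paste0("X", 1:6)

df3 <- df %>%
  rowwise() %>%
  # c_across() gathers the row's X1..X6 values into one vector to sample from
  mutate(X7 = toString(sample(c_across(X1:X6), 3))) %>%
  ungroup()

df3
```

`toString()` joins the three sampled values with ", ", so `X7` is a character column, as in the output above.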

@ycw Good idea, thanks for pointing that out! I have modified my answer accordingly.