2017-06-01 45 views

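R - How to repeat data frame processing 100x with new random numbers and plot removal

I am new to R and am trying to create multiple subsamples of a data frame. My data are assigned to 4 strata (STRATUM = 1, 2, 3, 4), and I want to keep only a specified number of rows in each stratum. To do this, I import my data, sort by the stratum value, and then assign a random number to each row. I want to keep the original random number assignments because I will need them again in future analyses, so I save a .csv with those values. Next, I subset the data by stratum and specify how many records I want to keep in each stratum. Finally, I rejoin the data and save it as a new .csv. The code works, but I want to repeat this process 100 times. For each repetition I want to save the .csv with the random numbers, as well as the final .csv of randomly selected plots. I am not sure how to make this code repeat 100 times, or how to assign a unique file name to each iteration. Any help would be greatly appreciated.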

DataFiles <- "//Documents/flownData_JR.csv" 
PlotsFlown <- read.table (file = DataFiles, header = TRUE, sep = ",") 
#Sort the data by the stratification 
FlownStratSort <- PlotsFlown[order(PlotsFlown$STRATUM),] 
#Create a new column with a random number (no duplicates) 
FlownStratSort$RAND_NUM <- sample(nrow(FlownStratSort)) # a random permutation of 1..n, so no duplicates and no hard-coded row count
#Sort by the stratum, then random number 
FLOWNRAND <- FlownStratSort[order(FlownStratSort$STRATUM,FlownStratSort$RAND_NUM),] 
#Save a csv file with the random numbers 
write.table(FLOWNRAND, file = "//Documents/RANDNUM1_JR.csv", sep = ",", row.names = FALSE, col.names = TRUE) 
#Subset the data by stratum 
FLOWNRAND1 <- FLOWNRAND[which(FLOWNRAND$STRATUM=='1'),] 
FLOWNRAND2 <- FLOWNRAND[which(FLOWNRAND$STRATUM=='2'),] 
FLOWNRAND3 <- FLOWNRAND[which(FLOWNRAND$STRATUM=='3'),] 
FLOWNRAND4 <- FLOWNRAND[which(FLOWNRAND$STRATUM=='4'),] 
#Remove data from each stratum, specifying the number of records we want to retain 
FLOWNRAND1 <- FLOWNRAND1[1:34, ] 
FLOWNRAND2 <- FLOWNRAND2[1:21, ] 
FLOWNRAND3 <- FLOWNRAND3[1:7, ] 
FLOWNRAND4 <- FLOWNRAND4[1:7, ] 
#Rejoin the data 
FLOWNRAND_uneven <- rbind(FLOWNRAND1, FLOWNRAND2, FLOWNRAND3, FLOWNRAND4) 
#Save the table with plots removed from each stratum flown in 2017 
write.table(FLOWNRAND_uneven, file = "//Documents/Flown_RAND_uneven_JR.csv", sep = ",", row.names = FALSE, col.names = TRUE) 

Answer


Here is a data.table solution, if all you need to know is which rows end up in each group.

library(data.table)
df <- data.table(dat = runif(100),
                 stratum = sample(1:4, 100, replace = TRUE))

# Draw n randomly chosen rows from each stratum and save them as CSV number i
get_strata <- function(df, n, i){
    # .I[sample(.N, n)] picks n row indices within each group;
    # replace stratum with your own column name
    f <- df[df[, .I[sample(.N, n)], by = stratum]$V1]

    # Save as CSV; replace "path/" with your output folder.
    # write.csv always writes the header, so col.names is not passed.
    write.csv(f, file = paste0("path/df_", i, ".csv"),
      row.names = FALSE)
}

for (i in 1:100){
    # replace 10 with number needed
    get_strata(df, 10, i)
}
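
The function above draws the same number of rows from every stratum, while the question keeps a different number in each (34, 21, 7 and 7). One possible way to adapt it is sketched below. The names `plots`, `PLOT_ID`, `n_keep` and `sample_strata` are made up for illustration, and the data are synthetic, so treat this as a starting point rather than a tested solution for the real file.

```r
library(data.table)

# Synthetic stand-in for the real data (assumption: 137 rows, STRATUM in 1..4)
plots <- data.table(PLOT_ID = 1:137,
                    STRATUM = rep(1:4, times = c(60, 40, 20, 17)))

# Rows to keep per stratum, named by stratum value (counts taken from the question)
n_keep <- c("1" = 34, "2" = 21, "3" = 7, "4" = 7)

# Hypothetical helper: randomly pick n_keep[stratum] rows within each stratum
sample_strata <- function(dt, n_keep) {
  idx <- dt[, {
    # .BY$STRATUM is the stratum value of the current group
    k <- min(.N, n_keep[[as.character(.BY$STRATUM)]])  # never ask for more rows than exist
    .I[sample(.N, k)]                                  # row numbers in the full table
  }, by = STRATUM]$V1
  dt[idx]
}

sub <- sample_strata(plots, n_keep)
sub[, .N, by = STRATUM]  # row counts per stratum in the subsample
```

Calling `sample_strata()` inside the `for` loop in place of the fixed `n` would then give a different random subset on each iteration.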

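As the final result I want a single .csv with all of my original data columns, but with only 34 rows from stratum 1, 21 rows from stratum 2, 7 rows from stratum 3, and 7 rows from stratum 4. I want the rows within each stratum to be chosen at random, so that every repetition gives me a different subset of rows within each stratum. I want to repeat this process 100 times, producing 100 .csv files.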
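
For the workflow described in this comment, a base R loop that follows the original steps (assign random numbers, sort, keep the first rows of each stratum) might look like the sketch below. It runs on synthetic data; `plots`, `PLOT_ID`, `n_keep`, `out_dir` and the numbered file names are assumptions chosen for illustration, and the real script would read `flownData_JR.csv` and write to its own folder instead.

```r
# Synthetic stand-in for the imported data (assumption: 137 rows, STRATUM in 1..4)
plots <- data.frame(PLOT_ID = 1:137,
                    STRATUM = rep(1:4, times = c(60, 40, 20, 17)))

n_keep  <- c("1" = 34, "2" = 21, "3" = 7, "4" = 7)  # rows to keep per stratum
out_dir <- file.path(tempdir(), "subsamples")       # hypothetical output folder
dir.create(out_dir, showWarnings = FALSE)

set.seed(2017)  # optional: makes the whole run reproducible

for (i in 1:100) {
  d <- plots
  d$RAND_NUM <- sample(nrow(d))            # fresh random numbers on every iteration
  d <- d[order(d$STRATUM, d$RAND_NUM), ]   # sort by stratum, then random number

  # Save the full table with this iteration's random numbers (RANDNUM_001.csv, ...)
  write.csv(d, file.path(out_dir, sprintf("RANDNUM_%03d.csv", i)),
            row.names = FALSE)

  # Within each stratum keep the rows with the lowest random numbers
  kept <- do.call(rbind, lapply(split(d, d$STRATUM), function(s) {
    head(s, n_keep[[as.character(s$STRATUM[1])]])
  }))

  # Save the subsample under a matching numbered name
  write.csv(kept, file.path(out_dir, sprintf("Flown_RAND_uneven_%03d.csv", i)),
            row.names = FALSE)
}
```

Building the file name with `sprintf()` and the loop index gives each iteration its own pair of files, and zero-padding the index (`%03d`) keeps them in order when sorted by name. Using `head()` instead of `[1:34, ]` avoids rows of `NA` if a stratum happens to hold fewer rows than requested.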