First, some sample data:
## Sample data
nMen <- 50
nWomen <- 60
set.seed(124)
mydata <- data.frame(SEX = rep(c("female", "male"), times = c(nWomen, nMen)),
                     myValue = rnorm(nMen + nWomen), ID = seq_len(nMen + nWomen))
Then work out how many women and men you want in each sample; these must be whole numbers.
## Number of women and men for the sampling
nSampW <- (nWomen + nMen) * 0.9
nSampM <- (nWomen + nMen) * 0.1
## These must be whole numbers (both lines below should return TRUE)
nSampW %% 1 == 0
nSampM %% 1 == 0
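Testing `x %% 1 == 0` on a product such as `110 * 0.1` happens to work for these numbers, but floating-point rounding can break it for other totals: `(100 * 0.07) %% 1 == 0` is `FALSE` in R because `100 * 0.07` is slightly above 7. A minimal sketch of a more robust approach, assuming you are happy to round to the nearest whole number and give any rounding remainder to the last group; `stratum_sizes()` is a hypothetical helper, not part of base R:

```r
## Hypothetical helper: turn a total and a named vector of proportions into
## whole-number group sizes that add up exactly to the total
stratum_sizes <- function(total, props) {
  stopifnot(isTRUE(all.equal(sum(props), 1)))
  sizes <- round(total * props)
  ## Give any rounding surplus or deficit to the last group
  sizes[length(sizes)] <- total - sum(sizes[-length(sizes)])
  setNames(as.integer(sizes), names(props))
}

nSamp <- stratum_sizes(total = 50 + 60, props = c(female = 0.9, male = 0.1))
nSamp
```

With this, `nSamp[["female"]]` and `nSamp[["male"]]` could stand in for `nSampW` and `nSampM`.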
Then set up the results container; the following creates an empty list with room for 200 samples.
## Set up results list
mySamp <- vector(mode = "list", length = 200)
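Filling the preallocated list requires double square brackets, as the loop does with `mySamp[[i]] <-`. This small sketch (the names `df`, `res_single` and `res_double` are made up for illustration) shows what happens with single brackets instead:

```r
df <- data.frame(myValue = c(0.5, -1.2), ID = 1:2)

res_single <- vector(mode = "list", length = 2)
res_double <- vector(mode = "list", length = 2)

## Single brackets treat the data frame as a list of its columns, so R warns
## that the number of items does not match and keeps only the first column
res_single[1] <- df
## Double brackets store the whole data frame as one list element
res_double[[1]] <- df

class(res_single[[1]])  # "numeric"
class(res_double[[1]])  # "data.frame"
```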
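Then loop: for each sample, draw rows separately for each sex, taking the numbers of women and men calculated above from the corresponding row indices.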
## Get row indices by SEX (these do not change, so compute them once)
idxW <- which(mydata$SEX == "female")
idxM <- which(mydata$SEX == "male")
## The loop
for (i in seq_along(mySamp)) {
  ## Sample the required number of rows from each set of indices, with replacement
  tempW <- mydata[sample(idxW, nSampW, replace = TRUE), ]
  tempM <- mydata[sample(idxM, nSampM, replace = TRUE), ]
  ## rbind back together and store in the list
  mySamp[[i]] <- rbind(tempW, tempM)
}
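The same procedure can also be written without an explicit loop. This is a sketch under stated assumptions, not the answer's own code: `draw_stratified()` is a hypothetical helper that takes a named vector of group sizes, so it extends to more than two groups. It indexes through `sample.int()` because `sample(x, ...)` treats a length-one `x` as `1:x`, which would silently misbehave for a group with a single row. It should produce samples with the same structure as the loop, though not necessarily the same rows.

```r
set.seed(124)
mydata <- data.frame(SEX = rep(c("female", "male"), times = c(60, 50)),
                     myValue = rnorm(110), ID = seq_len(110))

## Hypothetical helper: one stratified bootstrap sample with a fixed number
## of rows per group; `sizes` is a named vector, e.g. c(female = 99, male = 11)
draw_stratified <- function(data, group, sizes) {
  ## Row indices of each group, named by group level
  idx <- split(seq_len(nrow(data)), data[[group]])
  rows <- unlist(lapply(names(sizes), function(g) {
    ## Draw positions within idx[[g]], with replacement
    idx[[g]][sample.int(length(idx[[g]]), sizes[[g]], replace = TRUE)]
  }))
  data[rows, ]
}

mySamp <- lapply(seq_len(200), function(i)
  draw_stratified(mydata, "SEX", c(female = 99, male = 11)))
```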
Then check whether the proportions are correct:
sapply(mySamp[1:10], function(x) prop.table(table(x$SEX)))
#        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# female  0.9  0.9  0.9  0.9  0.9  0.9  0.9  0.9  0.9   0.9
# male    0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1   0.1
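The `sapply()` check only inspects the first 10 samples and compares proportions. A sketch that checks every sample by comparing whole-number counts instead; `has_exact_counts()` is a hypothetical helper, and the two-element `toy` list is made-up data standing in for `mySamp`:

```r
## Hypothetical helper: TRUE if every data frame in `samples` has exactly the
## requested number of rows per level of `group` (and no other rows)
has_exact_counts <- function(samples, group, sizes) {
  all(vapply(samples, function(x) {
    ## Count each requested level; levels absent from x count as 0
    counts <- table(factor(x[[group]], levels = names(sizes)))
    nrow(x) == sum(sizes) && all(counts == sizes)
  }, logical(1)))
}

toy <- list(
  data.frame(SEX = rep(c("female", "male"), times = c(9, 1))),
  data.frame(SEX = rep(c("female", "male"), times = c(8, 2)))
)
has_exact_counts(toy[1], "SEX", c(female = 9, male = 1))  # TRUE
has_exact_counts(toy, "SEX", c(female = 9, male = 1))     # FALSE
```

On the objects above, the call would be `has_exact_counts(mySamp, "SEX", c(female = nSampW, male = nSampM))`.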
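What is 'cox', and why not 'nrow(data)'? What is 'smpl'? Is it a correctly allocated list? Why aren't you using 'smpl[[i]]'? Don't just say "it doesn't work"; specify the problem you are running into (an error? an unexpected result? a warning?) – nicola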
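Hi! Sorry, I've added the missing information to the original post. The code draws random samples, but not in the specified proportions. When I try to loop 200 times to create 200 data frames, it doesn't do it... (the name of my original dataset is "cox"; that was a copy-paste mistake) – user3018739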
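You should allocate the list before the loop with 'smpl <- vector("list", 200)', and inside the loop assign with double square brackets: 'smpl[[i]] <- '. What do you mean by "doesn't keep the proportions"? Because of sampling variance, it is normal that the samples you get are not exactly 180-20. – nicola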
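To illustrate the sampling-variance point: with simple random sampling the number of women changes from draw to draw, while sampling each sex separately fixes it. This sketch assumes a hypothetical population of 1000 people (900 women, 100 men) and samples without replacement, to mirror the 180-20 split mentioned in the comment:

```r
set.seed(1)
## Hypothetical population: 90% female, 10% male
pop <- rep(c("female", "male"), times = c(900, 100))

## Simple random samples of 200: the number of women varies between draws
srs_women <- replicate(1000, sum(sample(pop, 200) == "female"))

## Stratified samples: draw exactly 180 women and 20 men each time
strat_women <- replicate(1000, {
  s <- c(sample(pop[pop == "female"], 180), sample(pop[pop == "male"], 20))
  sum(s == "female")
})

range(srs_women)    # varies around 180
range(strat_women)  # 180 180
```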