Difficult randomization based on frequency rank

I have a data frame like this:

x <- data.frame(A = c("D1","D1","D1","D1","D1","D2","D3","D3","D4","D4","D4","D5","D5"), B = c("A1","A3","A4","A5","A6","A5","A5","A6","A6","A1","A2","A5","A6"))

A  B 
D1 A1 
D1 A3 
D1 A4 
D1 A5 
D1 A6 
D2 A5 
D3 A5 
D3 A6 
D4 A6 
D4 A1 
D4 A2 
D5 A5 
D5 A6

When sorted by column B, it is clear that the entries in column B occur with different frequencies.

A B freq(B) 
D1 A1 2 
D4 A1 2 
D4 A2 1 
D1 A3 1 
D1 A4 1 
D1 A5 4 
D2 A5 4 
D3 A5 4 
D5 A5 4 
D1 A6 4 
D3 A6 4 
D4 A6 4 
D5 A6 4
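The freq(B) column above can be derived directly from the data. A minimal base R sketch, assuming the data frame from the question; the column name `freq` and the object name `x_sorted` are chosen here for illustration:

```r
x <- data.frame(
  A = c("D1","D1","D1","D1","D1","D2","D3","D3","D4","D4","D4","D5","D5"),
  B = c("A1","A3","A4","A5","A6","A5","A5","A6","A6","A1","A2","A5","A6"),
  stringsAsFactors = FALSE
)

# ave() applies length() within each group of B and returns one value per row
x$freq <- ave(seq_along(x$B), x$B, FUN = length)

# sort by B to reproduce the layout of the table above
x_sorted <- x[order(x$B), ]
print(x_sorted, row.names = FALSE)
```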

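I want to generate randomized data frames from column B of data frame x, but the randomization may only take place between entries whose frequencies are the same or similar (+/- one rank). For example, A2, A3 and A4 all have frequency 1, so they can be freely interchanged with one another, but not with A5, A6 or A1. Likewise, since A5 and A6 both have frequency 4, they can be randomized among themselves. A1 is the only entry with frequency 2 (ranked 2nd based on freq(B)), so no replacement can occur within its own class and a special condition applies: A1 can be randomly replaced by A2, A3 or A4 (frequency 1, ranked 1st, one rank lower than A1) or by A5/A6 (frequency 4, ranked 3rd, one rank higher than A1).

Can this be done easily in R?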
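The replacement rule described above can be written down explicitly. A minimal sketch under one reading of the question: frequencies are converted to dense ranks, values swap with same-rank peers when any exist, and otherwise fall back to the adjacent ranks. The helper name `allowed` is hypothetical.

```r
B <- c("A1","A3","A4","A5","A6","A5","A5","A6","A6","A1","A2","A5","A6")

# frequency of each distinct value, as a named integer vector
freq <- vapply(split(B, B), length, integer(1))

# dense rank of each frequency: 1 for the lowest frequency, 2 for the next, ...
cls <- match(freq, sort(unique(freq)))
names(cls) <- names(freq)

# hypothetical helper: which values may a given value be swapped with?
allowed <- function(v) {
  same <- setdiff(names(cls)[cls == cls[[v]]], v)
  if (length(same) > 0) return(same)        # peers with the same frequency exist
  names(cls)[abs(cls - cls[[v]]) == 1]      # otherwise use the adjacent ranks
}

allowed("A2")  # "A3" "A4"
allowed("A1")  # "A2" "A3" "A4" "A5" "A6"
```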

Source

2011-06-13 a83

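What do you mean by randomize? Do you want to sample from each value in "B" and return one row? Return all of them, but in random order? Please provide an example output. – Chase 2011-06-13 11:16:59

@a83 I'll echo @Chase's comment: please try to explain what replacement you want to do. I have posted an answer that I think does what you need for a single monospecific group, but please take a look and get back to us if it doesn't meet your requirements. – 2011-06-13 13:24:40

The second half of your question, about the randomization, is a bit unclear, but here is a start. When you update your question, I will update the answer accordingly. The code below adds the count information for column B and then samples rows within each value of the frequency column we added. I think all that is needed from here is to modify which columns are eligible for sampling, but please confirm what you want.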

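require(plyr)
## add a freq column holding the count of each value of B
x <- merge(x, count(x, "B"))
## shuffle the rows within each freq class
## (sample() on a data frame would permute its columns, so index the rows)
ddply(x, "freq", function(d) d[sample(nrow(d)), ])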

Source

2011-06-13 11:25:48 Chase

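That is a workable modification of the question. +1 for the concise code, though. – 2011-06-14 15:08:13

The first part is easy to handle with functions in my permute package (only on R-forge at the moment):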

require(permute) ## install from R-forge if not available 
x <- data.frame(A = c("D1","D1","D1","D1","D1","D2","D3","D3", 
         "D4","D4","D4","D5","D5"), 
       B = c("A1","A3","A4","A5","A6","A5","A5","A6", 
         "A6","A1","A2","A5","A6")) 
x <- x[order(x$B), ] 
x <- transform(x, freq = rep((lens <- sapply(with(x, split(B, B)), 
          length)), lens)) 
set.seed(529) 
ind <- permuted.index(NROW(x), control = permControl(strata = factor(x$freq)))

which gives:

R> x[ind, ] 
    A B freq 
10 D4 A1 2 
1 D1 A1 2 
11 D4 A2 1 
2 D1 A3 1 
3 D1 A4 1 
12 D5 A5 4 
4 D1 A5 4 
9 D4 A6 4 
13 D5 A6 4 
5 D1 A6 4 
6 D2 A5 4 
8 D3 A6 4 
7 D3 A5 4 
R> ind 
[1] 2 1 3 4 5 9 6 12 13 10 7 11 8

We can wrap this in a single statement to generate n permutations:

ctrl <- permControl(strata = factor(x$freq)) 
n <- 10 
set.seed(83) 
IND <- replicate(n, permuted.index(NROW(x), control = ctrl))

which gives:

> IND 
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] 
[1,] 2 2 1 2 1 2 1 2 1  1 
[2,] 1 1 2 1 2 1 2 1 2  2 
[3,] 3 5 4 3 5 5 4 5 5  5 
[4,] 5 3 5 5 3 4 5 4 4  4 
[5,] 4 4 3 4 4 3 3 3 3  3 
[6,] 9 12 11 12 6 10 13 10 8 13 
[7,] 10 11 6 11 13 7 7 12 7  9 
[8,] 8 9 9 10 8 6 11 13 12 10 
[9,] 12 10 8 6 9 13 9 6 9 11 
[10,] 13 6 12 9 7 9 8 8 13  8 
[11,] 6 7 10 13 12 11 6 11 10  7 
[12,] 11 8 13 7 11 8 10 7 6 12 
[13,] 7 13 7 8 10 12 12 9 11  6
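For readers who cannot install permute, the within-class shuffle can be approximated in base R. This is only a sketch: `strata_perm` is a hypothetical helper, not part of permute, and because it draws random numbers differently its results will not match the seeded output above.

```r
# hypothetical helper: permute row indices within each stratum
strata_perm <- function(strata) {
  idx <- seq_along(strata)
  # reorder the positions of each stratum among themselves;
  # ave() writes the shuffled positions back in place
  ave(idx, strata, FUN = function(i) i[sample.int(length(i))])
}

# freq column of x after sorting by B
freq <- c(2, 2, 1, 1, 1, 4, 4, 4, 4, 4, 4, 4, 4)

set.seed(1)
ind <- strata_perm(freq)                     # one permutation
IND <- replicate(10, strata_perm(freq))      # ten permutations, one per column

# every index points to a row of the same frequency class
all(freq[ind] == freq)
```

Using `i[sample.int(length(i))]` rather than `sample(i)` avoids the well-known surprise where `sample()` on a single number k samples from 1:k.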

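Now you also need to do some special sampling. If I understand correctly, you want to identify which frequency class contains only a single level of B, and then, possibly at random, replace the B in that frequency class with a B chosen at random from the levels of B in the adjacent frequency classes. If so, getting the correct rows to replace is a bit more complicated, but I think the function below does it: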

randSampleSpecial <- function(x, replace = TRUE) { 
    ## have we got access to permute? 
    stopifnot(require(permute)) 
    ## generate a random permutation within the levels of freq 
    ind <- permuted.index(NROW(x), 
          control = permControl(strata = factor(x$freq))) 
    ## split freq into freq classes 
    ranks <- with(x, split(freq, freq)) 
    ## rank the freq classes 
    Ranked <- rank(as.numeric(names(ranks))) 
    ## split the Bs on basis of freq classes 
    Bs <- with(x, split(B, freq)) 
    ## number of unique Bs in freq class 
    uniq <- sapply(Bs, function(x) length(unique(x))) 
    ## which contain only a single type of B? 
    repl <- which(uniq == 1) 
    ## if there are no freq classes with only one level of B, return 
    if(!(length(repl) > 0)) 
     return(ind) 
    ## if not, continue 
    ## which of the freq classes are adjacent to unique class? 
    other <- which(Ranked %in% (repl + c(1,-1))) 
    ## generate uniform random numbers to decide if we replace 
    Rand <- runif(length(ranks[[repl]])) 
    ## Which are the rows in `x` that we want to change? 
    candidates <- with(x, which(freq == as.numeric(names(uniq[repl])))) 
    ## which are the adjacent values we can replace with 
    replacements <- with(x, which(freq %in% as.numeric(names(uniq[other])))) 
    ## which candidates to replace? Decision is random 
    change <- sample(candidates, sum(Rand > 0.5)) 
    ## if we are changing a candidate, sample from the replacements and 
    ## assign 
    if(length(change) > 0) 
     ind[candidates][change] <- sample(ind[replacements], length(change), 
              replace = replace) 
    ## return 
    ind 
}

To use this, we do:

R> set.seed(35) 
R> randSampleSpecial(x) 
[1] 2 1 5 3 4 6 9 12 10 11 7 8 13

We can wrap this in a replicate() call to produce many such replacements:

R> IND <- replicate(10, randSampleSpecial(x)) 
R> IND 
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] 
[1,] 11 3 6 4 2 1 1 2 10  3 
[2,] 1 11 1 12 11 11 2 1 1 13 
[3,] 4 5 4 3 4 3 4 5 5  4 
[4,] 5 4 5 5 5 4 5 3 3  3 
[5,] 3 3 3 4 3 5 3 4 4  5 
[6,] 11 7 11 12 9 6 7 8 9  9 
[7,] 13 12 12 7 11 7 9 10 8 10 
[8,] 10 8 9 8 12 12 8 6 13  8 
[9,] 7 9 13 10 8 10 13 9 12 11 
[10,] 6 11 10 11 10 13 12 13 10 13 
[11,] 12 10 6 6 6 9 11 12 7 12 
[12,] 9 6 7 9 7 8 10 7 6  7 
[13,] 8 13 8 13 13 11 6 11 11  6

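For this data set, we know that it is rows 1 and 2 of the sorted x that we might want to replace with values from the other freq classes. If no replacement had been done, the first two rows of IND would contain only the values 1 or 2 (see the earlier IND). In the new IND, wherever a value in the first two rows is not a 1 or 2, we have replaced it with a B from one of the adjacent frequency classes.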
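An index vector only becomes a randomized data frame once it is applied to column B. A minimal sketch, assuming the goal is to keep column A fixed and reorder B; the index vector is copied from the randSampleSpecial(x) call above, and `randomized` is a name chosen for illustration.

```r
x <- data.frame(
  A = c("D1","D1","D1","D1","D1","D2","D3","D3","D4","D4","D4","D5","D5"),
  B = c("A1","A3","A4","A5","A6","A5","A5","A6","A6","A1","A2","A5","A6"),
  stringsAsFactors = FALSE
)
x <- x[order(x$B), ]   # same ordering as in the answer

# index vector shown for set.seed(35) above
ind <- c(2, 1, 5, 3, 4, 6, 9, 12, 10, 11, 7, 8, 13)

# keep A as it is and take B in the order given by the index
randomized <- data.frame(A = x$A, B = x$B[ind], stringsAsFactors = FALSE)
print(randomized, row.names = FALSE)
```

For a matrix such as IND, `apply(IND, 2, function(i) x$B[i])` would give one randomized B column per permutation.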

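My function assumes that you want to:

Only replace elements of the homogeneous frequency class with an element from one of the adjacent classes at random, not always. If you want to always replace, then the function can be changed to suit.

If we are doing replacements, then the replacement can be any element of the adjacent classes, and if more than one replacement is needed, the same replacement may be chosen more than once. Set replace = FALSE in the call to sample without replacement, if that is what you want.

The function assumes that you have only a single monospecific frequency class. It should be easy to modify it with a loop over two or more monospecific classes, but that does complicate the function, and since your description of the problem is not very clear, I have kept things simple.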
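The loop over several monospecific classes mentioned above might look roughly like the following. This is a base R sketch written for illustration, not the answer author's code: `rand_special_multi` and its `p_replace` argument are hypothetical, and it assumes x has the columns B and freq.

```r
# hypothetical helper: handles any number of single-value frequency classes
rand_special_multi <- function(x, p_replace = 0.5, replace = TRUE) {
  n <- nrow(x)
  # free permutation of the row indices within each freq class
  ind <- ave(seq_len(n), x$freq, FUN = function(i) i[sample.int(length(i))])
  classes <- sort(unique(x$freq))
  for (k in seq_along(classes)) {
    rows <- which(x$freq == classes[k])
    # skip classes that hold more than one distinct B
    if (length(unique(x$B[rows])) > 1) next
    # adjacent classes; the lowest and highest class have only one neighbour
    nb <- classes[intersect(c(k - 1, k + 1), seq_along(classes))]
    pool <- which(x$freq %in% nb)
    # each row of the class is replaced with probability p_replace
    change <- rows[runif(length(rows)) < p_replace]
    if (length(change) == 0 || length(pool) == 0) next
    ind[change] <- pool[sample.int(length(pool), length(change),
                                   replace = replace)]
  }
  ind
}

x <- data.frame(
  A = c("D1","D1","D1","D1","D1","D2","D3","D3","D4","D4","D4","D5","D5"),
  B = c("A1","A3","A4","A5","A6","A5","A5","A6","A6","A1","A2","A5","A6"),
  stringsAsFactors = FALSE
)
x <- x[order(x$B), ]
x$freq <- ave(seq_along(x$B), x$B, FUN = length)

set.seed(42)
ind <- rand_special_multi(x)
```

As with the original function, the result is no longer a strict permutation once a replacement has been made, because a replacement row can appear more than once.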

Source

2011-06-13 13:19:21

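+1 for permute, I did not know about it yet. – 2011-06-14 15:13:21

@Gavin gave you a good approach and asked whether someone could come up with a simpler one. The following function does the same thing using only basic functions, with count (from plyr) handling the frequencies. It also takes into account that the minimum and maximum frequencies have only one adjacent rank; Gavin's function gives an error in that case.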

require(plyr) # provides count(), as.quoted() and eval.quoted()

Permdf <- function(x, v){
    # some code to allow Permdf(df, var)
    mc <- match.call()
    v <- as.quoted(mc$v)
    y <- unlist(eval.quoted(v, x))
    # make bins with the values of v per frequency
    freqs <- count(x, v)
    bins <- split(freqs[[1]], freqs[[2]])
    nbins <- length(bins)
    # define the output
    dfid <- 1:nrow(x)

    for (i in 1:nbins){
        # which ids to change
        id <- which(y %in% bins[[i]])

        if(length(bins[[i]]) > 1){
            # more than one value for that frequency: shuffle within the bin
            dfid[id] <- sample(dfid[id])
        } else {
            # a single value: sample from this bin and the adjacent bins
            bid <- c(i-1, i, i+1)
            # check whether the bin ids are in range
            bid <- bid[bid > 0 & bid <= nbins]
            # id values to choose from
            vid <- which(y %in% unlist(bins[bid]))
            # random selection
            dfid[id] <- sample(vid, length(id), replace = TRUE)
        }
    }
    # return
    dfid
}

This can be used as:

Permdf(x,B)

Source

2011-06-14 15:06:04
