Matching numbers in two different vectors according to their order

The title doesn't really do this question justice, but I can't think of any other way to describe the problem. It is best explained with an example.

Say we have two vectors of numbers (each of which will always be ascending and unique):

vector1 <- c(1,3,10,11,24,26,30,31) 
vector2 <- c(5,9,15,19,21,23,28,35)

What I'm trying to do is write a function that takes these two vectors and matches them in the following way:

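1) Start with the first element of vector1 (in this case, 1)

2) Go to vector2 and match the element from #1 with the first element of vector2 that is bigger than it (in this case, 5)

3) Go back to vector1 and skip all elements smaller than the value we found in #2 (in this case, we skip 3 and grab 10)

4) Go back to vector2 and skip all elements smaller than the value we found in #3 (in this case, we skip 9 and grab 15)

5) Repeat until we have gone through all elements.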

The two vectors we should end up with are:

result1 = c(1,10,24,30) 
result2 = c(5,15,28,35)

My current solution is the following, but I think it is probably very inefficient:

# establishes where we start from the vector2 numbers 
# just in case we have vector1 <- c(5,8,10) 
# and vector2 <- c(1,2,3,4,6,7). We would want to skip the 1,2,3,4 values 

    i <- 1 
    while(vector2[i]<vector1[1]){ 
    i <- i+1 
    } 

# starts the result1 vector with the first value from the vector1 

    result1 <- vector1[1] 

# starts the result2 vector empty and will add as we loop through 

    result2 <- c() 


# super complicated and probably hugely inefficient loop within a loop within a loop 
# i really want to avoid doing this, but I cannot think of any other way to accomplish this 

    for(j in 1:length(vector1)){ 
        while(vector1[j] > vector2[i] && (i+1) <= length(vector2)){ 
            result1 <- c(result1,vector1[j]) 
            result2 <- c(result2,vector2[i]) 

            # skip the vector2 values that are still smaller than vector1[j] 
            while(vector1[j] > vector2[i+1] && (i+2) <= length(vector2)){ 
                i <- i+1 
            } 
            i <- i+1 
        } 
    } 

    ## have to add on the last vector2 value cause while loop skips it 
    ## if it doesn't exist (there are no more vector2 values bigger) we put in an NA 
    ## note: "} else {" must stay on one line for this to parse at top level 

    if(result1[length(result1)] < vector2[i]){ 
        result2 <- c(result2,vector2[i]) 
    } else { 
        ### we ran out of vector2 values that are bigger 
        result2 <- c(result2,NA) 
    }
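
For comparison, the five steps above can be read as a single alternating pass over both vectors. The following is only a minimal sketch of that reading, not the asker's code: the function name `match_in_order` and the variable `last` are hypothetical, and it assumes both inputs are non-empty and ascending. It keeps the convention from the code above of padding `result2` with `NA` when vector2 runs out of bigger values.

```r
# Hypothetical helper: alternate between v1 and v2, each time taking the
# first value that is strictly bigger than the value picked last.
match_in_order <- function(v1, v2) {
  result1 <- numeric(0)
  result2 <- numeric(0)
  last <- -Inf          # most recently picked value
  i <- 1                # position in v1
  j <- 1                # position in v2
  repeat {
    # skip v1 values that are not bigger than the last pick
    while (i <= length(v1) && v1[i] <= last) i <- i + 1
    if (i > length(v1)) break
    result1 <- c(result1, v1[i])
    last <- v1[i]

    # skip v2 values that are not bigger than the last pick
    while (j <= length(v2) && v2[j] <= last) j <- j + 1
    if (j > length(v2)) {
      result2 <- c(result2, NA)   # no bigger partner left in v2
      break
    }
    result2 <- c(result2, v2[j])
    last <- v2[j]
  }
  list(result1 = result1, result2 = result2)
}

res <- match_in_order(c(1,3,10,11,24,26,30,31), c(5,9,15,19,21,23,28,35))
# res$result1 is 1 10 24 30
# res$result2 is 5 15 28 35
```

Each index only moves forward, so the pass visits every element at most once. Growing the results with `c()` is kept for readability; for long vectors, preallocating would likely be cheaper.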

Source

2014-10-31 road_to_quantdom

This is really hard to explain. Just take it as magic :)

vector1 <- c(1,3,10,11,24,26,30,31) 
vector2 <- c(5,9,15,19,21,23,28,35) 
## another case 
# vector2 <- c(0,9,15,19,21,23,28,35) 

## handling the case where vector2 min value(s) are < vector1 min value 
if (any(idx <- which(min(vector1) >= vector2))) 
    vector2 <- vector2[-idx] 

## interleave the two vectors 
tmp <- c(vector1,vector2)[order(c(order(vector1), order(vector2)))] 

## if we sort the vectors, which pairwise elements are from the same vector 
r <- rle(sort(tmp) %in% vector1)$lengths 

## I want to "remove" all the pairwise elements which are from the same vector 
## so I again interleave two vectors: 
## the first will be all TRUEs because I want the first instance of each *new* vector 
## the second will be all FALSEs identifying the elements I want to throw out because 
## there is a sequence of elements from the same vector 
l <- rep(1, length(r)) 
ord <- c(l, r - 1)[order(c(order(r), order(l)))] 

## create some dummy TRUE/FALSE to identify the ones I want 
res <- sort(tmp)[unlist(Map(rep, c(TRUE, FALSE), ord))] 

setNames(split(res, res %in% vector2), c('result1', 'result2')) 

# $result1 
# [1] 1 10 24 30 
# 
# $result2 
# [1] 5 15 28 35

Obviously this will only work if both vectors are ascending and unique, as you said.
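
The idea behind the `rle` step can be shown more directly: once the values are merged and sorted, the wanted elements are the first value of each run of consecutive values that come from the same vector. The sketch below is a simplified restatement under assumptions, not the answer's exact code: the name `first_of_each_run` and the `src` origin labels are hypothetical, and it assumes non-empty, ascending inputs with no value shared between the two vectors.

```r
# Hypothetical helper: keep the first element of every same-origin run.
first_of_each_run <- function(v1, v2) {
  # drop v2 values that do not exceed the first v1 value, so v1 leads
  v2 <- v2[v2 > v1[1]]

  vals <- c(v1, v2)
  src  <- rep(c(1L, 2L), times = c(length(v1), length(v2)))  # origin label

  o    <- order(vals)
  vals <- vals[o]
  src  <- src[o]

  # lengths of runs of consecutive values from the same vector
  runs <- rle(src)$lengths
  # index of the first element of each run
  starts <- cumsum(c(1L, head(runs, -1)))

  keep <- vals[starts]
  from <- src[starts]
  list(result1 = keep[from == 1L], result2 = keep[from == 2L])
}

res <- first_of_each_run(c(1,3,10,11,24,26,30,31), c(5,9,15,19,21,23,28,35))
# res$result1 is 1 10 24 30
# res$result2 is 5 15 28 35
```

Unlike the loop in the question, this version does not pad `result2` with `NA` when the last run comes from the first vector, so the two results can differ in length by one.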

Edit:

Works with duplicates:

vector1 <- c(1,3,10,11,24,26,30,31) 
## three test cases for vector2; only the last assignment is used below 
# vector2 <- c(5,9,15,19,21,23,28,35) 
# vector2 <- c(0,9,15,19,21,23,28,35) 
vector2 <- c(1,3,3,5,7,9,28,35) 

f <- function(v1, v2) { 
    ## drop the v2 values that are <= the smallest v1 value 
    if (any(idx <- which(min(v1) >= v2))) 
        v2 <- v2[-idx] 

    ## tag each value with its origin: '.0' for v1, '.00' for v2 
    ## (this tagging assumes whole-number inputs) 
    v1 <- paste0(v1, '.0') 
    v2 <- paste0(v2, '.00') 

    n <- function(x) as.numeric(x) 

    tmp <- c(v1, v2)[order(n(c(v1, v2)))] 

    ## keep a value only if it is bigger than the last kept value (m) 
    ## and comes from the other vector 
    m <- tmp[1] 
    idx <- c(TRUE, sapply(1:(length(tmp) - 1), function(x) { 
        if (n(tmp[x + 1]) > n(m)) { 
            if (gsub('^.*\\.','', tmp[x + 1]) == gsub('^.*\\.','', m)) 
                FALSE 
            else { 
                m <<- tmp[x + 1] 
                TRUE 
            } 
        } else FALSE 
    })) 

    setNames(split(n(tmp[idx]), grepl('\\.00$', tmp[idx])), c('result1','result2')) 
} 
f(vector1, vector2) 

# $result1 
# [1] 1 10 30 
# 
# $result2 
# [1] 3 28 35

Source

2014-10-31 03:16:14 rawr

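This is such a clever solution. I never thought of interleaving the two vectors (I guess I wasn't making full use of my ascending and uniqueness assumptions). I just changed the last line of code and got it working. Thank you very much! – 2014-10-31 22:10:21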

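Sorry to bother you, but when I tried to compare it with my previous solution, I found a bug. If vector1's values start later than vector2's values, the output is incorrect. For example, vector1 = (3,10,...) and vector2 = (1,5,9,...). I can work around this by simply trimming vector2 to remove any values below vector1's first value. – 2014-10-31 22:41:44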

Can you just use vector2 as vector1 and vice versa? That is, always start with the vector that has the smallest first element? – rawr 2014-10-31 23:53:28
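
The two ways of handling a vector2 that starts below vector1, as discussed in these comments, can be contrasted on a small example. This is only an illustrative sketch: the inputs `v1` and `v2` and the names `v2_trimmed`, `lead` and `follow` are made up here and do not appear in the thread.

```r
# Hypothetical inputs where the second vector starts below the first.
v1 <- c(3, 10, 20)
v2 <- c(1, 5, 9, 15)

# Option 1 (trimming): drop the v2 values that are not bigger than the
# first v1 value, so that v1 still leads the matching.
v2_trimmed <- v2[v2 > v1[1]]
# v2_trimmed is 5 9 15

# Option 2 (swapping): let the vector with the smaller first element lead.
if (v2[1] < v1[1]) {
  lead   <- v2
  follow <- v1
} else {
  lead   <- v1
  follow <- v2
}
# lead is 1 5 9 15, follow is 3 10 20
```

The two options are not equivalent: trimming starts the pairing at (3, 5), while swapping starts it at (1, 3) and reverses which vector plays the role of result1.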
