Group data by tolerance using an index list

I don't know how to explain this briefly, so I'll do my best. I have the following example data:

Data<-data.frame(A=c(1,2,3,5,8,9,10),B=c(5.3,9.2,5,8,10,9.5,4),C=c(1:7))

and the index:

Ind<-data.frame(I=c(5,6,2,4,1,3,7))

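The values in Ind correspond to column C of Data. I want to start with the first Ind value and find the corresponding row in the Data data.frame (via column C). Starting from that row, I want to look up and down for values in column A that lie within a tolerance of 1. I want to write those rows to a result data frame, add a group ID column, and delete them from Data (where I found them). Then I continue with the next entry in the index data frame Ind, until the data.frame Data is empty. I know how to match Ind against column C of Data, how to write and delete rows, and the other pieces of the for loop, but I don't know the main point, which is my question here:

Once I have found my row in Data, how do I find the values in column A that fall within the tolerance above and below that entry, and assign them my group ID?

The result I want looks like this: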

A   B    C  Group
1   5.3  1  2
2   9.2  2  2
3   5    3  2
5   8    4  3
8   10   5  1
9   9.5  6  1
10  4    7  4

Maybe someone can help me with the critical point of my problem, or even suggest a shortcut for solving it.

Thanks a lot!

Source

2017-05-08 JmO

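In general: avoid deleting rows from, or growing, a data frame row by row inside a loop. Because of the way R manages memory, each time you add or delete a row another copy of the data frame is created. Garbage collection will eventually discard the "old" copies, but the garbage can accumulate quickly and degrade performance. Instead, add a logical column to the Data data frame and set it to TRUE for the rows that have been "extracted". Something like this: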

Data$extracted <- rep(FALSE,nrow(Data))
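To illustrate the flag idea on a single seed row, here is a minimal sketch. It assumes the example Data from the question and a tolerance of 1; the names seed, target, hit and remaining are made up for this illustration and are not part of the original answer.

```r
# Example data from the question, plus the logical flag column.
Data <- data.frame(A = c(1, 2, 3, 5, 8, 9, 10),
                   B = c(5.3, 9.2, 5, 8, 10, 9.5, 4),
                   C = 1:7)
Data$extracted <- rep(FALSE, nrow(Data))

# Hypothetical seed: the first index value from Ind (5).
seed <- 5
target <- Data$A[Data$C == seed]

# Rows whose A is within 1 of the target and that are not flagged yet.
hit <- abs(Data$A - target) <= 1 & !Data$extracted

# Flag the rows instead of deleting them.
Data$extracted[hit] <- TRUE

# A "remaining rows" view can still be taken whenever it is needed.
remaining <- Data[!Data$extracted, ]
```

Nothing is removed from Data here; the flag column only records which rows already belong to a group.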

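As for your question: I get a different set of group numbers, but the groups themselves are the same.

There may be a more elegant way to do this, but this will get it done.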

# Store the results in a separate list.
res <- list()

group.counter <- 1

# Loop over the index values.
for (idx in Ind$I) {
    # Skip this iteration if idx is NA.
    if (is.na(idx)) {
        next
    }

    # dat.rows is a logical vector which shows the rows where
    # "A" meets the tolerance requirement.
    # Specify the tolerance here.
    mytol <- 1
    # The next line only works for integer comparison.
    # Also not covered: what if multiple values of C
    # match idx? Do we loop over each corresponding value of A,
    # i.e. loop over each value of 'target'?
    target <- Data$A[Data$C == idx]

    # Use the magic of vectorized logical comparison.
    dat.rows <-
        ((Data$A - target) >= -mytol) &
        ((Data$A - target) <= mytol) &
        (!Data$extracted)
    # If dat.rows is all FALSE, then nothing met the criteria;
    # skip the rest of the loop.
    if (!any(dat.rows)) {
        next
    }

    # Copy the rows to the result list.
    res[[length(res) + 1]] <- data.frame(
        A = Data[dat.rows, "A"],
        B = Data[dat.rows, "B"],
        C = Data[dat.rows, "C"],
        Group = group.counter  # recycled to match the length of A, B, C
    )

    # Flag the extraction.
    Data$extracted[dat.rows] <- TRUE
    # Increment the group counter.
    group.counter <- group.counter + 1
}

# Now make a data.frame from the results.
# This is the last step in how we avoid
# "growing" a data.frame inside a loop.
resData <- do.call(rbind, res)
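The group numbers differ from the table in the question because the loop above still uses a row as a seed after an earlier group has claimed it (index 6 picks up A = 10 as group 2). A possible variant, sketched below under the assumption that an already grouped row should not start a new group, skips such seeds and keeps the rows in their original order. The names tol, group, next_id, seed and hit are illustrative only.

```r
Data <- data.frame(A = c(1, 2, 3, 5, 8, 9, 10),
                   B = c(5.3, 9.2, 5, 8, 10, 9.5, 4),
                   C = 1:7)
Ind <- data.frame(I = c(5, 6, 2, 4, 1, 3, 7))

tol <- 1
# One group id per row of Data; NA means "not extracted yet".
group <- rep(NA_integer_, nrow(Data))
next_id <- 1L

for (idx in Ind$I) {
  seed <- which(Data$C == idx)
  # Skip NA or unmatched indices, ambiguous matches, and seeds that
  # an earlier group has already claimed (assumption of this sketch).
  if (length(seed) != 1 || !is.na(group[seed])) next

  # Ungrouped rows whose A lies within tol of the seed's A.
  hit <- abs(Data$A - Data$A[seed]) <= tol & is.na(group)
  group[hit] <- next_id
  next_id <- next_id + 1L
}

# Attach the ids in one step; no data frame grows inside the loop.
resData <- cbind(Data, Group = group)
```

On the example data this should give the Group column 2, 2, 2, 3, 1, 1, 4, which is the numbering shown in the question.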

Source

2017-05-08 03:15:55 Jason

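Thank you very much! – JmO

Note that this will not give an "optimal" grouping; there are packages for cluster analysis. But if this meets your needs, it is good enough. – Jason

Thanks again. Which package would you recommend here? – JmO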
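The thread does not record which package was recommended. As one possible starting point, and purely as an assumption on my part, the stats package that ships with R offers hierarchical clustering through dist, hclust and cutree. The sketch below clusters column A with single linkage and cuts the tree at height 1. This is a different technique from the seeded loop above, so the groups need not agree.

```r
# Column A from the example data.
A <- c(1, 2, 3, 5, 8, 9, 10)

# Single-linkage hierarchical clustering on the pairwise distances.
tree <- hclust(dist(A), method = "single")

# Cut the tree at height 1: values connected by steps of at most 1
# end up in the same cluster.
groups <- cutree(tree, h = 1)

# Inspect which values fall into which cluster.
split(A, groups)
```

Single linkage chains neighbours together, so 8, 9 and 10 form one cluster here, whereas the seeded loop separates 10 from 8 and 9. Which behaviour is appropriate depends on what the tolerance is meant to express.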
