Remove rows from a data frame until a condition is met

I have a function, remove_fun, that removes rows from a data frame based on certain conditions. The function is too long to include here, so this is a simplified example.

Say I have a data frame called block_2 with two columns:

Treatment seq 
     1 29 
     1 23 
     3 60 
     1 6 
     2 41 
     1 5 
     2 44

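For the purposes of this example, let's say my function removes one row from block_2: the row with the highest value in block_2$seq. The function works well when I run it once, i.e. remove_fun(block_2) returns the following output: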

Treatment seq 
    1  29 
    1  23 
    1  6 
    2  41 
    1  5 
    2  44
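To make the simplified example reproducible, here is a minimal stand-in for the behaviour described above. The name remove_max_seq and its body are assumptions for illustration only; this is not the actual remove_fun.

```r
# Hypothetical stand-in for remove_fun (an assumption, not the real function):
# drop the single row that holds the largest value of seq.
remove_max_seq <- function(dat) {
  if (nrow(dat) == 0) return(dat)            # nothing left to remove
  dat[-which.max(dat$seq), , drop = FALSE]   # which.max() gives the first maximum
}

block_2 <- data.frame(
  Treatment = c(1, 1, 3, 1, 2, 1, 2),
  seq       = c(29, 23, 60, 6, 41, 5, 44)
)

reduced <- remove_max_seq(block_2)
print(reduced)  # the row with seq == 60 is gone; six rows remain
```

Note that the function returns a new data frame and leaves block_2 itself untouched, which matters for the loop discussed next.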

However, what I haven't figured out is how to apply remove_fun repeatedly until block_2 has been reduced to a certain size.

My idea was to do something like this:

while (dim(block_2_df)[1]>1){ #The number of rows of block_2_df
    remove_fun(block_2_df)
}

In theory, this should reduce block_2_df until only the observation with the lowest seq number remains.

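However, this doesn't work. I think my problem is that I don't know how to keep reusing my 'updated' block_2_df. What I'd like is code that does something like this: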

new_df_1<-remove_fun(block_2) 
new_df_2<-remove_fun(new_df_1) 
new_df_3<-remove_fun(new_df_2)

and so on.

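I'm not necessarily looking for an exact solution to this problem (since I haven't provided remove_fun), but I would appreciate some insight into a general approach.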

Edit: here is my actual code, with some sample data:

#Start from a block of 10*6 balls, with lambda*(wj) balls of each class 
#Allocation ratios 
class_1<-"a" 
class_2<-"b" 
class_3<-"c" 

ratio_a<-3 
ratio_b<-2 
ratio_c<-1 
#Min_set 
min_set<-c(rep(class_1,ratio_a),rep(class_2,ratio_b),rep(class_3,ratio_c)) 
min_set_num<-ifelse(min_set=='a',1,ifelse(min_set=='b',2,3)) 

table_key <- table(min_set_num) 

#Number of min_sets 
lamb<-10 
#Active urn 
block_1<-matrix(0,lamb,length(min_set)) 
for (i in 1:lamb){ 
    block_1[i,]<-min_set 
} 

#Turn classes into a vector 
block_1<-as.vector(block_1) 
block_1<-ifelse(block_1=='a',1,ifelse(block_1=='b',2,3)) 
#Turn into a df w/ identifying numbers: 
block_1_df<-data.frame(block_1,seq(1:length(block_1))) 
#Enumerate all sampling outcome permutations 
library('dplyr') 
#Create inactive urn 
#Sample from block_1 until min_set is achieved, store in block_2##### 
#Random sample : 
block_2<-sample(block_1,length(block_1),replace=F) 

block_2_df<-block_1_df[sample(nrow(block_1_df), length(block_1)), ] 
colnames(block_2_df)<-c('Treatment','seq') 
#Generally:#### 

remove_fun<-function(dat){ 
    #For df 
    min_set_obs_mat<-matrix(0,length(block_1),2) 
    min_set_obs_df<-as.data.frame(min_set_obs_mat) 
    colnames(min_set_obs_df)<-c('Treatment','seq') 

    for (i in 1:length(block_1)){ 
        if ((sum(min_set_obs_df[,1]==1)<3) || (sum(min_set_obs_df[,1]==2)<2) || (sum(min_set_obs_df[,1]==3)<1)){ 
            min_set_obs_df[i,]<-dat[i,] 
        } 
    } 
    #Get rid of empty rows in df: 
    min_set_obs_df<-min_set_obs_df%>%filter(Treatment>0) 

    #Return the sampled 'balls' which satisfy the minimum set into block_2_df (randomized block_!), #### 
    #keeping the 'extra' balls in a new df: extra_df:#### 

    #Question: does the order of returning matter?#### 

    #Identify min_set 
    outcome_df<-min_set_obs_df %>% group_by(Treatment) %>% do({ 
        head(., coalesce(table_key[as.character(.$Treatment[1])], 0L)) 
    }) 

    #This removes extra observations 'chronologically' 
    #Identify extra balls 
    #Extra_df is the 'inactive' urn#### 
    extra_df<-min_set_obs_df%>%filter(!(min_set_obs_df$seq%in%outcome_df$seq)) 
    #Question: is the number of pts equal to the block size? (lambda*W)?###### 

    #Return min_df back to block_2_df, remove extra_df from block_2_df: 
    dat<-dat%>%filter(!(seq%in%extra_df$seq)) 

return(dat) 
}

2017-07-17 lecreprays

Your while loop never reassigns block_2_df, so the data frame is never updated. This should work:

while (dim(block_2_df)[1]>1) { 
    block_2_df <- remove_fun(block_2_df) 
}
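The same pattern can be wrapped in a small helper. The sketch below is only an illustration: reduce_rows and remove_max_seq are hypothetical names, and remove_max_seq is a stand-in for the real remove_fun. It also stops if a call removes nothing, since a function that returns its input unchanged would otherwise keep the while loop running forever, which is worth checking if the loop never finishes.

```r
# Hypothetical stand-in for remove_fun: drop the row with the largest seq.
remove_max_seq <- function(dat) dat[-which.max(dat$seq), , drop = FALSE]

# Hypothetical helper: apply f repeatedly, reassigning the result each time,
# until at most target_rows rows remain. Stops early if f makes no progress.
reduce_rows <- function(dat, f, target_rows = 1) {
  while (nrow(dat) > target_rows) {
    updated <- f(dat)
    if (nrow(updated) >= nrow(dat)) {
      warning("f() removed no rows; stopping early")
      break
    }
    dat <- updated   # this reassignment is what the original loop was missing
  }
  dat
}

block_2_df <- data.frame(
  Treatment = c(1, 1, 3, 1, 2, 1, 2),
  seq       = c(29, 23, 60, 6, 41, 5, 44)
)

result <- reduce_rows(block_2_df, remove_max_seq, target_rows = 1)
print(result)  # only the row with the lowest seq (5) remains
```

Calling reduce_rows(block_2_df, remove_fun) with the real function should then either shrink the data frame to the target size or emit the warning, which points at the call that stopped removing rows.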

2017-07-17 23:06:13 Eldioo

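Thanks for the tip. However, for some reason this still doesn't work. I've added the actual code of my function to the original post so that you or anyone else can see whether there is a problem with it. – lecreprays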

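What is the desired output? Do you want to end up with a data frame that has a specific number of rows from each treatment group? The while loop should be fine for repeated application, so the problem is in the function. – Eldioo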

If what you need is a way to subset the data frame...

df <- data.frame(Treatment = c(1, 1, 3, 1, 2, 1, 2), 
        seq = c(29, 23, 60, 6, 41, 5, 44)) 

df 
    Treatment seq 
1   1 29 
2   1 23 
3   3 60 
4   1 6 
5   2 41 
6   1 5 
7   2 44 

# Decide how many rows you want in output 

n <- 6 

# Find the "n" smallest values in the seq variable 

head(sort(df$seq), n) 
[1] 5 6 23 29 41 44 


# Use them in the subset criteria 

df[df$seq %in% head(sort(df$seq), n), ] 
    Treatment seq 
1   1 29 
2   1 23 
4   1 6 
5   2 41 
6   1 5 
7   2 44

2017-07-18 00:20:30 Damian

As I mentioned in the OP, what I'm actually trying to do is reduce an (arbitrary) df until some condition is met. – lecreprays

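Iteratively changing 'n' in the example code will shrink the data frame by removing the highest values of 'df$seq' until a single row is left, the one with the lowest seq value, as described in the example. – Damian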
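A minimal sketch of what iterating over 'n' could look like, assuming the seq values are unique (with ties, %in% would keep every tied row):

```r
df <- data.frame(
  Treatment = c(1, 1, 3, 1, 2, 1, 2),
  seq       = c(29, 23, 60, 6, 41, 5, 44)
)

# Shrink the target size one step at a time: nrow(df) - 1, ..., 2, 1.
# rev(seq_len(...)) is empty when df already has a single row, so the loop is skipped.
for (n in rev(seq_len(nrow(df) - 1))) {
  # Keep only the rows whose seq is among the n smallest values.
  df <- df[df$seq %in% head(sort(df$seq), n), ]
}

print(df)  # a single row is left: Treatment 1, seq 5
```

If only the final row is of interest, `df[which.min(df$seq), ]` reaches the same result without a loop; the loop is useful when each intermediate data frame is needed or when the stopping condition is something other than a row count.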
