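Find the lowest frequency greater than 0

I have a data frame containing the values 1:4 along with some NAs. For each row, I want to calculate the frequency (as a proportion of the row) of the value that occurs least often, considering only values that occur more than 0 times.

Below is a sample data frame.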

df = as.data.frame(rbind(c(1,2,1,2,2,2,2,1,NA,2),c(2,3,3,2,3,3,NA,2,NA,NA),c(4,1,NA,NA,NA,1,1,1,4,4),c(3,3,3,4,4,4,NA,4,3,4))) 

      V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
    1  1  2  1  2  2  2  2  1 NA   2
    2  2  3  3  2  3  3 NA  2 NA  NA
    3  4  1 NA NA NA  1  1  1  4   4
    4  3  3  3  4  4  4 NA  4  3   4

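There are two points I am struggling with: 1) finding the lowest frequency greater than 0, and 2) applying the function to each row of my data frame. When I started working on this function, I implemented it with the code below, but it does not seem to be applied to each row. My results for value.1, value.2, etc. are the same for every row.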

Low_Freq = function(x){ 
     value.1 = sum(x==1, na.rm=TRUE) #count the number of 1's per row 
     value.2 = sum(x==2, na.rm=TRUE) #count the number of 2's per row 
     value.3 = sum(x==3, na.rm=TRUE) #count the number of 3's per row 
     value.4 = sum(x==4, na.rm=TRUE) #count the number of 4's per row 
     num.values = rowSums(!is.na(x), na.rm=TRUE) #count total number of non-NA values in each row 

     #what is the minimum frequency value greater than 0 among value.1, value.2, value.3, and value.4 for EACH row? 
     min.value.freq = min(cbind(value.1,value.2,value.3,value.4)) 

     out = min.value.freq/num.values #calculate the percentage of the minimum value for each row 
}

df$Low_Freq = apply(df, 1, Low_Freq)

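I then switched to rowSums() to calculate value.1, value.2, value.3, and value.4. That fixed the problem of counting value.1, value.2, etc. for each row, but I then had to call the function directly, without apply(), to get it to run: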

Low_Freq = function(x){ 
     value.1 = rowSums(x==1, na.rm=TRUE) #count the number of 1's per row 
     value.2 = rowSums(x==2, na.rm=TRUE) #count the number of 2's per row 
     value.3 = rowSums(x==3, na.rm=TRUE) #count the number of 3's per row 
     value.4 = rowSums(x==4, na.rm=TRUE) #count the number of 4's per row 
     num.values = rowSums(!is.na(x), na.rm=TRUE) #count total number of non-NA values in each row 

     #what is the minimum frequency value greater than 0 among value.1, value.2, value.3, and value.4 for EACH row? 
     min.value.freq = min(cbind(value.1,value.2,value.3,value.4)) 

     out = min.value.freq/num.values #calculate the percentage of the minimum value for each row 
}

df$Low_Freq = Low_Freq(df)

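So the row-by-row behaviour now seems to happen inside the function. That is all fine, but when I compute my final output I cannot figure out how to determine which of the values 1, 2, 3, or 4 has the lowest frequency in each row. That count must then be divided by the number of non-NA values in the row.

My desired result should look like this: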

      V1 V2 V3 V4 V5 V6 V7 V8 V9 V10  Low_Freq
    1  1  2  1  2  2  2  2  1 NA   2 0.3333333
    2  2  3  3  2  3  3 NA  2 NA  NA 0.4285714
    3  4  1 NA NA NA  1  1  1  4   4 0.4285714
    4  3  3  3  4  4  4 NA  4  3   4 0.4444444

I feel like I am going around in circles with this seemingly simple function. Any help would be greatly appreciated.

Thanks.

Source

2014-01-21 SC2

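The table function returns the frequency of each value that occurs, ignoring NA values. Therefore the min of the table result is the lowest frequency among the values that appear in the row, and its sum is the number of non-NA values in the row.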

Low_Freq = function(x){ 
    tab = table(x) 
    return(min(tab)/sum(tab)) 
} 
df$Low_Freq = apply(df, 1, Low_Freq) 
df 
#   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10  Low_Freq
# 1  1  2  1  2  2  2  2  1 NA   2 0.3333333
# 2  2  3  3  2  3  3 NA  2 NA  NA 0.4285714
# 3  4  1 NA NA NA  1  1  1  4   4 0.4285714
# 4  3  3  3  4  4  4 NA  4  3   4 0.4444444
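
To see why no explicit "greater than 0" filter is needed, it may help to look at what table produces for a single row. The snippet below is only an illustrative sketch using the first row of the example data; the name row1 is introduced here for the demonstration.

```r
# First row of the example data, including its NA.
row1 <- c(1, 2, 1, 2, 2, 2, 2, 1, NA, 2)

# table() counts only the values that actually occur and drops NA by default,
# so the values 3 and 4 (which never occur here) get no entry at all.
tab <- table(row1)
print(tab)

min(tab)             # lowest count among the values present (the three 1s)
sum(tab)             # number of non-NA entries in the row
min(tab) / sum(tab)  # proportion reported as Low_Freq for this row
```

Because absent values never appear in the table, the minimum is automatically taken over counts that are at least 1.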

If you want to exclude the 5s from the numerator but still count them in the denominator, you can do this:

df = as.data.frame(rbind(c(1,2,1,2,2,2,2,1,NA,2),c(2,3,3,2,3,3,NA,2,NA,NA),c(4,1,NA,NA,NA,1,1,1,4,4),c(3,3,3,4,4,4,5,4,3,4))) 
Low_Freq = function(x){ 
    tab = table(x[x != 5]) 
    return(min(tab)/sum(!is.na(x))) 
} 
df$Low_Freq = apply(df, 1, Low_Freq) 
df 
#   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10  Low_Freq
# 1  1  2  1  2  2  2  2  1 NA   2 0.3333333
# 2  2  3  3  2  3  3 NA  2 NA  NA 0.4285714
# 3  4  1 NA NA NA  1  1  1  4   4 0.4285714
# 4  3  3  3  4  4  4  5  4  3   4 0.4000000

Source
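
For comparison, the rowSums() idea from the question can probably be carried through as well. The following is a rough sketch, not part of the answer above: the helper name low_freq_rows and its values argument (the values eligible for the numerator, assumed to default to 1:4) are hypothetical. Zero counts are turned into NA so they are skipped when taking each row's minimum.

```r
# Hypothetical helper (not from the answer): lowest non-zero frequency per row.
# `values` lists the values that may be used for the numerator.
low_freq_rows <- function(d, values = 1:4) {
  m <- as.matrix(d)
  # One column of per-row counts for each eligible value.
  counts <- sapply(values, function(v) rowSums(m == v, na.rm = TRUE))
  # sapply() returns a plain vector for a one-row input; restore the shape.
  counts <- matrix(counts, nrow = nrow(m))
  # Values that never occur in a row must not count as the minimum.
  counts[counts == 0] <- NA
  min_count <- apply(counts, 1, function(r) {
    if (all(is.na(r))) NA_real_ else min(r, na.rm = TRUE)
  })
  # Denominator: every non-NA entry in the row, including excluded values.
  min_count / rowSums(!is.na(m))
}

df <- as.data.frame(rbind(c(1,2,1,2,2,2,2,1,NA,2),
                          c(2,3,3,2,3,3,NA,2,NA,NA),
                          c(4,1,NA,NA,NA,1,1,1,4,4),
                          c(3,3,3,4,4,4,5,4,3,4)))
result <- low_freq_rows(df)
df$Low_Freq <- result
df
```

A row containing none of the eligible values yields NA in this sketch, whereas min() on an empty table gives Inf with a warning; which behaviour is preferable depends on the data.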

2014-01-21 14:41:02 josliber

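Thanks for your reply. What if the number "5" occurs once in two of the rows, and I only want the lowest frequency among the numbers 1, 2, 3, and 4, while the total it is divided by should still be the number of non-NA values (including the 5s)? How would I modify this? – SC2

@SC2 I have updated the answer with this new functionality – josliber

Beautiful, thank you very much! – SC2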
