2016-07-22 41 views

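I have a dataset with many missing values, coded as -999. How can I calculate averages over irregular intervals in a shell script without considering the missing values? Part of the data is: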

input.txt 
30 
-999 
10 
40 
23 
44 
-999 
-999 
31 
-999 
54 
-999 
-999 
-999 
-999 
-999 
-999 
10 
23 
2 
5 
3 
8 
8 
7 
9 
6 
10 
and so on 

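I want to calculate the average over successive intervals of 5, 6, and 6 rows (repeating), without considering the missing values.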

Desired output

ofile.txt 
25.75 (i.e. take the first 5 rows and average them, ignoring missing values: (30+10+40+23)/4) 
43  (i.e. take the next 6 rows and average them, ignoring missing values: (44+31+54)/3) 
-999 (i.e. take the next 6 rows; all of them are missing, so write the missing value -999) 
8.6  (i.e. take the next 5 rows and average them: (10+23+2+5+3)/5) 
8  (i.e. take the next 6 rows and average them: (8+8+7+9+6+10)/6) 

If the interval were regular (say, 5), I could do this:

awk '!/\-999/{sum += $1; count++} NR%5==0{print count ? (sum/count) :-999;sum=count=0}' input.txt 

I asked a similar question for regular intervals here: "Calculating average without considering missing values in shell script?". Here I am asking for a solution for irregular intervals.


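Although this is a well-structured Q, your math examples scare me ;-): 'consider next 6 rows and take average ... (44+31+54)/3'. Don't you need 6 values and to divide by 6? Only your 8.6 example looks correct. Good luck. – shellter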


@shellter Thank you. Missing values should not be considered, and they should not count as contributors to the average either. – Kay


Ah, got it. Sorry, I didn't read your Q carefully. Good luck. – shellter

Answers

2

With awk

awk -v f="5" 'f&&f--&&$0!=-999{c++;v+=$0} NR%17==0{f=5;r++} 
!f&&NR%17!=0{f=6;r++} r&&!c{print -999;r=0} r&&c{print v/c;r=v=c=0} 
END{if(c!=0)print v/c}' input.txt 

Output

25.75 
43 
-999 
8.6 
8 

Breakdown

f&&f--&&$0!=-999{c++;v+=$0} #add valid values and increment count 
NR%17==0{f=5;r++} #end of a full 5,6,6 cycle (17 = 5+6+6 rows): restart with a block of 5 
!f&&NR%17!=0{f=6;r++} #block finished mid-cycle: the next block has 6 rows 
r&&!c{print -999;r=0} #print -999 if no valid values 
r&&c{print v/c;r=v=c=0} #print avg 
END{ 
if(c!=0) #print remaining values avg 
    print v/c 
} 
2
$ cat tst.awk 
function nextInterval( intervals) { 
    numIntervals = split("5 6 6",intervals) 
    intervalsIdx = (intervalsIdx % numIntervals) + 1 
    return intervals[intervalsIdx] 
} 

BEGIN { 
    interval = nextInterval() 
    noVal = -999 
} 

$0 != noVal { 
    sum += $0 
    cnt++ 
} 

++numRows == interval { 
    print (cnt ? sum/cnt : noVal) 
    interval = nextInterval() 
    numRows = sum = cnt = 0 
} 

$ awk -f tst.awk file 
25.75 
43 
-999 
8.6 
8
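
As a further sketch (not part of either answer above), the interval pattern can be passed in as a parameter instead of being hardcoded. The function name `avg_by_pattern` and its two arguments are hypothetical, chosen only for illustration. It assumes one numeric value per line, with no blank or non-numeric lines, and it also prints an average for a trailing partial block, which is a design choice rather than something the question specifies.

```shell
#!/bin/sh
# Hypothetical helper: average the values on stdin over a repeating
# pattern of block sizes, ignoring a missing-value marker.
#   $1 = block sizes, e.g. "5 6 6"
#   $2 = missing-value marker (defaults to -999)
avg_by_pattern() {
    awk -v pattern="$1" -v missing="${2:--999}" '
    BEGIN {
        n = split(pattern, size, " ")    # size[1..n] holds the block lengths
        idx = 1                          # index of the current block length
    }
    $1 != missing { sum += $1; cnt++ }   # accumulate only valid values
    ++rows == size[idx] {                # the current block is complete
        print (cnt ? sum / cnt : missing)
        idx = idx % n + 1                # advance cyclically through the pattern
        rows = sum = cnt = 0
    }
    END {                                # trailing partial block, if any
        if (rows) print (cnt ? sum / cnt : missing)
    }'
}
```

For the data in the question this could be called as `avg_by_pattern "5 6 6" < input.txt > ofile.txt`. A single size such as `"5"` covers the regular-interval case as well.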