Passing a for loop with non-integers to awk

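I want to write code that does the following: where $7 is less than or equal to $i (0 to 1 in increments of 0.05), print the line and pass it to word count. This is how I tried to do it: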

for i in $(seq 0 0.05 1); do awk '{if ($7 <= $i) print $0}' file.txt | wc -l ; done

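This just ends up returning the full file (~400k lines) as the word count for every instance of $i. For example, with $7 <= 0.00 it should return ~67K.

I feel there may be a way to do this within awk, but I haven't seen any suggestions that allow for non-integers.

Thanks in advance.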

Source

2017-08-30 Lynsey Hall

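[Google "awk is not shell"](https://www.google.com/search?q=%22awk+is+not+shell%22) –

You need to pass $i to awk as a variable, with -v –

I have done that now, thank you. Before posting I didn't know this was the cause of the problem, otherwise my Googling might have been more fruitful! But now I know, and I will spread the word to anyone else who runs into this problem :) –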
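For anyone wondering why the original loop misbehaves, here is a minimal sketch of what "awk is not shell" means in practice. Single quotes stop the shell from expanding $i, so awk receives the literal text. Inside awk, i is an unset awk variable (numerically 0), so $i refers to $0, the whole line, rather than the loop value. The sample input "x y z" is made up for illustration.

```shell
#!/bin/sh
i=0.05   # shell variable; awk knows nothing about it

# Single quotes: the shell passes the text through untouched.
printf '%s\n' '{ print $i }'

# Inside awk, i is unset (0 in numeric context), so $i is $0: the whole line.
printf 'x y z\n' | awk '{ print $i }'
```

So the test in the question compares field 7 against the entire line, not against the threshold from the loop.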

Pass $i to awk as a variable with -v, like this:

for i in $(seq 0 0.05 1); do awk -v i=$i '{if ($7 <= i) print $0}' file.txt | wc -l ; done
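The same idea can be sketched as a small function, which makes it easy to try on a few thresholds. This is only an illustration: the helper name count_le, the awk variable name limit, and the temporary sample file are assumptions, not part of the original answer. Counting inside awk also avoids the extra wc -l process.

```shell
#!/bin/sh
# count_le THRESHOLD FILE: count lines whose 7th field is <= THRESHOLD.
# (count_le is a hypothetical helper name used only for this illustration.)
count_le() {
    # -v copies the shell value into the awk variable "limit";
    # the awk program itself stays inside single quotes.
    awk -v limit="$1" '$7 <= limit { n++ } END { print n + 0 }' "$2"
}

# Small made-up sample file.
data=$(mktemp)
printf '%s\n' \
    '1 2 3 4 5 6 0.6 a' \
    '1 2 3 4 5 6 0.21 a' \
    '1 2 3 4 5 6 0.00 a' \
    '1 2 3 4 5 6 7 a' > "$data"

# Print each threshold followed by its line count.
for i in 0.00 0.25 0.60 1.00; do
    printf '%s %s\n' "$i" "$(count_le "$i" "$data")"
done

rm -f "$data"
```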

Source

2017-08-30 14:57:49

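Thank you for your help, I didn't know about -v. My Googling led me down the awk 'BEGIN {while (getline < "'"$INPUTFILE"'") { path, which seemed to be getting more and more convoluted! I'm glad there is a parsimonious way to achieve the same result! –

Depending on the structure of the data in the file, you could also use a pure awk solution and do without the bash for loop –

Thanks for the suggestion. I will look into this in future. –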

Some made-up data:

$ cat file.txt 
1 2 3 4 5 6 7 a b c d e f 
1 2 3 4 5 6 0.6 a b c 
1 2 3 4 5 6 0.57 a b c d e f g h i j 
1 2 3 4 5 6 1 a b c d e f g 
1 2 3 4 5 6 0.21 a b 
1 2 3 4 5 6 0.02 x y z 
1 2 3 4 5 6 0.00 x y z l j k

One possible 100% awk solution:

awk ' 
BEGIN { line_count=0 } 

{ printf "================= %s\n",$0 

    for (i=0; i<=20; i++) 
    { if ($7 <= i/20) 
     { printf "matching seq : %1.2f\n",i/20 
      line_count++ 
      seq_count[i]++ 
      next 
      } 
    } 
} 

END { printf "=================\n\n" 

     for (i=0; i<=20; i++) 
     { if (seq_count[i] > 0) 
      { printf "seq = %1.2f : %8s (count)\n",i/20,seq_count[i] } 
     } 

     printf "\nseq = all : %8s (count)\n",line_count 
    } 
' file.txt 


# the output: 
================= 1 2 3 4 5 6 7 a b c d e f 
================= 1 2 3 4 5 6 0.6 a b c 
matching seq : 0.60 
================= 1 2 3 4 5 6 0.57 a b c d e f g h i j 
matching seq : 0.60 
================= 1 2 3 4 5 6 1 a b c d e f g 
matching seq : 1.00 
================= 1 2 3 4 5 6 0.21 a b 
matching seq : 0.25 
================= 1 2 3 4 5 6 0.02 x y z 
matching seq : 0.05 
================= 1 2 3 4 5 6 0.00 x y z l j k 
matching seq : 0.00 
================= 

seq = 0.00 :  1 (count) 
seq = 0.05 :  1 (count) 
seq = 0.25 :  1 (count) 
seq = 0.60 :  2 (count) 
seq = 1.00 :  1 (count) 

seq = all :  6 (count)

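BEGIN { line_count=0 }: initialize the total line counter
the first printf statement is only for debugging; it prints every line from the file as it is processed
for (i=0; i<=20; i++): depending on the implementation, some versions of awk may have rounding/accuracy problems with non-integer sequences (e.g., incrementing by 0.05), so we use whole integers for our sequence and then divide by 20 (for this particular case) to get the 0.05 increments in the subsequent test
$7 <= i/20: if field #7 is less than or equal to (i/20) ...
printf "matching seq ...: print the sequence value (i/20) we just matched
line_count++: add 1 to our total line counter
seq_count[i]++: add 1 to our sequence counter array
next: break out of our sequence loop (since we found our matching sequence value (i/20)) and process the next line in the file
END ...: print out our line counts
for (i=0; ...)/if/printf: loop through our array of sequences, printing the line count for each sequence (i/20)
printf "\nseq = all...: print out our total line count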
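To illustrate the integer-counter point from the notes above, here is a rough sketch that generates the 21 thresholds from a whole-number loop. The helper name thresholds is made up for this example. The second command accumulates 0.05 twenty times purely for comparison; whether it lands exactly on 1 depends on floating-point rounding, so no particular result is claimed for it.

```shell
#!/bin/sh
# thresholds: print 0.00, 0.05, ..., 1.00 using an integer loop counter.
# (thresholds is a hypothetical helper name.)
thresholds() {
    awk 'BEGIN { for (i = 0; i <= 20; i++) printf "%.2f\n", i/20 }'
}

thresholds

# For comparison only: add 0.05 twenty times and show the raw value with
# 17 significant digits, so any accumulated rounding error is visible.
awk 'BEGIN { x = 0; for (k = 0; k < 20; k++) x += 0.05; printf "%.17g\n", x }'
```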

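Note: some of the awk code could be reduced further, but I'll leave it as is, since it's easier to understand if you're new to awk.

One (obvious?) benefit of a 100% awk solution is that our sequence/looping construct is internal to awk, which lets us limit ourselves to a single pass through the input file (file.txt); when the sequence/looping construct is outside of awk, we have to process the input file once for each pass through the sequence/loop (e.g., for this exercise we would have to process the input file 21 times!).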
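As a rough idea of what a reduced version might look like, here is a sketch with the debug printing removed. It keeps the same single pass and the same i/20 thresholds. The function name bucket_counts and the temporary sample file are assumptions made for this illustration, and the output format is simplified compared with the answer above.

```shell
#!/bin/sh
# bucket_counts FILE: one pass over FILE; each line is counted under the
# smallest threshold i/20 (i = 0..20) that its 7th field does not exceed.
# (bucket_counts is a hypothetical name chosen for this sketch.)
bucket_counts() {
    awk '
    {
        for (i = 0; i <= 20; i++)
            if ($7 <= i/20) { seq_count[i]++; total++; next }
    }
    END {
        # Only report thresholds that matched at least one line.
        for (i = 0; i <= 20; i++)
            if (i in seq_count)
                printf "seq = %1.2f : %d\n", i/20, seq_count[i]
        printf "seq = all : %d\n", total
    }' "$1"
}

# Try it on a copy of the sample data from this answer.
data=$(mktemp)
printf '%s\n' \
    '1 2 3 4 5 6 7 a b c d e f' \
    '1 2 3 4 5 6 0.6 a b c' \
    '1 2 3 4 5 6 0.57 a b c d e f g h i j' \
    '1 2 3 4 5 6 1 a b c d e f g' \
    '1 2 3 4 5 6 0.21 a b' \
    '1 2 3 4 5 6 0.02 x y z' \
    '1 2 3 4 5 6 0.00 x y z l j k' > "$data"

bucket_counts "$data"
rm -f "$data"
```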

Source

2017-08-30 16:38:18 markp

Using some guesswork about what you actually want to accomplish, I came up with this:

awk '{ for (i=20; 20*$7<=i && i>0; i--) bucket[i]++ } 
    END { for (i=1; i<=20; i++) print bucket[i] " lines where $7 <= " i/20 }' file.txt

With the mock data from mark's second answer I get this output:

2 lines where $7 <= 0.05 
2 lines where $7 <= 0.1 
2 lines where $7 <= 0.15 
2 lines where $7 <= 0.2 
3 lines where $7 <= 0.25 
3 lines where $7 <= 0.3 
3 lines where $7 <= 0.35 
3 lines where $7 <= 0.4 
3 lines where $7 <= 0.45 
3 lines where $7 <= 0.5 
3 lines where $7 <= 0.55 
5 lines where $7 <= 0.6 
5 lines where $7 <= 0.65 
5 lines where $7 <= 0.7 
5 lines where $7 <= 0.75 
5 lines where $7 <= 0.8 
5 lines where $7 <= 0.85 
5 lines where $7 <= 0.9 
5 lines where $7 <= 0.95 
6 lines where $7 <= 1

Source

2017-08-31 04:54:45 tripleee
