2016-11-23 65 views
1

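Minimum and maximum values for tasks in a Unix file?

I have a file that contains task names in the first column and the time taken to complete each task in the second column, as follows: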

Task2, 3421 
Task3, 3300 
Task1, 1000 
Task2, 1100 
Task3, 1200 
Task3, 1209 
Task4, 1299 
Task3, 1289 
Task1, 1389 
Task2, 1211 
Task5, 1216 
Task2, 1416 
Task1, 2100 
Task6, 2416 
Task5, 2216 
Task7, 1116 

Now I have to find the minimum and maximum time taken for each task and print the result in the following format:

task, min time, max time

e.g.

Task1, 1000, 2100 (from the data given above) 
+0

See [here](http://stackoverflow.com/a/40780716/6769931) for a quasi "awk-free" answer. ;) –

Answers

4

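You can use awk:

awk '
    BEGIN { FS = ", *"; OFS = ", " }               # split on a comma plus optional spaces
    { t = $2 + 0 }                                 # numeric value of the time column
    !($1 in max) || t > max[$1] { max[$1] = t }    # largest time seen so far per task
    !($1 in min) || t < min[$1] { min[$1] = t }    # smallest time seen so far per task
    END {
        for (k in max) print k, min[k], max[k]     # key order is unspecified
    }' input.txt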

You get:

Task1, 1000, 2100 
Task2, 1100, 3421 
Task3, 1200, 3300 
Task4, 1299, 1299 
Task5, 1216, 2216 
Task6, 2416, 2416 
Task7, 1116, 1116 
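
The order in which awk visits keys in `for (k in max)` is unspecified, so the sorted listing above is not guaranteed on every awk. A minimal sketch that makes the ordering explicit by piping the result through `sort`; the function name `task_minmax` is a hypothetical helper, and the code assumes the input is separated by a comma followed by optional spaces:

```shell
# task_minmax is a hypothetical helper name, not part of the original answer.
# It reads "Task, time" lines from the files given as arguments (or from stdin).
task_minmax() {
    awk '
        BEGIN { FS = ", *"; OFS = ", " }
        NF < 2 { next }                                # skip blank or malformed lines
        { t = $2 + 0 }                                 # numeric value of the time column
        !($1 in max) || t > max[$1] { max[$1] = t }    # track the maximum per task
        !($1 in min) || t < min[$1] { min[$1] = t }    # track the minimum per task
        END { for (k in max) print k, min[k], max[k] }
    ' "$@" | sort -t, -k1,1                            # order the result by task name
}
```

It could be called as `task_minmax input.txt`, or fed from a pipe.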
+0

I put these lines in a script and ran it, but got a syntax error: awk: syntax error near line 3; awk: bailing out near line 3 – Vicky

+0

Try again, after the fix –

+0

Same error: awk: syntax error near line 3; awk: bailing out near line 3 – Vicky

1

Another way to do this is to sort by column 1 and then by column 2, and take the first and last values for each task, like this:

awk -F, '{arr[$1]=arr[$1] $2} END {for(key in arr) print key, arr[key]}' <(sort -t, -k1,1 -k2,2n file) | awk '{OFS=", "; print $1, $2, $NF}'

Sample run:

$ cat file 
Task2, 3421 
Task3, 3300 
Task1, 1000 
Task2, 1100 
Task3, 1200 
Task3, 1209 
Task4, 1299 
Task3, 1289 
Task1, 1389 
Task2, 1211 
Task5, 1216 
Task2, 1416 
Task1, 2100 
Task6, 2416 
Task5, 2216 
Task7, 1116 
$ sort -t, -k1,1 -k2,2n file
Task1, 1000 
Task1, 1389 
Task1, 2100 
Task2, 1100 
Task2, 1211 
Task2, 1416 
Task2, 3421 
Task3, 1200 
Task3, 1209 
Task3, 1289 
Task3, 3300 
Task4, 1299 
Task5, 1216 
Task5, 2216 
Task6, 2416 
Task7, 1116 
$ awk -F, '{arr[$1]=arr[$1] $2} END {for(key in arr) print key, arr[key]}' <(sort -t, -k1,1 -k2,2n file) | awk '{OFS=", "; print $1, $2, $NF}'
Task1, 1000, 2100 
Task2, 1100, 3421 
Task3, 1200, 3300 
Task4, 1299, 1299 
Task5, 1216, 2216 
Task6, 2416, 2416 
Task7, 1116, 1116 
+0

Why does Task4 have the same minimum and maximum? – Vicky

+0

@user3369871 Because Task4 has only one entry – ritesht93

+1

Task7 also has only one entry, but Task7 is missing its max time in the output – Vicky

1

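Using gawk arrays of arrays:

gawk 'BEGIN { FS = ", *"; OFS = ", " }
     { t = $2 + 0 }                                    # numeric value of the time column
     !($1 in a) { a[$1]["min"] = a[$1]["max"] = t }    # first entry for this task
     t > a[$1]["max"] { a[$1]["max"] = t }
     t < a[$1]["min"] { a[$1]["min"] = t }
     END { for (i in a) print i, a[i]["min"], a[i]["max"] }' file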


1

Here is another alternative:

$ join -t, <(sort file){,} | sort -k1,1 -k2n -k3nr | rev | uniq -2 | rev 
0

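Sort it on the first and second columns, then awk it. The benefit of this solution (the awk part) is that it does not hold all the data in memory and dump it at the end; instead it prints the data for the previous $1 as soon as a new one is found. Here: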

$ sort -t, -k1,1 -k2n foo |      # sort by task name, then numerically by time
awk '!($1 in min) {min[$1]=$2}    # first of each is always min (and max) 
     ($1 in min) {max[$1]=$2}    # every current one is always max 
     $1!=p && NR>1 {print p, min[p], max[p]} # if $1 differs from previous, print previous 
        {p=$1}      # p is current for next round 
     END   {print p, min[p], max[p]}' # dump buffer 
Task1, 1000 2100 
Task2, 1100 3421 
Task3, 1200 3300 
Task4, 1299 1299 
Task5, 1216 2216 
Task6, 2416 2416 
Task7, 1116 1116 
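
The streaming idea can be taken one step further: because the input is sorted, only the current task name and its running minimum and maximum need to be kept, so no arrays are required at all. A sketch under that assumption; the function name `stream_minmax` is hypothetical, and the input is assumed to be separated by a comma followed by optional spaces:

```shell
# stream_minmax is a hypothetical helper name used only for illustration.
# It reads "Task, time" lines from the files given as arguments (or from stdin).
stream_minmax() {
    sort -t, -k1,1 -k2n "$@" |             # group by task, times ascending within a task
    awk '
        BEGIN { FS = ", *"; OFS = ", " }
        NF < 2 { next }                    # skip blank or malformed lines
        $1 != cur {                        # a new task starts here
            if (seen) print cur, lo, hi    # flush the previous task
            cur = $1; lo = $2 + 0; seen = 1
        }
        { hi = $2 + 0 }                    # sorted input: the latest time is the max so far
        END { if (seen) print cur, lo, hi }
    '
}
```

Memory use in the awk stage stays constant regardless of how many distinct tasks the file contains; the cost is the up-front sort.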
1

Using sort, sed and awk:

sort -k1,1 -k2n input.txt | sed -r ':a;N;$!ba;:b;s/(Task[0-9]+,)([0-9 ,]+)\n?\1([0-9]+)/\1\2, \3/g;tb;' | awk 'BEGIN{FS=OFS=", ";}{print $1, $2, $NF}' 

An alternative solution using only sort and sed:

sort -k1,1 -k2n input.txt | sed -r ':a;N;$!ba;:b;s/(Task[0-9]+,)([0-9 ,]+)\n?\1([0-9]+)/\1\2, \3/g;tb;' | sed -r -e 's/^([^ ]+)\s([^ ]+)\s.*\s([^ ]+)/\1 \2 \3/' -e 's/^([^ ]+)\s([^ ]+)$/\1 \2, \2/' 

You get:

Task1, 1000, 2100 
Task2, 1100, 3421 
Task3, 1200, 3300 
Task4, 1299, 1299 
Task5, 1216, 2216 
Task6, 2416, 2416 
Task7, 1116, 1116 
0

This is mostly bash. If that is a problem for you, I can replace the awk command with something else (for example colrm, if the time is always in the same column).

# Keep a list of already processed task names, delimited by ":"
already_processed=":"

# Use read to read only the first column from the data file
while IFS=',' read -r task _; do
    # If the line is empty or the task has already been processed, skip it
    if [[ -z "$task" || "$already_processed" == *":$task:"* ]]; then
        continue
    else
        # Select all the lines for this task from the data file, take the
        #+second column and sort it to find the minimum and the maximum.
        MIN=$(grep "^$task," "$1" | awk -F',' '{print $2 + 0}' | sort -n | head -1)
        MAX=$(grep "^$task," "$1" | awk -F',' '{print $2 + 0}' | sort -n | tail -1)
        # Add the task to the "already_processed" tasks (to be sure each task will
        #+appear only once in the output)
        already_processed="$already_processed$task:"
        # Print the output in the wanted format.
        echo "${task}, ${MIN}, ${MAX}"
    fi

done < "$1"

Just make sure your data file ends with a newline; otherwise `read` will not process the last line.

Example:

bash <name_of_script_file> <name_of_data_file> | sort  
Task1, 1000, 2100 
Task2, 1100, 3421 
Task3, 1200, 3300 
Task4, 1299, 1299 
Task5, 1216, 2216 
Task6, 2416, 2416 
Task7, 1116, 1116 