Compare files and print class
2012-07-20

I have file1:

id position 
a1 21 
a1 39 
a1 77 
b1 88 
b1 122 
c1 22 

file2:

id class position1 position2 
a1 Xfact 1   40 
a1 Xred 41   66 
a1 xbreak 69   89 
b1 Xbreak 77   133 
b1 Xred 140   199 
c1 Xfact 1   15 
c1 Xbreak 19   35 

I want output like this:

id position class 
a1 21  Xfact 
a1 39  Xfact 
a1 77  Xbreak 
b1 88  Xbreak 
b1 122  Xbreak 
c1 22  Xbreak 

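I need a simple awk script that prints the id and position from file1, then takes each position from file1 and compares it against the positions in file2. If the position from file1 falls within the range position1 to position2 on a file2 line with the same id, print the corresponding class.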


Is this homework? It looks very much like it. – Vatine 2012-07-20 09:58:04

Answer


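One way using awk. It is not a simple script. The process in short: the key is the variable 'all_ranges'. While it is 0, the script reads lines from the ranges file and stores them in an array. Once it is set to 1, the script stops reading ranges, goes back to the 'id-position' file, checks each position against the ranges stored in the array, and prints the line with its class when a range matches. I tried to avoid processing the ranges file more than once by reading it in blocks, one id at a time, which makes the script more complex.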

EDIT to add that I assume the id field is sorted in both files. Otherwise this script will fail and you will need another approach.

Contents of script.awk:

BEGIN { 
    ## Arguments: 
    ## ARGV[0] = awk 
    ## ARGV[1] = <first_input_argument> 
    ## ARGV[2] = <second_input_argument> 
    ## ARGC = 3 
    f2 = ARGV[ --ARGC ]; 

    all_ranges = 0 

    ## Read first line from file with ranges to get 'class' header. 
    getline line <f2 
    split(line, fields) 
    class_header = fields[2]; 
} 

## Special case for the header. 
FNR == 1 { 
    printf "%s\t%s\n", $0, class_header; 
    next; 
} 

## Data. 
FNR > 1 { 

    while (1) { 

     if (! all_ranges) { 

      ## Read line from file with range positions. 
      ret = (getline line < f2) 

      ## Check error (ERRNO is a gawk extension). 
      if (ret == -1) { 
       printf "%s\n", "ERROR: " ERRNO 
       close(f2); 
       exit 1; 
      } 

      ## Check end of file: nothing left to read, so stop loading 
      ## ranges and match against the ones already stored. 
      if (ret == 0) { 
       eof = 1; 
       all_ranges = 1; 
      } 
      else { 

       ## Split line in spaces. 
       num = split(line, fields) 
       if (num != 4) { 
        printf "%s\n", "ERROR: Bad format of file " f2; 
        exit 2; 
       } 

       range_id = fields[1]; 
       if ($1 == fields[1]) { 
        ranges[ fields[3], fields[4] ] = fields[2]; 
        continue; 
       } 
       else { 
        all_ranges = 1 
       } 
      } 
     } 

     ## The pending range line belongs to the current id: 
     ## start a new block of ranges with it. 
     if (! eof && range_id == $1) { 
      delete ranges; 
      ranges[ fields[3], fields[4] ] = fields[2]; 
      all_ranges = 0; 
      continue; 
     }   

     ## Print the class of a stored range that contains the position. 
     for (range in ranges) { 
      split(range, pos, SUBSEP) 
      if ($2 >= pos[1] && $2 <= pos[2]) { 
       printf "%s\t%s\n", $0, ranges[ range ]; 
       break; 
      } 
     } 
     break; 
    } 
} 

Run it like:

awk -f script.awk file1 file2 | column -t 

With the following result:

id position class 
a1 21  Xfact 
a1 39  Xfact 
a1 77  xbreak 
b1 88  Xbreak 
b1 122  Xbreak 
c1 22  Xbreak
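
If the ranges file is small enough to hold in memory, a simpler variant may be enough. The following is only a sketch under that assumption, not the script above: it loads every range from file2 into arrays keyed by id, then scans file1 once, so it does not depend on either file being sorted. The function name classify and the array names lo, hi, cls and count are made up for illustration. Like the script above, it prints nothing for a position that falls in no range.

```shell
# Sketch only: classify is a hypothetical helper name.
# Usage: classify <ranges_file> <positions_file>
# The ranges file goes first so that awk loads it before scanning positions.
classify() {
    awk '
        # First file (ranges): keep the "class" header, then store each range per id.
        NR == FNR {
            if (FNR == 1) { class_header = $2; next }
            n = ++count[$1]
            lo[$1, n] = $3; hi[$1, n] = $4; cls[$1, n] = $2
            next
        }
        # Second file (positions): extend the header line.
        FNR == 1 { print $1, $2, class_header; next }
        # Data line: print the class of the first range, in file order,
        # that contains the position. "+ 0" forces a numeric comparison.
        {
            for (i = 1; i <= count[$1]; i++)
                if ($2 + 0 >= lo[$1, i] + 0 && $2 + 0 <= hi[$1, i] + 0) {
                    print $1, $2, cls[$1, i]
                    break
                }
        }
    ' "$1" "$2"
}
```

Note the argument order is reversed compared with script.awk: classify file2 file1 | column -t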