2012-08-08

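Split a file based on a criterion

I have the following data in a file, which I want to split according to the .set value: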

.domain bag 
.set bag1 
bag1 
abc1 
.set bag2 
bag2 
abc2 
.domain cat 
.set bag1:cat 
bag1:cat 
abc1:cat 
.set bag2:cat 
bag2:cat 
abc2:cat 

I want to split this file into two files (bag1.txt and bag2.txt).

bag1.txt should look like this:

.domain bag 
.set bag1 
bag1 
abc1 
.domain cat 
.set bag1:cat 
bag1:cat 
abc1:cat 

bag2.txt should look like this:

.domain bag 
.set bag2 
bag2 
abc2 
.domain cat 
.set bag2:cat 
bag2:cat 
abc2:cat 

The .domain lines are common to both files.

I tried the following command, but it doesn't work:

nawk '{if($0~/.set/){split($2,a,":");filename=a[1]".text"}if(filename=".text"){print|"tee *.text"}else{print >filename}}' file.txt 
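The attempt above goes wrong in a few places: if(filename=".text") is an assignment rather than a comparison, and since ".text" is a non-empty string the condition is always true, so every line goes down the tee branch; /.set/ also matches any character followed by "set", because the dot is not escaped; and the files would end in .text rather than .txt. A minimal sketch of the same idea that avoids these problems (not taken from the original thread) reads file.txt twice: the first pass collects the bag names, assumed to be the part of each .set value before the colon, and the second pass writes the files. It assumes the input follows the format shown above; the first lines only recreate that sample in a scratch directory made with mktemp -d.

```shell
#!/bin/sh
# Sketch only: recreate the sample from the question in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '.domain bag' '.set bag1' 'bag1' 'abc1' \
    '.set bag2' 'bag2' 'abc2' \
    '.domain cat' '.set bag1:cat' 'bag1:cat' 'abc1:cat' \
    '.set bag2:cat' 'bag2:cat' 'abc2:cat' > file.txt

# file.txt is passed twice: pass 1 finds the bags, pass 2 writes the files.
awk '
    # Pass 1 (NR == FNR only holds while reading the first copy):
    # record each bag name, i.e. the .set value up to the first colon.
    NR == FNR {
        if ($1 == ".set") {
            split($2, parts, ":")
            bags[parts[1]] = 1
        }
        next
    }

    # Pass 2: a .domain line is copied into every bag file.
    $1 == ".domain" {
        for (b in bags)
            print > (b ".txt")
        next
    }

    # A .set line selects the output file for the lines that follow.
    $1 == ".set" {
        split($2, parts, ":")
        out = parts[1] ".txt"
    }

    # The .set line itself and the data lines go to the selected file.
    out != "" { print > out }
' file.txt file.txt
```

Because each output file is opened with > on its first write in a run, rerunning the command rewrites bag1.txt and bag2.txt instead of appending to them.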

Answer

One way:

awk ' 
    BEGIN { 
     ## Split fields with spaces and colon. 
     FS = "[ :]+"; 

     ## Extension of output files. 
     ext = ".txt"; 
    } 

    ## Write lines that begin with ".domain" to all known output files (saved 
    ## in "processed_bags"). Also save them in the "domain" array to copy them 
    ## later to all files not processed yet. 
    $1 == ".domain" { 

     for (b in processed_bags) { 
      print $0 >> sprintf("%s%s", b, ext); 
     } 

     domain[ i++ ] = $0; 

     next; 
    } 

    ## Select output file to write. If not found previously, copy all 
    ## domains saved until now. 
    $1 == ".set" { 
     bag = $2; 
     if (! (bag in processed_bags)) { 
      for (j = 0; j < i; j++) { 
       print domain[j] >> sprintf("%s%s", bag, ext); 
      } 
      processed_bags[ bag ] = 1;    
     } 
    } 

    ## A normal line of data (neither ".domain" nor ".set"). Copy 
    ## to the file saved in "bag" variable. 
    bag { 
     print $0 >> sprintf("%s%s", bag, ext); 
    } 
' file.txt 
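The field separator in the BEGIN block is what maps "bag1:cat" back to bag1: with FS = "[ :]+", any run of spaces and/or colons separates fields, so for both ".set bag1" and ".set bag1:cat" the bag name lands in $2. A quick check with the two kinds of .set line from the question:

```shell
#!/bin/sh
# With FS = "[ :]+", any run of spaces and/or colons separates fields,
# so the bag name is $2 whether or not a ":cat" suffix follows it.
fields=$(printf '%s\n' '.set bag1' '.set bag1:cat' |
    awk 'BEGIN { FS = "[ :]+" } { printf "$1=%s $2=%s NF=%d\n", $1, $2, NF }')
echo "$fields"
```

This prints "$1=.set $2=bag1 NF=2" for the first line and "$1=.set $2=bag1 NF=3" for the second, so the ":cat" part only adds a third field.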

Run the following command to see the output:

head bag[12].txt 

Output:

==> bag1.txt <==
.domain bag
.set bag1
bag1
abc1
.domain cat
.set bag1:cat
bag1:cat 
abc1:cat 

==> bag2.txt <== 
.domain bag 
.set bag2 
bag2 
abc2 
.domain cat 
.set bag2:cat 
bag2:cat 
abc2:cat 

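This works. But can the common-lines part be generalized? What if there are many bags, like bag1 ... bag1000? How can I do that? I have many files from bag1 to bag1000; instead of print >> bag1, can we simply use print > *.txt? (Many empty files, from bag1.txt to bag.txt, already exist in the directory.) – Vijay 2012-08-08 12:06:30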

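@peter: I have edited the answer to generalize it. It is fully commented, so you can check whether it fits your needs; I did not understand what you meant by 'print >> *.txt'. – Birei 2012-08-08 13:41:08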
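For the many-bags case raised in the comments (bag1 ... bag1000), the answer's script keeps every output file open until awk exits; depending on the awk implementation and the system's open-file limit, that many simultaneously open files may fail. A hedged sketch of the same algorithm that appends and then closes after every write follows. The helper name emit and the array name seen are hypothetical, and the script also empties each bag file the first time the bag is seen, so reruns do not accumulate duplicate lines the way repeated runs of the >> version would. As before, the first lines only recreate the question's sample in a scratch directory.

```shell
#!/bin/sh
# Sketch only: recreate the sample from the question in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '.domain bag' '.set bag1' 'bag1' 'abc1' \
    '.set bag2' 'bag2' 'abc2' \
    '.domain cat' '.set bag1:cat' 'bag1:cat' 'abc1:cat' \
    '.set bag2:cat' 'bag2:cat' 'abc2:cat' > file.txt

awk '
    # Hypothetical helper: append one line to a file, then close it again,
    # so that at most one output file is open at any moment.
    function emit(line, file) {
        print line >> file
        close(file)
    }

    BEGIN { FS = "[ :]+"; ext = ".txt" }

    # .domain lines: copy into every bag seen so far, and remember them
    # for bags that first appear later.
    $1 == ".domain" {
        for (b in seen)
            emit($0, b ext)
        domain[nd++] = $0
        next
    }

    # .set lines: on the first sighting of a bag in this run, empty its
    # file, then replay the .domain lines collected so far.
    $1 == ".set" {
        bag = $2
        if (!(bag in seen)) {
            printf "" > (bag ext)
            close(bag ext)
            for (j = 0; j < nd; j++)
                emit(domain[j], bag ext)
            seen[bag] = 1
        }
    }

    # The .set line and the data lines go to the current bag file.
    bag != "" { emit($0, bag ext) }
' file.txt
```

Closing after every line bounds the number of open files, at the cost of reopening a file for each write.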