2017-10-05 109 views
3

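awk: merge rows based on a column

I would like to merge the lines that share the same first column ($1) into a single row and format the output. When printing the header, I need to repeat it according to the maximum number of times any value of the first field occurs. For example, Angola appears 3 times, Brazil appears 5 times, and Zambia appears once. The maximum count for field $1 is 5, so the header needs to be printed 5 times to give every field a proper heading.

The output should keep the line order of the original input file. My actual input files vary in width: 10 fields, 12 fields, and so on.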

Input.csv

Country,Network,Details,Amount 
Angola,voda,xxx,10 
Angola,at&t,xxx,20 
Angola,mtn,xxx,30 
Brazil,voda,yyy,40 
Brazil,voda,yyy,50 
Brazil,at&t,yyy,60 
Brazil,mtn,yyy,70 
Brazil,voda,yyy,80 
Zambia,tcl,zzz,90 

Desired Output.csv

Country,Network,Details,Amount,Country,Network,Details,Amount,Country,Network,Details,Amount,Country,Network,Details,Amount,Country,Network,Details,Amount 
Angola,voda,xxx,10,Angola,at&t,xxx,20,Angola,mtn,xxx,30 
Brazil,voda,yyy,40,Brazil,voda,yyy,50,Brazil,at&t,yyy,60,Brazil,mtn,yyy,70,Brazil,voda,yyy,80 
Zambia,tcl,zzz,90 
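
To make the header count concrete, here is a minimal sketch of how the maximum number of rows per country could be computed in one pass instead of being set by hand. The temporary file and the names `count`, `max`, and `max_count` are illustrative assumptions, not part of the original post.

```shell
# Recreate the sample data from Input.csv in a temporary file
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Country,Network,Details,Amount
Angola,voda,xxx,10
Angola,at&t,xxx,20
Angola,mtn,xxx,30
Brazil,voda,yyy,40
Brazil,voda,yyy,50
Brazil,at&t,yyy,60
Brazil,mtn,yyy,70
Brazil,voda,yyy,80
Zambia,tcl,zzz,90
EOF

# Skip the header (NR > 1), count rows per country, keep the largest count
max_count=$(awk -F, 'NR > 1 { if (++count[$1] > max) max = count[$1] }
                     END { print max + 0 }' "$tmp")
echo "$max_count"
rm -f "$tmp"
```

For the sample data this prints 5, the number of Brazil rows, which is how many times the header has to be repeated.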

Currently I use the two commands below to get the desired output, and I change the count by hand to match the actual input file.

Step #1

awk 'BEGIN { while (count++<5) header=header "Country,Network,Details,Amount,"; print header }' > output.csv 

Step #2

awk -F, ' 
    /.+/{ 
     if (!($1 in Val)) { Key[++i] = $1; } 
     Val[$1] = Val[$1] $0 ","; 
    } 
    END{ 
     for (j = 1; j <= i; j++) { 
      print(Val[Key[j]]); 
     } 
    }' input.csv >> output.csv 

Looking for your suggestions...

+0

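You could keep an array like `orderNum[$1]++` and use it as an outer loop to drive your final print statement, but why not just use `awk '{existing prog}' | sort` (since your input data appears to be sorted by country name)? Good luck. – shellter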

+0

And... plus one for a well-written Q with small sample data, the required output and... gasp, some very close code! Keep posting, and good luck! – shellter

Answers

4

awk one-liner:

awk 'BEGIN{FS=OFS=","}FNR==1{n=$0;next}{a[$1]=($1 in a ? a[$1] OFS:"")$0; if(!($1 in b)){o[++i]=$1}; b[$1]++; mx=mx>b[$1]?mx:b[$1] }END{for(i=1; i<=mx; i++)printf("%s%s",n,i==mx?RS:OFS); for(i=1; i in o; i++)print a[o[i]]}' infile 

Input:

$ cat infile 
Country,Network,Details,Amount 
Angola,voda,xxx,10 
Angola,at&t,xxx,20 
Angola,mtn,xxx,30 
Brazil,voda,yyy,40 
Brazil,voda,yyy,50 
Brazil,at&t,yyy,60 
Brazil,mtn,yyy,70 
Brazil,voda,yyy,80 
Zambia,tcl,zzz,90 

Output:

$ awk 'BEGIN{FS=OFS=","}FNR==1{n=$0;next}{a[$1]=($1 in a ? a[$1] OFS:"")$0; if(!($1 in b)){o[++i]=$1}; b[$1]++; mx=mx>b[$1]?mx:b[$1] }END{for(i=1; i<=mx; i++)printf("%s%s",n,i==mx?RS:OFS); for(i=1; i in o; i++)print a[o[i]]}' infile 
Country,Network,Details,Amount,Country,Network,Details,Amount,Country,Network,Details,Amount,Country,Network,Details,Amount,Country,Network,Details,Amount 
Angola,voda,xxx,10,Angola,at&t,xxx,20,Angola,mtn,xxx,30 
Brazil,voda,yyy,40,Brazil,voda,yyy,50,Brazil,at&t,yyy,60,Brazil,mtn,yyy,70,Brazil,voda,yyy,80 
Zambia,tcl,zzz,90 

More readable:

awk 'BEGIN{ 
      FS=OFS="," 
    } 
    FNR==1{ 
      n=$0; 
      next 
    } 
    { 
      a[$1]=($1 in a ? a[$1] OFS:"")$0; 
      if(!($1 in b)){ o[++i]=$1 }; 
      b[$1]++; 
      mx=mx>b[$1]?mx:b[$1] 
    } 
    END{ 
      for(i=1; i<=mx; i++) 
       printf("%s%s",n,i==mx?RS:OFS); 

      for(i=1; i in o; i++) 
       print a[o[i]] 
    }' infile 

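In response to the comment:

"Would like to know where to change the code so that "Country" is printed only once in the output, if I don't need the same country name printed a second and third time."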

$ awk 'BEGIN{FS=OFS=","}FNR==1{n=$0;next}{a[$1]=($1 in a ? a[$1] OFS substr($0,index($0,",")+1) : $0); if(!($1 in b)){o[++i]=$1}; b[$1]++; mx=mx>b[$1]?mx:b[$1] }END{for(i=1; i<=mx; i++)printf("%s%s",i==1?n:substr(n,index(n,",")+1),i==mx?RS:OFS); for(i=1; i in o; i++)print a[o[i]]}' infile 
Country,Network,Details,Amount,Network,Details,Amount,Network,Details,Amount,Network,Details,Amount,Network,Details,Amount 
Angola,voda,xxx,10,at&t,xxx,20,mtn,xxx,30 
Brazil,voda,yyy,40,voda,yyy,50,at&t,yyy,60,mtn,yyy,70,voda,yyy,80 
Zambia,tcl,zzz,90 

Modified code:

awk 'BEGIN{ 
      FS=OFS="," 
    } 
    FNR==1{ 
      n=$0; 
      next 
    } 
    { 
      # This line is modified: find the position of the first comma and,
      # for a country already seen, append only the text after that comma.

      a[$1]=($1 in a ? a[$1] OFS substr($0,index($0,",")+1) : $0); 

      if(!($1 in b)){ o[++i]=$1 }; 

      b[$1]++; 
      mx=mx>b[$1]?mx:b[$1] 
    } 
    END{ 
      for(i=1; i<=mx; i++) 
       # this line modified 
       printf("%s%s",i==1?n:substr(n,index(n,",")+1),i==mx?RS:OFS); 

      for(i=1; i in o; i++) 
       print a[o[i]] 
    }' infile 

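Explanation of the modification:

  • index(in, find)

    Searches the string in for the first occurrence of the string find and returns the character position where that occurrence begins. Positions are counted from 1; the result is 0 if find does not occur.

  • substr(string, start [, length ])

    Returns a substring of string that is length characters long, starting at character number start. If length is omitted, the substring runs to the end of string.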
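
As a small illustration of how these two functions combine to drop the first field, the following sketch applies them to one sample row. The shell variables `line`, `pos`, and `rest` are illustrative names only.

```shell
line='Brazil,at&t,yyy,60'

# Position of the first comma (awk counts characters from 1)
pos=$(awk -v s="$line" 'BEGIN { print index(s, ",") }')

# Everything after the first comma, i.e. the row without its country field
rest=$(awk -v s="$line" 'BEGIN { print substr(s, index(s, ",") + 1) }')

echo "$pos"
echo "$rest"
```

Here `pos` is 7 and `rest` is `at&t,yyy,60`, which is the piece the modified script appends for every repeated country.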

+0

Thanks a lot, Akshay Hegde, up-voted!!! – SVR

+0

@RVS: updated for ordering –

+0

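Would like to know where to change the code so that "Country" is printed only the first time, if I don't need the same country name printed a second and third time – SVR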