2017-10-12

Convert single-line data into multiple lines

ITEM1 12-Oct-2017 DAVID BRYCE 12-Oct-2017 Sold 400,000 0.410 1.37 0.97 2.34 ITEM2 12-Oct-2017 MICHAEL LEE BRIDGES 09-Oct-2017 Shipped 4,350,000 0.045 11.31 4.88 16.19 ITEM2 12-Oct-2017 DAVID BRYCE 09-Oct-2017 Shipped 2,900,000 0.045 11.31 4.88 16.19 ITEM1 12-Oct-2017 MICHAEL LEE BRIDGES 09-Oct-2017 Sold 2,200,000 0.045 11.31 4.88 16.19 

How can I do this in bash, so that I can format this long input as CSV and process it further in a spreadsheet?

Sample desired output:

ITEM1|12-Oct-2017|DAVID BRYCE|12-Oct-2017|Sold|400,000|0.410|1.37|0.97|2.34 
ITEM2|12-Oct-2017|MICHAEL LEE BRIDGES|09-Oct-2017|Shipped|4,350,000|0.045|11.31|4.88|16.19 
ITEM2|12-Oct-2017|DAVID BRYCE|09-Oct-2017|Shipped|2,900,000|0.045|11.31|4.88|16.19 
ITEM1|12-Oct-2017|MICHAEL LEE BRIDGES|09-Oct-2017|Sold|2,200,000|0.045|11.31|4.88|16.19 

What have you tried? – anubhava


Is the name field always between the two dates? – dawg


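I tried writing a loop that looks for changes in the format, but it isn't elegant. My knowledge of sed and awk is very limited, so I'm trying to see whether there is a more elegant approach. – dctw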

Answers


This should do the job.

# Assumes each name is exactly two words ($3 and $4).
sed 's/ITEM/\nITEM/g' input.txt | sed '/^$/d' |
    awk '{ print $1"|"$2"|"$3" "$4"|"$5"|"$6"|"$7"|"$8"|"$9"|"$10"|"$11 }'

Regards!


The name sometimes has 3 parts and sometimes 2... –


Yes, you're right, I didn't notice that at first. –


Extended-regex GNU sed approach (sed -E, for the current input):

sed -E 's/ +(ITEM[0-9]+)/\n\1/g; s/ ([0-9])/|\1/g; s/([0-9]) /\1|/g;' file 

Output:

ITEM1|12-Oct-2017|DAVID BRYCE|12-Oct-2017|Sold|400,000|0.410|1.37|0.97|2.34 
ITEM2|12-Oct-2017|MICHAEL LEE BRIDGES|09-Oct-2017|Shipped|4,350,000|0.045|11.31|4.88|16.19 
ITEM2|12-Oct-2017|DAVID BRYCE|09-Oct-2017|Shipped|2,900,000|0.045|11.31|4.88|16.19 
ITEM1|12-Oct-2017|MICHAEL LEE BRIDGES|09-Oct-2017|Sold|2,200,000|0.045|11.31|4.88|16.19 
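To see how the last two expressions place the delimiters, the sketch below applies them one at a time to a single record. The record is hard-coded for illustration and the `\n` split step is skipped; like the answer, it assumes names and statuses never contain digits.

```shell
# One record from the question, already on its own line (hard-coded for illustration).
rec='ITEM2 12-Oct-2017 MICHAEL LEE BRIDGES 09-Oct-2017 Shipped 4,350,000 0.045 11.31 4.88 16.19'

# Step 1: every space that is followed by a digit becomes "|".
step1=$(printf '%s\n' "$rec" | sed -E 's/ ([0-9])/|\1/g')
printf '%s\n' "$step1"

# Step 2: every space that follows a digit becomes "|"; this catches the
# boundary after each date, where the next field (name or status) is a word.
step2=$(printf '%s\n' "$step1" | sed -E 's/([0-9]) /\1|/g')
printf '%s\n' "$step2"
```

After step 1 the only spaces left are next to words; step 2 turns the ones that follow a date into pipes and leaves the spaces inside the name alone.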

----------

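Solution for the additional condition "What if the first field is an arbitrary word, e.g. FILE, STAPLER, PEN, NOTEBOOK?" (the split below assumes that word is upper case, as in the sample):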

Sample contents of file:

FILE 12-Oct-2017 DAVID BRYCE 12-Oct-2017 Sold 400,000 0.410 1.37 0.97 2.34 STAPLER 12-Oct-2017 MICHAEL LEE BRIDGES 09-Oct-2017 Shipped 4,350,000 0.045 11.31 4.88 16.19 PEN 12-Oct-2017 DAVID BRYCE 09-Oct-2017 Shipped 2,900,000 0.045 11.31 4.88 16.19 NOTEBOOK 12-Oct-2017 MICHAEL LEE BRIDGES 09-Oct-2017 Sold 2,200,000 0.045 11.31 4.88 16.19 

sed -E 's/([0-9]+\.[0-9]+) +([A-Z]+)/\1\n\2/g; s/ ([0-9])/|\1/g; s/([0-9]) /\1|/g;' file 

Output:

FILE|12-Oct-2017|DAVID BRYCE|12-Oct-2017|Sold|400,000|0.410|1.37|0.97|2.34 
STAPLER|12-Oct-2017|MICHAEL LEE BRIDGES|09-Oct-2017|Shipped|4,350,000|0.045|11.31|4.88|16.19 
PEN|12-Oct-2017|DAVID BRYCE|09-Oct-2017|Shipped|2,900,000|0.045|11.31|4.88|16.19 
NOTEBOOK|12-Oct-2017|MICHAEL LEE BRIDGES|09-Oct-2017|Sold|2,200,000|0.045|11.31|4.88|16.19 

Nice. What if the first field is an arbitrary word? E.g. FILE, STAPLER, PEN, NOTEBOOK? – dctw


@dctw, you've got my bonus solution – RomanPerekhrest


Ah... thank you very much. I'll work from your hint. – dctw


An AWK one-liner.

If you have GNU awk, you can use this, since it supports a multi-character RS (record separator).

$ awk -v RS="ITEM" 'FNR>1{a=""; printf "%s", RS $1 "|" $2 "|"; for(i=3; i<=NF-10+2; i++){a=a $i " "}; printf "%s", a $i; while(i++<NF) printf "|%s", $i; printf "\n"}' file

ITEM1|12-Oct-2017|DAVID BRYCE|12-Oct-2017|Sold|400,000|0.410|1.37|0.97|2.34 
ITEM2|12-Oct-2017|MICHAEL LEE BRIDGES|09-Oct-2017|Shipped|4,350,000|0.045|11.31|4.88|16.19 
ITEM2|12-Oct-2017|DAVID BRYCE|09-Oct-2017|Shipped|2,900,000|0.045|11.31|4.88|16.19 
ITEM1|12-Oct-2017|MICHAEL LEE BRIDGES|09-Oct-2017|Sold|2,200,000|0.045|11.31|4.88|16.19 

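Here we use ITEM as the record separator (RS). FNR>1 skips the empty record before the first ITEM, and because the separator is stripped from each record, RS is printed back in front of $1.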
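To check what RS="ITEM" actually produces, this sketch prints each record's number, field count and first field for two records copied from the question. It needs an awk that accepts a multi-character RS (GNU awk or mawk, for instance).

```shell
# Two records from the question, on one line (hard-coded for illustration).
input='ITEM1 12-Oct-2017 DAVID BRYCE 12-Oct-2017 Sold 400,000 0.410 1.37 0.97 2.34 ITEM2 12-Oct-2017 MICHAEL LEE BRIDGES 09-Oct-2017 Shipped 4,350,000 0.045 11.31 4.88 16.19'

# Record 1 is the empty text before the first "ITEM"; the separator itself is
# removed, so later records start with the item number.
records=$(printf '%s\n' "$input" |
  awk -v RS='ITEM' '{ printf "record %d: NF=%d first=[%s]\n", NR, NF, $1 }')
printf '%s\n' "$records"
```

The field counts show where the bound NF-10+2 in the one-liner comes from: nine fields are fixed, so the name runs from $3 to $(NF-7). The loop collects every name word except the last, and the last one is printed on its own.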

Solution 2

$ awk -v RS="ITEM" 'FNR>1{printf "%s", RS $1 "|" $2 "|" $3; for(i=4; i<=NF; i++) {k=(NF>10 && i<=NF-7) ? " " : "|"; printf "%s%s", k, $i}; printf "\n"}' file

ITEM1|12-Oct-2017|DAVID BRYCE|12-Oct-2017|Sold|400,000|0.410|1.37|0.97|2.34 
ITEM2|12-Oct-2017|MICHAEL LEE BRIDGES|09-Oct-2017|Shipped|4,350,000|0.045|11.31|4.88|16.19 
ITEM2|12-Oct-2017|DAVID BRYCE|09-Oct-2017|Shipped|2,900,000|0.045|11.31|4.88|16.19 
ITEM1|12-Oct-2017|MICHAEL LEE BRIDGES|09-Oct-2017|Sold|2,200,000|0.045|11.31|4.88|16.19 
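The question mentions CSV, while the answers emit pipe-delimited text, which avoids the commas inside the quantities (400,000). If a comma-separated file is needed for the spreadsheet, one option is to quote every field. This is a sketch under the same assumptions as the answers (nine fixed fields, a name of any length, ITEM as the separator, an awk with multi-character RS); to_csv and q are hypothetical names.

```shell
# Two records from the question, on one line (hard-coded for illustration).
input='ITEM1 12-Oct-2017 DAVID BRYCE 12-Oct-2017 Sold 400,000 0.410 1.37 0.97 2.34 ITEM2 12-Oct-2017 MICHAEL LEE BRIDGES 09-Oct-2017 Shipped 4,350,000 0.045 11.31 4.88 16.19'

# to_csv (hypothetical helper): read the one-line input on stdin, print quoted CSV.
to_csv() {
  awk -v RS='ITEM' '
    # q: wrap a field in double quotes, doubling embedded quotes (RFC 4180 style)
    function q(s) { gsub(/"/, "\"\"", s); return "\"" s "\"" }
    FNR > 1 {
      # 9 fields are fixed, so the name spans $3 .. $(NF-7)
      name = $3
      for (i = 4; i <= NF - 7; i++) name = name " " $i
      out = q("ITEM" $1) "," q($2) "," q(name)
      # the 7 fields after the name: date, status, quantity and 4 numbers
      for (i = NF - 6; i <= NF; i++) out = out "," q($i)
      print out
    }
  '
}

printf '%s\n' "$input" | to_csv
```

Quoting every field keeps 400,000 as a single field instead of splitting it at the commas.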

sed/awk

$ sed 's/ ITEM/\nITEM/g' file |
    awk -v OFS="|" '{n=NF-10; for(i=4;i<=3+n;i++) $3=$3 FS $i; for(i=4;i<=10;i++) $i=$(i+n); NF=10}1'

ITEM1|12-Oct-2017|DAVID BRYCE|12-Oct-2017|Sold|400,000|0.410|1.37|0.97|2.34
ITEM2|12-Oct-2017|MICHAEL LEE BRIDGES|09-Oct-2017|Shipped|4,350,000|0.045|11.31|4.88|16.19
ITEM2|12-Oct-2017|DAVID BRYCE|09-Oct-2017|Shipped|2,900,000|0.045|11.31|4.88|16.19
ITEM1|12-Oct-2017|MICHAEL LEE BRIDGES|09-Oct-2017|Sold|2,200,000|0.045|11.31|4.88|16.19

The \n in the replacement works only with GNU sed. – dawg
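Where GNU sed is not available (BSD sed on macOS, for example), POSIX sed can still insert a newline: put a backslash followed by a literal newline in the replacement. A minimal sketch; split_records is a hypothetical helper name and the input is copied from the question.

```shell
# Two records from the question, on one line (hard-coded for illustration).
input='ITEM1 12-Oct-2017 DAVID BRYCE 12-Oct-2017 Sold 400,000 0.410 1.37 0.97 2.34 ITEM2 12-Oct-2017 MICHAEL LEE BRIDGES 09-Oct-2017 Shipped 4,350,000 0.045 11.31 4.88 16.19'

# split_records (hypothetical helper): start a new line at each " ITEM".
# The backslash at the end of the first sed line escapes a real newline in the
# replacement, which POSIX sed accepts, so no GNU-only "\n" is needed.
split_records() {
  sed 's/ ITEM/\
ITEM/g'
}

printf '%s\n' "$input" | split_records
```

Its output can take the place of the GNU-only sed step in front of the awk command in the sed/awk answer.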