2012-07-25 56 views
1

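Transpose using AWK or Perl

Hello, I would like to use AWK or Perl to get an output file in the format below. My input file is a space-separated text file. This is similar to my earlier question, but in this case the input and output have no fixed formatting. My column positions may change, so I would appreciate a technique that does not reference column numbers.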

Input file

id quantity colour shape size colour shape size colour shape size 
1 10 blue square 10 red triangle 12 pink circle 20 
2 12 yellow pentagon 3 orange rectangle 4 purple oval 6 

Desired output

id colour shape size 
1 blue square 10 
1 red triangle 12 
1 pink circle 20 
2 yellow pentagon 3 
2 orange rectangle 4 
2 purple oval 6 

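I am using this code by Dennis Williamson. The only problem is that the transposed fields in the output are not separated by spaces. I need a single space as the separator.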

#!/usr/bin/awk -f 
BEGIN { 
col_list = "quantity colour shape" 
# Use a B ("blank") to add spaces in the output before or 
# after a format string (e.g. %6dB), but generally use the numeric argument 

# columns to be repeated on multiple lines may appear anywhere in 
# the input, but they will be output together at the beginning of the line 
repeat_fields["id"] 
# since these are individually set we won't use B 
repeat_fmt["id"] = "%-1s " 
# additional fields to repeat on each line 

ncols = split(col_list, cols) 

for (i = 1; i <= ncols; i++) { 
    col_names[cols[i]] 
    forms[cols[i]] = "%-1s" 
} 
} 


# save the positions of the columns using the header line 
FNR == 1 { 
for (i = 1; i <= NF; i++) { 
    if ($i in repeat_fields) { 
     repeat[++nrepeats] = i 
     repeat_look[i] = i 
     rformats[i] = repeat_fmt[$i] 
    } 
    if ($i in col_names) { 
     col_nums[++n] = i 
     col_look[i] = i 
     formats[i] = forms[$i] 
    } 
} 
# print the header line 
for (i = 1; i <= nrepeats; i++) { 
    f = rformats[repeat[i]] 
    sub("d", "s", f) 
    gsub("B", " ", f) 
    printf f, $repeat[i] 
} 
for (i = 1; i <= ncols; i++) { 
    f = formats[col_nums[i]] 
    sub("d", "s", f) 
    gsub("B", " ", f) 
    printf f, $col_nums[i] 
} 
printf "\n" 
next 
} 

{ 
for (i = 1; i <= NF; i++) { 
    if (i in repeat_look) { 
     f = rformats[i] 
     gsub("B", " ", f) 
     repeat_out = repeat_out sprintf(f, $i) 

    } 
    if (i in col_look) { 
     f = formats[i] 
     gsub("B", " ", f) 
     out = out sprintf(f, $i) 
     coln++ 
    } 
    if (coln == ncols) { 
     print repeat_out out 
     out = "" 
     coln = 0 
    } 
} 
repeat_out = "" 
} 

Output

id quantitycolourshape 
1 10bluesquare 
1 redtrianglepink 
2 circle12yellow 
2 pentagonorangerectangle 
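
Two things in the script as posted appear to account for this output. First, `col_list` names `quantity colour shape` rather than `colour shape size`, so the groups of three drift out of step with the data. Second, the per-column format `"%-1s"` contains no trailing space, so consecutive `sprintf` results are simply concatenated. The sketch below isolates the second point only; `join_demo` is a hypothetical helper name, not part of the original script.

```sh
# Hypothetical demo: joins the first two fields of each input line twice,
# once with the format used in the script and once with a trailing space.
join_demo() {
    awk '{
        tight  = sprintf("%-1s", $1)  sprintf("%-1s", $2)   # no separator in the format
        spaced = sprintf("%-1s ", $1) sprintf("%-1s", $2)   # space carried by the format
        print tight
        print spaced
    }'
}

# Usage: echo "blue square" | join_demo
# prints "bluesquare" and then "blue square"
```

If that reading is right, changing `forms[cols[i]] = "%-1s"` to a format with a trailing space and setting `col_list` to the three repeated column names should be the smallest change to the script above.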

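I apologize for not including all the information about the actual file earlier. I left it out only for simplicity, but the simplified example did not cover all of my requirements.

In my actual file I want to transpose the n_cell and n_bsc fields for each NODE, SITE and CHILD. The real data contains more than 5000 columns.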

NODE SITE CHILD n_cell n_bsc 

Here is a link to the actual file I am working on

+4

The name of the language is "Perl", not "PERL". – ikegami 2012-07-25 22:32:15

+1

But it is "AWK". My answer to this question would be the same as [my answer to your previous question](http://stackoverflow.com/a/11454983/26428). – 2012-07-26 00:12:39

+1

Possible duplicate of [Transpose using AWK](http://stackoverflow.com/questions/11447885/transpose-using-awk) – dgw 2012-07-26 07:40:48

Answers

3
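<>;                                   # read and discard the header line
print("id colour shape size\n");

while (<>) {
    my @combined_fields = split;
    my $id = shift(@combined_fields);
    shift(@combined_fields);          # drop the quantity column in the sample input
    # emit one line per group of three fields: colour, shape, size
    while (@combined_fields) {
        my @fields = ($id, splice(@combined_fields, 0, 3));
        print(join(' ', @fields), "\n");
    }
}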
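
For comparison, here is the same fixed-layout idea sketched in awk, since the question asks for either tool. It assumes the layout of the sample input: column 1 is the id, column 2 is quantity (skipped), and everything after that comes in groups of three. `transpose_fixed` is a hypothetical wrapper name.

```sh
# Hypothetical helper: fixed positions only, no header lookup.
transpose_fixed() {
    awk '
        NR == 1 { print "id colour shape size"; next }   # replace the header line
        {
            # columns 3.. hold repeated (colour, shape, size) triples
            for (i = 3; i + 2 <= NF; i += 3)
                print $1, $i, $(i + 1), $(i + 2)
        }
    ' "$@"
}

# Usage: transpose_fixed infile > outfile
```

Like the Perl version, this breaks as soon as a column moves, so it only suits files whose layout is known in advance.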
+0

How do I run this? – 2012-07-25 22:42:15

+0

'perl script.pl infile > outfile', or in place: 'perl -i script.pl file' – ikegami 2012-07-25 23:28:49

+0

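My actual input file has more than 5k columns, so I want to use the header line to refer to the fixed columns and the columns to transpose, rather than column numbers. – 2012-07-26 07:50:07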

0

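You tell us that your column positions may change, and I'm afraid that is not enough to go on.

So, in the absence of any proper information, I have written this. It uses the header line to work out the number and size of the data sets, where the id column is, and in which column the first set starts.

It works correctly on your sample data, but I can only guess whether it will work on your live file.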

use strict; 
use warnings; 

my @headers = split ' ', <>; 

my %headers; 
$headers{$_}++ for @headers; 

# parentheses are needed: == binds more tightly than //
die "Expected exactly one 'id' column" unless ($headers{id} // 0) == 1;
my $id_index = 0; 
$id_index++ while $headers[$id_index] ne 'id'; 

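# take repeated labels in header order; the order of keys %headers is not stable
my %seen;
my @labels = grep { $headers{$_} > 1 && !$seen{$_}++ } @headers;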
my $set_size = @labels; 
my $num_sets = $headers{$labels[0]}; 

my $start_index = 0; 
$start_index++ while $headers[$start_index] ne $labels[0]; 

my @reformat; 

while (<>) { 
    my @fields = split; 
    next unless @fields; 
    my $id = $fields[$id_index]; 
    for (my $i = $start_index; $i < @fields; $i += $set_size) {
        push @reformat, [ $id, @fields[$i .. $i + $set_size - 1] ];
    }
} 

unshift @labels, 'id'; 
print "@labels\n"; 
print "@$_\n" for @reformat; 

Output

id colour shape size 
1 blue square 10 
1 red triangle 12 
1 pink circle 20 
2 yellow pentagon 3 
2 orange rectangle 4 
2 purple oval 6
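
The header-driven approach can also be sketched in awk. This version makes the same assumptions as the Perl above: exactly one column is named `id`, and every header name that occurs more than once belongs to the repeated set, with each set appearing in the same order. `transpose_repeated` is a hypothetical wrapper name, and it has only been checked against the sample data in the question.

```sh
# Hypothetical helper: finds the id column and the repeated columns from the header.
transpose_repeated() {
    awk '
        FNR == 1 {
            for (i = 1; i <= NF; i++) count[$i]++          # how often each name occurs
            header = "id"
            for (i = 1; i <= NF; i++) {
                if ($i == "id") idcol = i
                if (count[$i] > 1) {
                    rep[i]                                 # column i is part of a set
                    if (!($i in seen)) {                   # first time this label is seen
                        seen[$i]
                        header = header " " $i
                        size++                             # labels per set
                    }
                }
            }
            print header
            next
        }
        NF {                                               # skip blank lines
            out = ""; k = 0
            for (i = 1; i <= NF; i++) {
                if (!(i in rep)) continue
                out = out " " $i
                if (++k == size) { print $(idcol) out; out = ""; k = 0 }
            }
        }
    ' "$@"
}

# Usage: transpose_repeated infile > outfile
```

Unlike the Perl version, this prints each row as it is read instead of collecting everything in memory first, which may matter for a file with more than 5000 columns.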