Answers
0
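In case anyone is interested: I was not satisfied with any of the suggestions, probably because I was hoping for a one-line solution and, as far as I know, none exists. So I wrote a utility called ljoin (for left join, as in databases) that does exactly what I asked for (of course :D).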
#!/usr/bin/perl
=head1 NAME
ljoin.pl - Utility to left join files by specified key column(s)
=head1 SYNOPSIS
ljoin.pl [OPTIONS] <INFILE1>..<INFILEN> <OUTFILE>
To successfully join rows one must supply at least one input file and exactly one output file. Input files can be real file names or a pattern, like [ABC].txt or *.in etc.
=head1 DESCRIPTION
This utility merges multiple files into one using the specified column(s) as a key
=head2 OPTIONS
=item --field-separator=<separator>, -fs <separator>
Specifies what string should be used to separate columns in plain file. Default value for this option is tab symbol.
=item --no-sort-fields, -no-sf
Do not sort columns when creating a key for merging files
=item --complex-key-separator=<separator>, -ks <separator>
Specifies what string should be used to separate multiple values in multikey column. For example "A B" in one file can be presented as "B A" meaning that this application should somehow understand that this is the same key. Default value for this option is space symbol.
=item --no-sort-complex-keys, -no-sk
Do not sort complex column values when creating a key for merging files
=item --include-primary-field, -i
Specifies whether key which is used to find matching lines in multiple files should be included in the output file. First column in output file will be the key in any case, but in case of complex column the value of first column will be sorted. Default value for this option is false.
=item --primary-field-index=<index>, -f <index>
Specifies index of the column which should be used for matching lines. You can use multiple instances of this option to specify a multi-column key made of more than one column like this "-f 0 -f 1"
=item --help, -?
Get help and documentation
=cut
use strict;
use warnings;
use Getopt::Long;
use Pod::Usage;
my $fieldSeparator = "\t";
my $complexKeySeparator = " ";
my $includePrimaryField = 0;
my $containsTitles = 0;
my $sortFields = 1;
my $sortComplexKeys = 1;
my @primaryFieldIndexes;
GetOptions(
    "field-separator|fs=s"       => \$fieldSeparator,
    "sort-fields|sf!"            => \$sortFields,
    "complex-key-separator|ks=s" => \$complexKeySeparator,
    "sort-complex-keys|sk!"      => \$sortComplexKeys,
    "contains-titles|t!"         => \$containsTitles,
    "include-primary-field|i!"   => \$includePrimaryField,
    "primary-field-index|f=i@"   => \@primaryFieldIndexes,
    "help|?!"                    => sub { pod2usage(0) }
) or pod2usage(2);
pod2usage(0) if $#ARGV < 1;
# Default to the first column when no key column was given
push @primaryFieldIndexes, 0 if $#primaryFieldIndexes < 0;
# Look-up table of key columns, keyed by column index (not by position in the list)
my %primaryFieldIndexesHash;
foreach my $index (@primaryFieldIndexes)
{
    $primaryFieldIndexesHash{$index} = 1;
}
print "fieldSeparator = $fieldSeparator\n";
print "complexKeySeparator = $complexKeySeparator \n";
print "includePrimaryField = $includePrimaryField\n";
print "containsTitles = $containsTitles\n";
print "primaryFieldIndexes = @primaryFieldIndexes\n";
print "sortFields = $sortFields\n";
print "sortComplexKeys = $sortComplexKeys\n";
my $fieldsCount = 0;
my %keys_hash =();
my %files =();
my %titles =();
# Read all input files into memory
foreach my $argnum (0 .. ($#ARGV - 1))
{
    # Find files matching the specified pattern
    my $filePattern = $ARGV[$argnum];
    my @matchedFiles = < $filePattern >;
    foreach my $inputPath (@matchedFiles)
    {
        open INPUT_FILE, $inputPath or die $!;
        my %lines;
        my $lineNumber = -1;
        while (my $line = <INPUT_FILE>)
        {
            # Skip the first line if the file starts with a title row
            $lineNumber++;
            next if $containsTitles && $lineNumber == 0;
            # Don't use chomp. It doesn't handle Unix input files on Windows and vice versa
            $line =~ s/[\r\n]+$//g;
            # Skip lines that don't have columns
            next if $line !~ m/($fieldSeparator)/;
            # Split fields and count them (store maximum number of columns in files for later use)
            my @fields = split($fieldSeparator, $line);
            $fieldsCount = $#fields+1 if $#fields+1 > $fieldsCount;
            # Build the key: sort the parts of each complex key column if requested
            my @multipleKey;
            for(my $i = 0; $i <= $#primaryFieldIndexes; $i++)
            {
                my @complexKey = split ($complexKeySeparator, $fields[$primaryFieldIndexes[$i]]);
                @complexKey = sort(@complexKey) if $sortComplexKeys;
                push @multipleKey, join($complexKeySeparator, @complexKey);
            }
            # Sort multiple keys and create key string
            @multipleKey = sort(@multipleKey) if $sortFields;
            my $fullKey = join $fieldSeparator, @multipleKey;
            $lines{$fullKey} = \@fields;
            $keys_hash{$fullKey} = 1;
        }
        close INPUT_FILE;
        $files{$inputPath} = \%lines;
    }
}
# Open output file
my $outputPath = $ARGV[$#ARGV];
open OUTPUT_FILE, ">" . $outputPath or die $!;
my @keys = sort keys(%keys_hash);
# Leave blank places for key columns
for(my $pf = 0; $pf <= $#primaryFieldIndexes; $pf++)
{
    print OUTPUT_FILE $fieldSeparator;
}
# Print column headers
foreach my $argnum (0 .. ($#ARGV - 1))
{
    my $filePattern = $ARGV[$argnum];
    my @matchedFiles = < $filePattern >;
    foreach my $inputPath (@matchedFiles)
    {
        print OUTPUT_FILE $inputPath;
        for(my $f = 0; $f < $fieldsCount - $#primaryFieldIndexes - 1; $f++)
        {
            print OUTPUT_FILE $fieldSeparator;
        }
    }
}
# Print merged columns
print OUTPUT_FILE "\n";
foreach my $key (@keys)
{
    print OUTPUT_FILE $key;
    foreach my $argnum (0 .. ($#ARGV - 1))
    {
        my $filePattern = $ARGV[$argnum];
        my @matchedFiles = < $filePattern >;
        foreach my $inputPath (@matchedFiles)
        {
            my $lines = $files{$inputPath};
            for(my $i = 0; $i < $fieldsCount; $i++)
            {
                next if exists $primaryFieldIndexesHash{$i} && !$includePrimaryField;
                print OUTPUT_FILE $fieldSeparator;
                print OUTPUT_FILE $lines->{$key}->[$i] if exists $lines->{$key}->[$i];
            }
        }
    }
    print OUTPUT_FILE "\n";
}
close OUTPUT_FILE;
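The key normalization that the script relies on (treating "A B" and "B A" as the same key by sorting the parts) can be illustrated in isolation. The following is only a minimal shell sketch of that one idea, not a port of ljoin.pl; the function name normalize_key is hypothetical and it assumes the default space separator.

```shell
#!/bin/bash
# normalize_key is a hypothetical helper, not part of ljoin.pl.
# It sorts the space-separated parts of a key so that "B A" and "A B"
# end up as the same string and can be matched across files.
normalize_key() {
    # Put each part on its own line, sort, then join with single spaces.
    printf '%s\n' "$1" | tr ' ' '\n' | sort | paste -sd ' ' -
}

normalize_key "B A"    # prints: A B
normalize_key "A B"    # prints: A B
```

Once every file's key column has been normalized this way, rows from different files can be matched by plain string equality, which is what the %keys_hash lookup in the script does.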
0
This won't win any beauty contests for sorting the file, but it seems to come close:
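#!/bin/bash
# Sort the comma-separated parts of the first field, then sort the lines.
while read -r one two; do
    # Split the key on commas, sort the parts, and join them back with commas.
    # The second sed collects all lines in the hold space and also works
    # when the key has only one part.
    one=$(echo "$one" | sed -e 's/,/\n/g' | sort | sed -e '1h; 1!H; $!d; g; s/\n/,/g')
    echo "$one $two"
done | sort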
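The same idea can also be sketched with tr and paste in place of the two sed calls, which avoids relying on GNU sed's handling of \n in the replacement. This is an alternative sketch, not the answerer's code: the function name sort_pairs is hypothetical, and the sample input is the one used elsewhere in this thread.

```shell
#!/bin/bash
# sort_pairs is a hypothetical name; it reads "KEY VALUE" lines on stdin,
# where KEY is a comma-separated list such as "C,B".
sort_pairs() {
    while read -r key rest; do
        # Sort the comma-separated parts of the key, e.g. "C,B" -> "B,C".
        key=$(printf '%s\n' "$key" | tr ',' '\n' | sort | paste -sd, -)
        printf '%s %s\n' "$key" "$rest"
    done | sort
}

sort_pairs <<'EOF'
A,C 1
C,B 2
B,A 3
EOF
```

For this input the sketch prints the lines "A,B 3", "A,C 1" and "B,C 2", in that order.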
0
Change the internal field separator (IFS), then compare the first two letters with ">":
(
    IFS=" ,";
    while read a b n; do
        if [ "$a" \> "$b" ]; then
            echo "$b,$a $n";
        else
            echo "$a,$b $n";
        fi;
    done;
) <<EOF | sort
A,C 1
C,B 2
B,A 3
EOF
Try http://unix.stackexchange.com/. – 2011-02-10 16:11:45
Could a moderator migrate it? – 2011-02-10 16:26:24