Unless you enjoy pain, use Text::CSV and its relatives Text::CSV_XS and Text::CSV_PP.
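To see why a hand-rolled `split /,/` is painful, here is a minimal sketch using a made-up row whose job field contains a quoted comma. The sample line is hypothetical. Text::CSV's `parse` and `fields` methods apply the CSV quoting rules; `split` does not.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical input line: the job field contains a comma inside quotes.
my $line = 'West,"Electrician, Senior",12PM-5PM,Regular,4.25';

# Naive approach: split cuts the quoted field in two.
my @naive = split /,/, $line;

# Text::CSV understands quoting.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use CSV: " . Text::CSV->error_diag();
$csv->parse($line) or die "Parse failed: " . $csv->error_input();
my @fields = $csv->fields();

printf "split: %d fields, Text::CSV: %d fields\n", scalar @naive, scalar @fields;
print "job = $fields[1]\n";
```

With this input, `split` yields six pieces. Text::CSV yields five, with the job intact as `Electrician, Senior`.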
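That, however, is probably the easier part of this problem. Once you have read a line and validated that it is complete, you need to add the relevant information to the correctly keyed hashes. You will probably also need to be quite familiar with references.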
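What "validated that the line is complete" means depends on your data. One possible check is sketched below. It assumes the row arrives as a hash reference keyed by the column names used in the program further down. It also assumes that "complete" means every column is non-empty and every day column is numeric. `row_is_complete` is a hypothetical helper; `looks_like_number` comes from the core module Scalar::Util.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Column names as used in the program below.
my @headings = qw(branch job period p_type day1 day2 day3 day4 day5 day6 day7);

# Hypothetical helper: true if every column is present and non-empty,
# and every day column holds something that looks like a number.
sub row_is_complete {
    my ($row) = @_;
    for my $key (@headings) {
        return 0 if !defined $row->{$key} || $row->{$key} eq '';
    }
    for my $day (1 .. 7) {
        return 0 unless looks_like_number($row->{"day$day"});
    }
    return 1;
}

my %good = (branch => 'East', job => 'Banker', period => '9AM-12PM',
            p_type => 'Overtime', map { ("day$_" => 0) } 1 .. 7);
my %bad  = (%good, day3 => 'n/a');    # same row with a non-numeric day

print row_is_complete(\%good) ? "complete\n" : "incomplete\n";   # complete
print row_is_complete(\%bad)  ? "complete\n" : "incomplete\n";   # incomplete
```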
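You might create a hash %BranchData keyed by branch. Each element of that hash is a reference to a hash keyed by job. Each element of that is a reference to a hash keyed by timePeriod. Each element of that is a reference to an array indexed by day number (using indexes 1..7: it over-allocates space slightly, but the chances of getting it right are much greater; don't mess with $[ though!). Finally, each element of that array is a reference to a hash keyed by the three period types. Ouch!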
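To make the nesting concrete, the sketch below writes out one branch/job/period with explicit anonymous constructors. It then checks what kind of reference sits at each level. The values are illustrative only, and only day 1 is filled in.

```perl
use strict;
use warnings;

# One branch/job/period written out with explicit anonymous constructors.
# Only day 1 is filled in; a real week would have entries at indexes 1..7.
my %BranchData = (
    East => {                                   # branch: hash ref keyed by job
        Banker => {                             # job: hash ref keyed by time period
            '9AM-12PM' => [                     # period: array ref indexed by day
                undef,                          # index 0 deliberately unused
                { Overtime => 4.25, Regular => 0, Variance => 0 },  # day 1
            ],
        },
    },
);

my $jobs    = $BranchData{East};
my $periods = $jobs->{Banker};
my $days    = $periods->{'9AM-12PM'};
my $types   = $days->[1];

print join(' ', ref $jobs, ref $periods, ref $days, ref $types), "\n";  # HASH HASH ARRAY HASH

# The full chain of arrows, as in a real lookup:
print $BranchData{East}->{Banker}->{'9AM-12PM'}->[1]->{Overtime}, "\n";  # 4.25
```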
If everything is working, a prototypical assignment might look like this:
$BranchData{$row{branch}}->{$row{job}}->{$row{period}}->[1]->{$row{p_type}} +=
    $row{day1};
You would iterate over array elements 1..7 and columns 'day1' .. 'day7' in step; there is some cleanup to do in the design there.
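The lockstep iteration can be sketched like this. The sketch assumes a single row held in a plain hash, and the row's values are hypothetical. It builds the column name with `"day$day"` rather than the `@days` lookup table used in the program below. Either approach pairs array index `$day` with column `day$day`.

```perl
use strict;
use warnings;

my %BranchData;

# A hypothetical row, returned as a plain hash (not a hash reference).
my %row = (branch => 'South', job => 'Manager', period => '12A-9AM',
           p_type => 'Overtime',
           day1 => 77.75, day2 => 14.75, day3 => 10, day4 => 10,
           day5 => 10, day6 => 10, day7 => 10);

# Array index $day pairs with the column named "day$day".
for my $day (1 .. 7) {
    $BranchData{$row{branch}}->{$row{job}}->{$row{period}}->[$day]->{$row{p_type}}
        += $row{"day$day"};
}

my $week = $BranchData{South}{Manager}{'12A-9AM'};
print join(',', map { $week->[$_]{Overtime} } 1 .. 7), "\n";
# 77.75,14.75,10,10,10,10,10
```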
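You have to worry about initializing things correctly (or perhaps you don't, since Perl will do it for you). I'm assuming the row is returned as a plain hash (rather than a hash reference), with keys for branch, job, period, period type (p_type), and each day ('day1' .. 'day7').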
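"Perl will do it for you" refers to autovivification. A single `+=` on a deep path creates every missing intermediate hash and array. The sketch below shows that behavior, and also a well-known pitfall: merely testing a deep key with `exists` creates the intermediate levels too. The data is made up for illustration.

```perl
use strict;
use warnings;

my %BranchData;

# No explicit initialization: this += autovivifies the branch hash,
# job hash, period hash, day array and period-type hash in one go.
# (+= on an undefined value does not trigger an "uninitialized" warning.)
$BranchData{North}{Janitor}{'5PM-12AM'}[1]{Variance} += -4.25;

print "Variance day 1: $BranchData{North}{Janitor}{'5PM-12AM'}[1]{Variance}\n";

# Pitfall: looking deep inside the structure also autovivifies the
# intermediate levels, even though the innermost key does not exist.
print "West exists before: ", (exists $BranchData{West} ? 'yes' : 'no'), "\n";
my $found = exists $BranchData{West}{Electrician};
print "West exists after:  ", (exists $BranchData{West} ? 'yes' : 'no'), "\n";
```

If the explicit `= { } if !defined ...` checks in the program below feel verbose, this is why they can usually be dropped. The reporting code then has to tolerate the extra empty levels that lookups may create.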
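If you know in advance which day you need, you can avoid accumulating all the days. However, more general reporting may be simpler if you always read and accumulate all of the data, and then have the printing code handle whatever subset of the full data set it needs.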
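As a sketch of "accumulate everything, print a subset", the hypothetical helper `day_report` below walks the full structure and pulls out a single day. It assumes the same branch/job/period/day/type shape, with made-up data. Keys are sorted so the output order is repeatable.

```perl
use strict;
use warnings;

# Hypothetical accumulated data with the same shape as %BranchData.
my %BranchData;
$BranchData{East}{Banker}{'9AM-12PM'}[$_]{Overtime} += 1.5 for 1 .. 7;
$BranchData{West}{Electrician}{'12PM-5PM'}[1]{Regular} += 4.25;

# Hypothetical helper: extract one day's figures from the full data set.
sub day_report {
    my ($data, $day) = @_;
    my @lines;
    for my $branch (sort keys %$data) {
        for my $job (sort keys %{ $data->{$branch} }) {
            for my $period (sort keys %{ $data->{$branch}{$job} }) {
                # Skip periods with nothing recorded for this day.
                my $types = $data->{$branch}{$job}{$period}[$day] or next;
                for my $type (sort keys %$types) {
                    push @lines, "$branch $job $period $type: $types->{$type}";
                }
            }
        }
    }
    return @lines;
}

print "$_\n" for day_report(\%BranchData, 1);   # two lines: East and West
print "$_\n" for day_report(\%BranchData, 2);   # one line: East only
```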
This was an interesting enough problem that I've hacked together this code. I doubt it is optimal, but it does work.
#!/usr/bin/env perl
#
# SO 8570488
use strict;
use warnings;
use Text::CSV;
use Data::Dumper;
use constant debug => 0;
my $file = "input.csv";
my $csv = Text::CSV->new({ binary => 1, eol => $/ })
    or die "Cannot use CSV: ".Text::CSV->error_diag();
my @headings = qw(branch job period p_type day1 day2 day3 day4 day5 day6 day7);
# 'day0' is a placeholder so that $days[$day] is the column name for day $day
my @days = qw(day0 day1 day2 day3 day4 day5 day6 day7);
my %BranchData;
open my $in, '<', $file or die "Unable to open $file for reading ($!)";
$csv->column_names(@headings);
while (my $row = $csv->getline_hr($in))
{
    print Dumper($row) if debug;
    my %r = %$row;  # Not for efficiency; for notational compactness
    $BranchData{$r{branch}} = { } if !defined $BranchData{$r{branch}};
    my $branch = $BranchData{$r{branch}};
    $branch->{$r{job}} = { } if !defined $branch->{$r{job}};
    my $job = $branch->{$r{job}};
    $job->{$r{period}} = [ ] if !defined $job->{$r{period}};
    my $period = $job->{$r{period}};
    for my $day (1..7)
    {
        # Assume that Overtime, Regular and Variance are the only types
        # Otherwise, you need yet another level of checking whether elements exist...
        $period->[$day] = { Overtime => 0, Regular => 0, Variance => 0 } if !defined $period->[$day];
        $period->[$day]->{$r{p_type}} += $r{$days[$day]};
    }
}
# getline_hr returns undef both at end of file and on a parse error;
# report the error (to STDERR) if we stopped before reaching EOF
$csv->eof or $csv->error_diag();
close $in;
print Dumper(\%BranchData);
Given your sample data, the output from this is:
$VAR1 = {
  'West' => {
    'Electrician' => {
      '12PM-5PM' => [
        undef,
        {
          'Regular' => '4.25',
          'Overtime' => 0,
          'Variance' => 0
        },
        {
          'Regular' => 0,
          'Overtime' => 0,
          'Variance' => 0
        },
        {
          'Regular' => 0,
          'Overtime' => 0,
          'Variance' => 0
        },
        {
          'Regular' => '-1.25',
          'Overtime' => 0,
          'Variance' => 0
        },
        {
          'Regular' => '-1.5',
          'Overtime' => 0,
          'Variance' => 0
        },
        {
          'Regular' => '-1.5',
          'Overtime' => 0,
          'Variance' => 0
        },
        {
          'Regular' => 0,
          'Overtime' => 0,
          'Variance' => 0
        }
      ]
    }
  },
  'South' => {
    'Manager' => {
      '12A-9AM' => [
        undef,
        {
          'Regular' => 0,
          'Overtime' => '77.75',
          'Variance' => 0
        },
        {
          'Regular' => 0,
          'Overtime' => '14.75',
          'Variance' => 0
        },
        {
          'Regular' => 0,
          'Overtime' => 10,
          'Variance' => 0
        },
        {
          'Regular' => 0,
          'Overtime' => 10,
          'Variance' => 0
        },
        {
          'Regular' => 0,
          'Overtime' => 10,
          'Variance' => 0
        },
        {
          'Regular' => 0,
          'Overtime' => 10,
          'Variance' => 0
        },
        {
          'Regular' => 0,
          'Overtime' => 10,
          'Variance' => 0
        }
      ]
    }
  },
  'North' => {
    'Janitor' => {
      '5PM-12AM' => [
        undef,
        {
          'Regular' => 0,
          'Overtime' => 0,
          'Variance' => '-4.25'
        },
        {
          'Regular' => 0,
          'Overtime' => 0,
          'Variance' => 0
        },
        {
          'Regular' => 0,
          'Overtime' => 0,
          'Variance' => 0
        },
        {
          'Regular' => 0,
          'Overtime' => 0,
          'Variance' => '-1.25'
        },
        {
          'Regular' => 0,
          'Overtime' => 0,
          'Variance' => '-1.5'
        },
        {
          'Regular' => 0,
          'Overtime' => 0,
          'Variance' => '-1.5'
        },
        {
          'Regular' => 0,
          'Overtime' => 0,
          'Variance' => 0
        }
      ]
    }
  },
  'East' => {
    'Banker' => {
      '9AM-12PM' => [
        undef,
        {
          'Regular' => 0,
          'Overtime' => '4.25',
          'Variance' => 0
        },
        {
          'Regular' => 0,
          'Overtime' => 0,
          'Variance' => 0
        },
        {
          'Regular' => 0,
          'Overtime' => 0,
          'Variance' => 0
        },
        {
          'Regular' => 0,
          'Overtime' => '1.25',
          'Variance' => 0
        },
        {
          'Regular' => 0,
          'Overtime' => '1.5',
          'Variance' => 0
        },
        {
          'Regular' => 0,
          'Overtime' => '1.5',
          'Variance' => 0
        },
        {
          'Regular' => 0,
          'Overtime' => 0,
          'Variance' => 0
        }
      ]
    }
  }
};
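The branches in a dump like this come out in Perl's internal hash order. That order is not guaranteed to be stable; since Perl 5.18 it is randomized per process, so two runs can list the branches differently. If you want repeatable dumps for comparing runs, Data::Dumper's `$Data::Dumper::Sortkeys` setting sorts the keys. A small sketch with made-up data:

```perl
use strict;
use warnings;
use Data::Dumper;

# Sort hash keys so that repeated runs produce identical dumps.
local $Data::Dumper::Sortkeys = 1;
# Indent = 1 uses a fixed amount of indentation per nesting level.
local $Data::Dumper::Indent   = 1;

my %BranchData;
$BranchData{West}{Electrician}{'12PM-5PM'}[1]{Regular} += 4.25;
$BranchData{East}{Banker}{'9AM-12PM'}[1]{Overtime}     += 4.25;

my $dump = Dumper(\%BranchData);
print $dump;    # 'East' is always listed before 'West'
```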
Have fun taking it from here!
Another module worth considering is Text::CSV::Encoded, which I use for handling UTF-8. – reinierpost 2011-12-20 09:34:12
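Text::CSV::Encoded's own interface is not shown here. A different, commonly used way to read UTF-8 with plain Text::CSV is sketched below. It decodes at the filehandle with an `:encoding(UTF-8)` layer and creates the parser with `binary => 1`. The input is a made-up row held in an in-memory filehandle.

```perl
use strict;
use warnings;
use Text::CSV;

# Print decoded characters back out as UTF-8.
binmode STDOUT, ':encoding(UTF-8)';

# Made-up UTF-8 encoded input ("Zürich" as raw bytes), held in memory.
my $bytes = "Z\xc3\xbcrich,Banker,9AM-12PM\n";

# The :encoding(UTF-8) layer decodes bytes into Perl characters;
# binary => 1 lets Text::CSV accept the non-ASCII characters.
open my $in, '<:encoding(UTF-8)', \$bytes or die "Cannot open: $!";
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use CSV: " . Text::CSV->error_diag();

my $row = $csv->getline($in) or die "Bad row: " . $csv->error_diag();
close $in;

# Decoded, the branch name is 6 characters long (it was 7 bytes).
printf "%s has %d characters\n", $row->[0], length $row->[0];
```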
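I believe this code will do what I need! I just need to write the output to another CSV file in the following format:
South,Manager,12A-9AM,77.75,14.75,16
In the line above, the last three values are the day1 values for the three periodTypes (Overtime, Regular and Variance). – user1107055 2011-12-20 17:06:08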
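Here is a minimal sketch of the requested output, with several assumptions. The columns are taken to be branch, job and period, followed by the day-1 totals for Overtime, Regular and Variance in that order. Missing types are written as 0. The data is a small hypothetical stand-in for %BranchData. It writes through Text::CSV's `print` method so any field that needs quoting gets it. An in-memory string stands in for the output file; in practice you would open a real file for writing.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical accumulated data, same shape as %BranchData.
my %BranchData;
$BranchData{South}{Manager}{'12A-9AM'}[1]{Overtime} += 77.75;
$BranchData{East}{Banker}{'9AM-12PM'}[1]{Overtime}  += 4.25;

my @types = qw(Overtime Regular Variance);   # column order from the comment
my $day   = 1;                                # the day being reported

my $csv = Text::CSV->new({ binary => 1, eol => "\n" })
    or die "Cannot use CSV: " . Text::CSV->error_diag();

# Writing to an in-memory string here; use a real file in practice.
open my $out, '>', \my $buffer or die "Cannot open output: $!";
for my $branch (sort keys %BranchData) {
    for my $job (sort keys %{ $BranchData{$branch} }) {
        for my $period (sort keys %{ $BranchData{$branch}{$job} }) {
            my $cell = $BranchData{$branch}{$job}{$period}[$day] || {};
            # Missing period types are written as 0.
            my @values = map { $cell->{$_} // 0 } @types;
            $csv->print($out, [ $branch, $job, $period, @values ])
                or die "Write failed: " . $csv->error_diag();
        }
    }
}
close $out;
print $buffer;
```

Changing `$day` (or looping over 1..7) produces the other days without re-reading the input. This is the payoff of accumulating everything first.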