How can I remove elements from a Perl array after I've processed them?

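I am reading a postfix mail log file into an array and then looping through it to extract messages. On the first pass, I check for a match on the "to=" line and grab the message ID. After building an array of message IDs, I loop back through the array to extract information from the to=, from=, and client= lines. How can I remove elements from a Perl array after I've processed them?

What I'd like to do is remove a line from the array once I have pulled its data out, so that later passes run faster (i.e., one less line to check).

Any suggestions? This is in Perl.

Edit: gbacon's answer below was enough to get my solution rolling. Here is the gist of it: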

my %msg; 
while (<>) { 
    my $line = $_; 
    if (s!^.*postfix/\w+\[.+?\]: (\w+):\s*!!) { 
      my $key = $1; 
      push @{ $msg{$key}{$1} } => $2 
        while /\b(to|from|client|size|nrcpt)=<?(.+?)(?:>|,|\[|$)/g; 
    } 
    if ($line =~ s!^(\w+ \d+ \d+:\d+:\d+)\s(\w+.*)\s+postfix/\w+\[.+?\]: (\w+):\s*removed!!) { 
      my $key = $3; 
      push @{ $msg{$key}{date} } => $1; 
      push @{ $msg{$key}{server} } => $2; 
    } 
} 

use Data::Dumper; 
$Data::Dumper::Indent = 1; 
print Dumper \%msg;

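I'm sure the second regex could be more elegant, but it does what I need. I can now load every message into the hash and then pull out the messages I'm interested in.

Thanks to all who answered.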

2010-02-03 Justin ᚅᚔᚈᚄᚒᚔ

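It seems to me that a hash might be a better way to handle this? That way, you don't have to explicitly check for matches while iterating. You could simply use the "to=" line as the key. – 2010-02-03 19:38:02

Do it in a single pass: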

#! /usr/bin/perl 

use warnings; 
use strict; 

# for demo only 
*ARGV = *DATA; 

my %msg; 
while (<>) { 
    if (s!^.*postfix/\w+\[.+?\]: (\w+):\s*!!) {
        my $key = $1;
        push @{ $msg{$key}{$1} } => $2
            while /\b(to|from|client)=(.+?)(?:,|$)/g;
    }
} 

use Data::Dumper; 
$Data::Dumper::Indent = 1; 
print Dumper \%msg; 
__DATA__ 
Apr 8 14:22:02 MailSecure03 postfix/smtpd[32388]: BA1CE38965: client=mail.example.com[x.x.x.x] 
Apr 8 14:22:03 MailSecure03 postfix/cleanup[32070]: BA1CE38965: message-id=<[email protected]> 
Apr 8 14:22:03 MailSecure03 postfix/qmgr[19685]: BA1CE38965: from=<[email protected]>, size=1087, nrcpt=2 (queue active) 
Apr 8 14:22:04 MailSecure03 postfix/smtp[32608]: BA1CE38965: to=<[email protected]>, relay=127.0.0.1[127.0.0.1]:10025, delay=1.7, delays=1/0/0/0.68, dsn=2.0.0, status=sent (250 OK, sent 49DC509B_360_15637_162D8438973) 
Apr 8 14:22:04 MailSecure03 postfix/smtp[32608]: BA1CE38965: to=<[email protected]>, relay=127.0.0.1[127.0.0.1]:10025, delay=1.7, delays=1/0/0/0.68, dsn=2.0.0, status=sent (250 OK, sent 49DC509B_360_15637_162D8438973) 
Apr 8 14:22:04 MailSecure03 postfix/qmgr[19685]: BA1CE38965: removed 
Apr 8 14:22:04 MailSecure03 postfix/smtpd[32589]: 62D8438973: client=localhost.localdomain[127.0.0.1] 
Apr 8 14:22:04 MailSecure03 postfix/cleanup[32080]: 62D8438973: message-id=<[email protected]> 
Apr 8 14:22:04 MailSecure03 postfix/qmgr[19685]: 62D8438973: from=<[email protected]>, size=1636, nrcpt=2 (queue active) 
Apr 8 14:22:04 MailSecure03 postfix/smtp[32417]: 62D8438973: to=<[email protected]>, relay=y.y.y.y[y.y.y.y]:25, delay=0.19, delays=0.04/0/0.04/0.1, dsn=2.6.0, status=sent (250 2.6.0 <[email protected]om> Queued mail for delivery) 
Apr 8 14:22:04 MailSecure03 postfix/smtp[32417]: 62D8438973: to=<[email protected]>, relay=y.y.y.y[y.y.y.y]:25, delay=0.19, delays=0.04/0/0.04/0.1, dsn=2.6.0, status=sent (250 2.6.0 <[email protected]> Queued mail for delivery) 
Apr 8 14:22:04 MailSecure03 postfix/qmgr[19685]: 62D8438973: removed

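The code works by first looking for a queue ID (e.g., BA1CE38965 and 62D8438973 above), which we store in $key.

Next, we find all matches on the current line (thanks to the /g switch) that look like to=<...>, client=mail.example.com, and so on, with and without the separating comma.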

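Of note in the pattern are

\b - matches only at a word boundary (prevents matching xxxto=<...>)
(to|from|client) - matches to or from or client
(.+?) - matches the field's value with a non-greedy quantifier
(?:,|$) - matches either a comma or the end of the string, without capturing into $3

The non-greedy (.+?) forces the match to stop at the first comma it encounters rather than the last. Otherwise, on a line such as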

to=<[email protected]>, other=123

you would get <[email protected]>, other=123 as the recipient!
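To make the difference concrete, here is a small sketch that runs both forms of the pattern against one line. The address and the other=123 field are placeholders made up for the demonstration, and the pattern is cut down to the to= field only.

```perl
use strict;
use warnings;

# Hypothetical log fragment; the address is a placeholder, not real log data.
my $line = 'to=<user@example.com>, other=123';

# Non-greedy: the captured value stops at the first comma.
my ($lazy) = $line =~ /\bto=(.+?)(?:,|$)/;

# Greedy: the capture runs as far as it can while (?:,|$) can still match,
# which here means all the way to the end of the string.
my ($greedy) = $line =~ /\bto=(.+)(?:,|$)/;

print "non-greedy: $lazy\n";
print "greedy:     $greedy\n";
```

The only change between the two patterns is the ? after .+, so this is an easy thing to get wrong when adapting the regex to other fields.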

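Then, for each field matched, we push it onto the end of an array (because there may be multiple recipients) keyed by the queue ID and the field name. Take a look at the result: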

$VAR1 = { 
    '62D8438973' => { 
    'client' => [ 
     'localhost.localdomain[127.0.0.1]' 
    ], 
    'to' => [ 
     '<[email protected]>', 
     '<[email protected]>' 
    ], 
    'from' => [ 
     '<[email protected]>' 
    ] 
    }, 
    'BA1CE38965' => { 
    'client' => [ 
     'mail.example.com[x.x.x.x]' 
    ], 
    'to' => [ 
     '<[email protected]>', 
     '<[email protected]>' 
    ], 
    'from' => [ 
     '<[email protected]>' 
    ] 
    } 
};

Now say you want to print all the recipients of the message whose queue ID is BA1CE38965:

my $queueid = "BA1CE38965"; 
foreach my $recip (@{ $msg{$queueid}{to} }) { 
    print $recip, "\n";
}

Or maybe you only want to know how many recipients there are:

print scalar @{ $msg{$queueid}{to} }, "\n";

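If you're willing to assume that each message has exactly one client, access it as the first element of that field's array, $msg{$queueid}{client}[0].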

2010-02-03 20:00:50

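This is awesome, thanks... I was focused only on pulling out the messages I was interested in (those matching [0-9 - ] @ ACertainDomain.com) and hadn't considered simply loading all the relevant information from the file into a hash and then pulling the messages out of that. I'm going to use your code as a base and see whether I can build from there. I'm sure I'll have more questions (I'm still trying to parse that 'while' regex; I'm rusty at this). – 2010-02-03 21:04:06

@Justin You're welcome! See the updated explanation. – 2010-02-03 21:33:55

Thanks again. My parsing now takes about 3 minutes per file instead of 3 hours. This community is awesome. – 2010-02-03 23:59:03

It won't actually make processing any faster, because removing from an array is an expensive operation.

Better options:

Do everything at the time you create the ID array
Include pointers (indexes, really) into the main storage array, so that you can quickly access its elements for a given ID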
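One way to read the second option is sketched below: while scanning the log once, record for each queue ID the indexes of its lines in the main array. The log lines are shortened, made-up stand-ins, and the names @log and %lines_for are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical, shortened log lines standing in for the main storage array.
my @log = (
    'postfix/smtpd[1]: AAA111: client=host.example[10.0.0.1]',
    'postfix/qmgr[2]: BBB222: from=<a@example.com>',
    'postfix/smtp[3]: AAA111: to=<b@example.com>, relay=none',
);

# Map each queue ID to the indexes of its lines in @log.
my %lines_for;
for my $i (0 .. $#log) {
    next unless $log[$i] =~ m{postfix/\w+\[\d+\]: (\w+):};
    push @{ $lines_for{$1} }, $i;
}

# Fetch every line for one ID without rescanning or shrinking @log.
my @hits = @log[ @{ $lines_for{AAA111} } ];
print "$_\n" for @hits;
```

Nothing is ever removed from @log here, so no index goes stale; the second pass only touches the lines that belong to the ID being looked up.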

2010-02-03 19:31:35

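In Perl, you can use the splice() routine to remove elements from an array.

As usual, be careful when removing elements from an array while looping over it, because the array indexes will change.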

2010-02-03 19:32:50

Assuming you already have the index in hand, use splice:

splice(@array, $indextoremove, 1)

But be careful: once you remove an element, the indexes you are holding are no longer valid.
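A common way around the stale-index problem is to walk the array from the end toward the start, so that a removal only shifts elements that have already been visited. This is a minimal sketch with made-up data, not code from the question:

```perl
use strict;
use warnings;

# Made-up data: 'drop' marks the elements we want to remove.
my @array = qw(keep drop keep drop drop keep);

# Walk from the last index down to 0; removing element $i only moves
# the elements after it, and those have already been checked.
for my $i (reverse 0 .. $#array) {
    splice(@array, $i, 1) if $array[$i] eq 'drop';
}

print "@array\n";
```

When the test depends only on the element's value, @array = grep { $_ ne 'drop' } @array; does the same job without any index bookkeeping.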

2010-02-03 19:34:29

Common methods for manipulating the contents of an array:

# start over with this list for each example: 
my @list = qw(a b c d);

splice:

splice @list, 2, 1, qw(e); 
# @list now contains: qw(a b e d)

pop and shift:

pop @list; 
# @list now contains: qw(a b c) 

shift @list; 
# @list now contains: qw(b c d)

map:

@list = map { $_ eq 'b' ? () : $_ } @list; 
# @list now contains: qw(a c d)

array slices:

@list[3..4] = qw(e f); 
# @list now contains: qw(a b c e f)

for and foreach loops:

foreach (@list) 
{ 
    # $_ is aliased to each element of the list in turn; 
    # assignments will be propagated back to the original structure 
    $_ = uc if m/[a-c]/; 
} 
# @list now contains: qw(A B C d)

Read up on all of these functions in perldoc perlfunc, on slices in perldoc perldata, and on for loops in perldoc perlsyn.

2010-02-03 19:55:03 Ether

Why not do this:

my @extracted = map extract_data($_), 
       grep msg_rcpt_to($rcpt, $_), @log_data;

When you're done, you have an array of the extracted data in the same order in which it appeared in the log.
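The answer leaves extract_data and msg_rcpt_to undefined, so the sketch below fills them in with hypothetical bodies purely to show the grep/map pipeline running end to end. The log lines and the recipient address are made up as well.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the log line is a delivery to $rcpt.
sub msg_rcpt_to {
    my ($rcpt, $line) = @_;
    return $line =~ /\bto=<\Q$rcpt\E>/;
}

# Hypothetical helper: return the queue ID from a log line (or nothing).
sub extract_data {
    my ($line) = @_;
    return $line =~ m{postfix/\w+\[\d+\]: (\w+):} ? $1 : ();
}

my $rcpt     = 'b@example.com';
my @log_data = (
    'postfix/smtp[3]: AAA111: to=<b@example.com>, relay=none',
    'postfix/smtp[4]: BBB222: to=<c@example.com>, relay=none',
    'postfix/smtp[5]: CCC333: to=<b@example.com>, relay=none',
);

# grep keeps the matching lines in order; map turns each into its data.
my @extracted = map extract_data($_),
                grep msg_rcpt_to($rcpt, $_), @log_data;

print "@extracted\n";
```

Because grep and map build new lists, @log_data is left untouched and there are no indexes to invalidate.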

2010-02-03 20:00:18 daotoad
