2017-03-09

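Find and delete lines in a file based on a string, but keep the last occurrence

Need help. I've been searching all day and haven't found a solution specific to what I need.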

In a file:

Lots 
of 
other 
lines 
... 
... 
# [email protected] ..........1323 <- Do not include '# Client=HOSTNAME' 
# [email protected] ..........123123 <- Do not include '# Client=HOSTNAME' 
[email protected] ....rndChars.... <- delete line 
[email protected] ....rndChars.... <- delete line 
[email protected] ....rndChars.... <- delete line 
[email protected] ....rndChars.... <- delete line 
[email protected] ....rndChars.... <- keep last occurrence 
[email protected] ....rndChars.... <- keep last occurrence 
[email protected] ....rndChars.... <- delete line 
[email protected] ....rndChars.... <- delete line 
[email protected] ....rndChars.... <- keep last occurrence 
... 
... 
more 
lines 

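I want to find all of the "Client=" lines above and delete them, except for the last occurrence for each host. The problem is that I never know in advance what the hostname will be.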

The output should be:

Lots 
of 
other 
lines 
... 
... 
# [email protected] ..........1323 <- Do not include '# Client=HOSTNAME' 
# [email protected] ..........123123 <- Do not include '# Client=HOSTNAME' 
[email protected] ....rndChars.... <- keep last occurrence 
[email protected] ....rndChars.... <- keep last occurrence 
[email protected] ....rndChars.... <- keep last occurrence 
... 
... 
more 
lines 

Thanks in advance.

What have you tried so far?

Answers

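Perl to the rescue. Read the file twice: the first pass stores the line number of the last occurrence of each host in a hash, and the second pass prints every line except the Client lines whose line number doesn't match the stored one.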

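#!/usr/bin/perl
use warnings;
use strict;

# Captures the hostname between "Client=" and "@".
my $client_re = qr/Client=(.*?)@/;

my $filename = shift;

open my $IN, '<', $filename or die "Cannot open $filename: $!";

# First pass: remember the line number of the last occurrence of each host.
my %lines;
while (<$IN>) {
    next if /^#/;    # Skip the "# Client=..." comment lines.

    # Overwrite the line number if already present.
    $lines{$1} = $. if /$client_re/;
}

seek $IN, 0, 0;    # Rewind the file handle.
$. = 0;            # Restart the line counter.

# Second pass: print everything except earlier duplicates of a Client line.
while (<$IN>) {
    if (! /^#/ && (my ($hostname) = /$client_re/)) {
        print if $lines{$hostname} == $.;    # Only print the stored line.
    } else {
        print;
    }
}
close $IN;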
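For comparison, here is a minimal Python sketch of the same two-pass idea. It assumes client lines look like Client=HOSTNAME@... (the same assumption the Perl regex makes) and works on a list of lines instead of rereading the file; client_host and keep_last_per_client are hypothetical helper names.

```python
import re

# Same pattern as the Perl answer: capture the hostname between "Client=" and "@".
CLIENT_RE = re.compile(r"Client=(.*?)@")


def client_host(line):
    """Return the hostname of a non-comment Client line, or None."""
    if line.startswith("#"):
        return None
    m = CLIENT_RE.search(line)
    return m.group(1) if m else None


def keep_last_per_client(lines):
    """Drop every Client line except the last one for each hostname."""
    # First pass: a later line for the same host overwrites the stored index.
    last_index = {}
    for i, line in enumerate(lines):
        host = client_host(line)
        if host is not None:
            last_index[host] = i

    # Second pass: keep ordinary lines and only the stored Client lines.
    result = []
    for i, line in enumerate(lines):
        host = client_host(line)
        if host is None or last_index[host] == i:
            result.append(line)
    return result
```

The dict plays the role of the Perl %lines hash: the last assignment for a host wins, so no explicit "is this the last one?" check is needed. To filter a real file, pass it fh.readlines() and write the result back out with writelines().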

Using tac & awk: reverse the file, keep only the first line seen for each Client key, then reverse it back.

tac file | awk '/^Client/{ if(!a[$1]){a[$1]++;print};next}1' | tac 

Output:

$ tac file | awk '/^Client/{ if(!a[$1]){a[$1]++;print};next}1' | tac 
Lots 
of 
other 
lines 
... 
... 
# [email protected] ..........1323 <- Do not include '# Client=HOSTNAME' 
# [email protected] ..........123123 <- Do not include '# Client=HOSTNAME' 
[email protected] ....rndChars.... <- keep last occurrence 
[email protected] ....rndChars.... <- keep last occurrence 
[email protected] ....rndChars.... <- keep last occurrence 
... 
... 
more 
lines 
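The tac | awk | tac trick works because, once the file is reversed, "keep the last occurrence" becomes "keep the first occurrence". A minimal Python sketch of the same pipeline (keep_last_by_first_field is a hypothetical name); like awk's $1, it keys on the whole first whitespace-separated field of lines starting with Client:

```python
def keep_last_by_first_field(lines):
    """Python analogue of: tac file | awk '/^Client/{...}' | tac."""
    seen = set()
    kept = []
    for line in reversed(lines):  # first tac: walk the file bottom-up
        if line.startswith("Client"):  # awk: /^Client/
            key = line.split()[0]  # awk: $1
            if key in seen:  # a later occurrence was already kept
                continue
            seen.add(key)
        # Kept: first-seen Client lines (bottom-up) and every other line.
        kept.append(line)
    kept.reverse()  # second tac: restore the original order
    return kept
```

Because the key is the entire first field, Client lines for the same host only count as duplicates if that field is identical; the Perl answer keys on the captured hostname instead.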
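
With GNU sed (-r enables extended regexes): read the whole file into the pattern space, then keep deleting an earlier "Client=HOST..." line for as long as the same Client=HOST appears again later in the file:

sed -r ':a;N;$!ba;:b;s/(.*)(Client=[^@]+\b)[^\n]+\n*(.*\2)/\1\3/;tb' file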
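The sed answer treats the whole file as one string. A rough Python analogue of that idea, assuming every client line starts with Client=HOST@ (hypothetical format; drop_earlier_client_lines is a hypothetical name): a single regex with a backreference and a lookahead deletes a client line whenever a later line starts with the same Client=HOST@. Anchoring with ^ under re.MULTILINE keeps the "# Client=" comment lines out of it, and the trailing @ keeps HOST1 from matching HOST10.

```python
import re

# Group 1 captures "Client=HOST"; the lookahead requires a later line that
# starts with exactly "Client=HOST@". The ^ anchor (re.MULTILINE) means
# "# Client=..." comment lines never match, and the "@" after \1 stops
# "Client=HOST1" from matching "Client=HOST10".
DUP_CLIENT_LINE = re.compile(
    r"^(Client=[^@\n]+)@[^\n]*\n(?=[\s\S]*?^\1@)", re.MULTILINE
)


def drop_earlier_client_lines(text):
    """Remove each Client line that has a later line for the same host."""
    return DUP_CLIENT_LINE.sub("", text)
```

Each candidate line triggers a scan of the rest of the text, so this is quadratic in the worst case; that is usually fine for config-sized files.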