Counting records and replacing the count value after removing duplicates

After a job runs on the server, it creates a file like the one below:

1000727888004 
522101 John Smith 
522101 John Smith 
522188 Shelly King 
522188 Shelly King 
1000727888002 
522990 John Doe 
522990 John Doe 
9000006000000

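We are currently in the process of fixing the code, but that will take a month. In the meantime, I am using the command below to remove the duplicate records.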

perl -ne 'print unless $dup{$_}++;' old_file.txt > new_file.txt

After I run the command above, the duplicate entries are gone, but the counts stay the same, as shown below:

1000727888004 
522101 John Smith 
522188 Shelly King 
1000727888002 
522990 John Doe 
9000006000000

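The last digit of each line starting with 1 is the record count for that block (so the 4 in the first line should be 2, the 2 in the fourth line should be 1, and the 6 in the last line, which starts with 9, should be 3). It should look like this: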

1000727888002 
522101 John Smith 
522188 Shelly King 
1000727888001 
522990 John Doe 
9000003000000

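I can't come up with any logic to fix this, and I need help. Can I run another command, or add something to my perl command, to correct the counts? Yes, I could open the file in Notepad++ and fix the numbers by hand, but I am trying to automate it.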

Thanks!

2017-04-22 Amir

What is that last record, the one starting with 9? –

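That is the trailer of the file, holding the total count. The leading 9 is always there, and the next 6 digits are the count; if the count has a single digit, it is left-padded with 5 zeros. The last 6 digits are always 0. – Amir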
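To make the trailer layout described in the comment above concrete, here is a small sketch in shell. The function names `make_trailer` and `parse_trailer` are hypothetical, made up for illustration; the layout itself (a leading 9, a 6-digit zero-padded count, then six zeros) is taken from the comment.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the trailer layout:
#   9 + count left-padded with zeros to 6 digits + 000000

# Build a trailer line from a decimal count (pass it without leading zeros).
make_trailer() {
    printf '9%06d000000\n' "$1"
}

# Extract the count from a trailer line: characters 2..7 hold the count,
# and adding 0 in awk drops the zero padding.
parse_trailer() {
    printf '%s\n' "$1" | awk '{ print substr($1, 2, 6) + 0 }'
}
```

For example, `make_trailer 3` produces `9000003000000`, and `parse_trailer 9000006000000` prints `6`.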

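In awk. It handles duplicates within the "blocks" between count records, i.e. it does not consider duplicates across the whole file. If that assumption is wrong, let me know.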

$ awk '
NF==1 {                # for the count record
    if(c!="")          # this skips the leading empty row
        print c        # print count
    for(i in a)        # all deduped data records
        print i        # print them
    delete a           # empty hash
    c=$0               # store count (well, you could use just the first count record)
    next               # for this record don't process further
}
{
    if($0 in a)        # if current record is already in a
        c--            # decrease count
    else a[$0]         # else hash it
}
END {                  # last record handling
    print c            # print the last record
    for(i in a)        # just in case last record would be missing
        print i        # this and above could be removed
}' file

Output (the trailer record starting with 9 is left unchanged):

1000727888002 
522101 John Smith 
522188 Shelly King 
1000727888001 
522990 John Doe 
9000006000000
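The output above still ends with 9000006000000, and `for(i in a)` does not guarantee that records come out in input order. As a sketch only, the variant below keeps the same block-wise assumption but also rewrites the trailer and preserves the order of the records. The function name `fix_counts` is hypothetical, and the rules that a count record is a single-field line starting with 1 and that the trailer is a single-field line starting with 9 are assumptions based on the sample, not something confirmed in the thread.

```shell
#!/bin/sh
# Hypothetical helper: dedupe records inside each block, then rewrite the
# block counts and the trailer total. Reads the named files, or stdin.
fix_counts() {
    awk '
    # Print the pending block: adjusted count line first, then its records.
    function flush(    i) {
        if (!have_header) return
        printf "%.0f\n", header - dups     # original count minus duplicates
        for (i = 1; i <= n; i++) print rec[i]
        have_header = 0
    }
    NF == 1 && /^1/ {          # assumed: count record opening a new block
        flush()
        header = $1; have_header = 1
        dups = 0; n = 0
        split("", seen)        # portable way to empty an array
        next
    }
    NF == 1 && /^9/ {          # assumed: trailer, 9 + 6-digit total + 6 zeros
        flush()
        printf "9%06d000000\n", total
        next
    }
    NF > 0 {                   # data record (blank lines are skipped)
        if ($0 in seen) dups++
        else { seen[$0]; rec[++n] = $0; total++ }
    }
    END { flush() }            # in case the trailer is missing
    ' "$@"
}
```

It could be used like the original perl command, for example `fix_counts old_file.txt > new_file.txt`. Subtracting the number of duplicates from the whole count line avoids guessing how many digits of that line hold the count. Data records that appear before the first count record are dropped in this sketch.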

If duplicates should be removed across the whole file, and the last record is a count as well:

awk ' 
NF==1 { 
    if(NR==1) 
     c=$0 
    print c 
} 
NF>1 { 
    if($0 in a) 
     c-- 
    else { 
     a[$0] 
     print 
    } 
}' file 
1000727888004 
522101 John Smith 
522188 Shelly King 
1000727888002 
522990 John Doe 
1000727888001

2017-04-23 06:26:22
