2010-07-10 101 views
2

这是与另外一个问题/代码高尔夫我问上Code golf: "Color highlighting" of repeated textshell脚本查找,搜索和

我有具有以下内容的文件“sample1.txt”文件替换字符串数组:

LoremIpsumissimplydummytextoftheprintingandtypesettingindustry.LoremIpsumhasbeentheindustry'sstandarddummytexteversincethe1500s,whenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbook. 

我有一个脚本生成的字符串的下面阵列(只有少数为了说明示出),其发生在文件中:

LoremIpsum 
LoremIpsu 
dummytext 
oremIpsum 
LoremIps 
dummytex 
industry 
oremIpsu 
remIpsum 
ummytext 
LoremIp 
dummyte 
emIpsum 
industr 
mmytext 

我需要(从顶部)看到如果'loremIpsum'出现在文件sample1.txt中。如果是这样,我想用以下代码替换所有的LoremIpsum:。现在,当程序移动到下一个单词'LoremIpsu'时,它不应该与sample1.txt中的文本匹配。它应该重复上面这个'数组'的所有元素。下一个'有效'的将是'dummytext',并且应该被标记为<T2>dummytext</T2>

我认为应该可以为此创建一个bash shell脚本解决方案,而不是依靠perl/python/ruby​​程序。

+0

这听起来像SED工作,但问题是我不明白。 – Marco 2010-07-10 07:18:52

+0

嗨马可 - T2的例子有帮助吗? – RubiCon10 2010-07-10 07:20:23

+1

你为什么要使用shell脚本?为什么不使用最适合工作的工具? Perl是用于低编程时文本处理的。 – Borealid 2010-07-10 07:27:02

回答

0

纯击(无外部对象)

在击命令行:

用Perl
$ sample="LoremIpsumissimplydummytextoftheprintingandtypesettingindustry.LoremIpsumhasbeentheindustry'sstandarddummytexteversincethe1500s,whenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbook." 
$ # or: sample=$(<sample1.txt) 
$ array=(
LoremIpsum 
LoremIpsu 
dummytext 
... 
) 
$ tag=0; for entry in ${array[@]}; do test="<[^>/]*>[^>]*$entry[^<]*</"; if [[ ! $sample =~ $test ]]; then ((tag++)); sample=${sample//${entry}/<T$tag>$entry</T$tag>}; fi; done; echo "Output:"; echo $sample 
Output: 
<T1>LoremIpsum</T1>issimply<T2>dummytext</T2>oftheprintingandtypesetting<T3>industry</T3>.<T1>LoremIpsum</T1>hasbeenthe<T3>industry</T3>'sstandard<T2>dummytext</T2>eversincethe1500s,whenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbook. 
+0

奇妙btw! – RubiCon10 2010-07-13 10:18:42

0

直接:

#! /usr/bin/perl 

use warnings; 
use strict; 

my @words = qw/ 
    LoremIpsum 
    LoremIpsu 
    dummytext 
    oremIpsum 
    LoremIps 
    dummytex 
    industry 
    oremIpsu 
    remIpsum 
    ummytext 
    LoremIp 
    dummyte 
    emIpsum 
    industr 
    mmytext 
/; 

my $to_replace = qr/@{[ join "|" => 
         sort { length $b <=> length $a } 
         @words 
        ]}/; 

my $i = 0; 
while (<>) { 
    s|($to_replace)|++$i; "<T$i>$1</T$i>"|eg; 
    print; 
} 

样品运行(包装,以防止水平滚动):

$ ./tag-words sample.txt 
<T1>LoremIpsum</T1>issimply<T2>dummytext</T2>oftheprintingandtypesetting<T3>indus 
try</T3>.<T4>LoremIpsum</T4>hasbeenthe<T5>industry</T5>'sstandard<T6>dummytext</T 
6>eversincethe1500s,whenanunknownprintertookagalleyoftypeandscrambledittomakeatyp 
especimenbook.

您可能会反对,所有的qr//@{[ ... ]}业务都是巴洛克式的。人们可以得到与/o正则表达式开关相同的效果

# plain scalar rather than a compiled pattern 
my $to_replace = join "|" => 
       sort { length $b <=> length $a } 
       @words; 

my $i = 0; 
while (<>) { 
    # o at the end for "compile (o)nce" 
    s|($to_replace)|++$i; "<T$i>$1</T$i>"|ego; 
    print; 
} 
+0

嗨gbacon - 嗯 - 第二次更换应该是“T2”,第三 - “T3”...只是fyi - 我知道它的代码 – RubiCon10 2010-07-12 05:09:53

+0

@ RubiCon10 Ack!谢谢并修复! – 2010-07-12 12:22:26