2013-04-21 41 views
1

How do I restrict awk to search only items contained within a certain HTML tag?

I have an awk script like this that I will run on a file:

cat input.txt | awk 'gsub(/[^ ]*(fish|shark|whale)[^ ]*/,"(&)")' >> output.txt 

This puts parentheses around the words "fish", "shark", or "whale" on every line that contains them, for example:

The whale asked the shark to swim elsewhere. 
The fish were unhappy. 

After running it through the script, the file becomes:

The (whale) asked the (shark) to swim elsewhere. 
The (fish) were unhappy. 

The file is marked up with HTML tags, and I need the replacement to happen only between <b></b> tags.

The whale asked <b>the shark to swim</b> elsewhere. 
<b>The fish were</b> unhappy. 

This would become:

The whale asked <b>the (shark) to swim</b> elsewhere. 
<b>The (fish) were</b> unhappy. 
  • Matching bold tags are never split across lines: the opening <b> tag is always on the same line as its closing </b> tag.

How can I restrict awk so that it only searches and modifies the text between <b></b> tags?

+0

Read up on the 'match()' function and its associated awk variables, RSTART and RLENGTH. Good luck. – shellter 2013-04-21 01:38:53

+0

A UUOC (Useless Use of cat) award awaits you. – 2013-04-21 02:13:27

Answers

1

Here is a technique using awk:

awk '/<b>/{f=1}/<\/b>/{f=0}f{gsub(/fish|shark|whale/,"(&)")}1' RS=' ' ORS=' ' file 
The whale asked <b>the (shark) to swim</b> elsewhere. 
<b>The (fish) were</b> unhappy. 
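
One caveat with the one-liner above: because it toggles the flag once per space-separated word, a keyword that sits in the same word as the closing tag (for example `<b>the shark</b>`) is skipped. A minimal line-by-line sketch that avoids this, using `match()` with `RSTART` and `RLENGTH`, might look like the following. The function name `bold_paren` is made up for illustration, and the sketch assumes, as the question states, that each `<b>...</b>` pair sits on one line and contains no other tags. Like the one-liner, it matches substrings, so "dogfish" would become "dog(fish)".

```shell
# Hypothetical helper: parenthesize keywords only inside <b>...</b> spans.
bold_paren() {
  awk '
  {
      out = ""
      rest = $0
      # match() sets RSTART/RLENGTH to the leftmost <b>...</b> span
      # that contains no other tag.
      while (match(rest, /<b>[^<]*<\/b>/)) {
          start = RSTART
          len = RLENGTH
          span = substr(rest, start, len)
          gsub(/fish|shark|whale/, "(&)", span)    # edit the span only
          out = out substr(rest, 1, start - 1) span
          rest = substr(rest, start + len)         # continue after the span
      }
      print out rest
  }' "$@"
}

printf '%s\n' \
  'The whale asked <b>the shark to swim</b> elsewhere.' \
  '<b>The fish were</b> unhappy.' | bold_paren
```

On the two sample lines this prints the same result as shown above, and text outside the bold spans (the first "whale") is left alone.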
1

As long as the HTML markup is no worse than this, and a <b> ... </b> span will not contain any other HTML tags, then it is relatively easy in Perl:

$ cat data 
The whale asked <b>the shark to swim</b> elsewhere. 
<b>The fish were</b> unhappy. 
The <b> dogfish and the sharkfin soup</b> were unscathed. 
$ perl -pe 's/(<b>[^<]*)\b(fish|shark|whale)\b([^<]*<\/b>)/$1($2)$3/g' data 
The whale asked <b>the (shark) to swim</b> elsewhere. 
<b>The (fish) were</b> unhappy. 
The <b> dogfish and the sharkfin soup</b> were unscathed. 
$ 

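I tried to adapt that to awk (and gawk) without success; the matching part worked, but the replacement expression did not. From my reading of the manual, awk's sub() and gsub(), unlike Perl's substitution, give you no way to refer to separately matched parenthesized subexpressions in the replacement; only & (the whole match) is available. (gawk's gensub() does accept backreferences, but it is a GNU extension.)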
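
For what it is worth, the whole-word behaviour of the Perl version can be approximated in plain awk without capture groups by rebuilding each span word by word. This is only a sketch under the same assumptions as the Perl answer (no nested tags, tags on one line); the names `bold_paren_words`, `wrap`, and `mark` are invented for the example, and a "word" is taken to be a run of ASCII letters.

```shell
# Hypothetical helper: parenthesize whole-word keywords inside <b>...</b>.
bold_paren_words() {
  awk '
  # Return the word in parentheses if it is exactly one of the keywords.
  function wrap(word) {
      if (word == "fish" || word == "shark" || word == "whale")
          return "(" word ")"
      return word
  }
  # Rebuild a span one letter-run at a time; this stands in for \b,
  # which POSIX awk regular expressions do not provide.
  function mark(span,    res, s, n) {
      res = ""
      while (match(span, /[A-Za-z]+/)) {
          s = RSTART
          n = RLENGTH
          res = res substr(span, 1, s - 1) wrap(substr(span, s, n))
          span = substr(span, s + n)
      }
      return res span
  }
  {
      out = ""
      rest = $0
      while (match(rest, /<b>[^<]*<\/b>/)) {
          start = RSTART      # saved because mark() calls match() again
          len = RLENGTH
          out = out substr(rest, 1, start - 1) mark(substr(rest, start, len))
          rest = substr(rest, start + len)
      }
      print out rest
  }' "$@"
}

bold_paren_words <<'EOF'
The whale asked <b>the shark to swim</b> elsewhere.
The <b> dogfish and the sharkfin soup</b> were unscathed.
EOF
```

Because every word in the span is checked, several keywords inside one bold span are all parenthesized, while "dogfish" and "sharkfin" are left untouched.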

+0

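Also, I note that with '<b>if the shark and whale swim together</b>', only one of them gets parenthesized (the whale, since the leading `[^<]*` is greedy). If that is a problem, you have to work harder. It can be done if necessary; it is an exercise for the reader! – 2013-04-21 03:11:11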
