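Here is my solution in awk, which I think is more flexible than sed. The program leaves LaTeX commands alone (any word that starts with "\") and preserves a leading capital letter. Arguments of LaTeX commands (and ordinary text) are substituted using the dictionary file. When the optional third argument [rev] is given, it performs the reverse substitution using the same dictionary file. Any non-alphabetic character acts as a word separator (this is necessary in LaTeX source files). The program writes its output to the screen (stdout), so you need to redirect it to a file (> output_f). (I assume the inputencoding of your LaTeX source is 1 byte per character.)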
> cat dic.sh
#!/bin/bash
(($#<2))&& { echo "Usage $0 dictionary_file latex_file [rev]"; exit 1; }
((d= $#==3 ? 0:1))
awk -v d=$d '
BEGIN {cm=fx=0; fn="";}
fn!=FILENAME {fx++; fn=FILENAME;}
fx==1 {if(!NF)next; if(d)a[$1]=$2; else a[$2]=$1; next;} #read dict or rev dict file into an associative array
fx==2 { for(i=1; i<=length($0); i++)
{c=substr($0,i,1); #read characters from a given line of LaTeX source
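if(cm){printf("%s",c); if(c~/[^A-Za-z0-9\\]/)cm=0;} #inside a LaTeX command: copy verbatim until a char that is not alphanumeric or a backslash
else if(c~/[A-Za-z]/)w=w c; else{pr(); printf("%s",c); if(c=="\\")cm=1;} #collect letters into a word, or flush the word and print the separator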
}
pr(); printf("\n"); #handle collected last word in the line
}
function pr( s){ # print collected word or its substitution by dictionary and recreates first letter case
if(!length(w))return;
s=tolower(w);
if(!(s in a))printf("%s",w);
else printf("%s", s==w ? a[s] : toupper(substr(a[s],1,1)) substr(a[s],2));
w="";}
' "$1" "$2"
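
Note that pr() restores only a leading capital: any word that is not entirely lowercase gets a replacement with its first letter capitalised, so APPLE would come out as Lemon. If all-caps words matter in your source, the case rule could be extended along the following lines. This is only a sketch and not part of dic.sh; recase is a hypothetical helper name, and it takes the original word and its replacement as arguments.

```shell
#!/bin/bash
# Hypothetical helper (not part of dic.sh): print replacement r
# using the letter-case pattern of the original word w.
recase() {
  awk -v w="$1" -v r="$2" 'BEGIN {
    if (w == tolower(w))                        # all lowercase: use r unchanged
      print r
    else if (length(w) > 1 && w == toupper(w))  # ALL CAPS: upper-case all of r
      print toupper(r)
    else                                        # anything else: capitalise first letter (as pr() does)
      print toupper(substr(r, 1, 1)) substr(r, 2)
  }'
}

recase apple lemon
recase Apple lemon
recase APPLE lemon
```

The same three-way test could be placed inside pr() instead of the single s==w comparison.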
The dictionary file:
> cat dictionary
apple lemon
raspberry cherry
pear banana
The LaTeX input source:
> cat src.txt
Apple123pear,apple "pear".
\Apple123pear{raspberry}{pear}[apple].
Raspberry12Apple,pear.
Results of running it:
> ./dic.sh
Usage ./dic.sh dictionary_file latex_file [rev]
> ./dic.sh dictionary src.txt >out1.txt; cat out1.txt
Lemon123banana,lemon "banana".
\Apple123pear{cherry}{banana}[lemon].
Cherry12Lemon,banana.
> ./dic.sh dictionary out1.txt >out2.txt rev; cat out2.txt
Apple123pear,apple "pear".
\Apple123pear{raspberry}{pear}[apple].
Raspberry12Apple,pear.
> diff src.txt out2.txt # they are identical
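
One caveat about the [rev] mode: the script stores the dictionary in a plain associative array, so if a word appears twice in the same column, the later entry silently overwrites the earlier one and the round trip above may no longer reproduce the source. A quick check of the dictionary before use might look like this. It is a sketch under my own assumptions; check_dict is a hypothetical helper and not part of dic.sh.

```shell
#!/bin/bash
# Hypothetical helper (not part of dic.sh): report words that occur more
# than once in either column of a two-column dictionary file.
# Exit status is 0 if no duplicates were found, 1 otherwise.
check_dict() {
  awk '
    !NF { next }                                   # skip empty lines, as dic.sh does
    {
      if (seen1[$1]++) { print "duplicate source word: " $1; bad = 1 }
      if (seen2[$2]++) { print "duplicate target word: " $2; bad = 1 }
    }
    END { exit bad + 0 }
  ' "$1"
}

# Demo with a deliberately broken dictionary: two words map to "lemon".
tmp=$(mktemp)
printf 'apple lemon\nraspberry cherry\npear lemon\n' > "$tmp"
check_dict "$tmp" || echo "dictionary is not invertible"
rm -f "$tmp"
```

With the dictionary file shown earlier, check_dict dictionary prints nothing and returns 0, so it can be chained as check_dict dictionary && ./dic.sh dictionary src.txt >out1.txt.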
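Wouldn't "replace" do the job for you?
The script is trivial. The data, however... Can you provide a suitable list of substitutions?
Well, I can use 'sed' or 'awk' to replace each case individually. I was hoping someone had already prepared a loop or script for the common cases. In fact, finding a general list of substitutions is a challenge of its own. If I have to do it myself, I will put it on GitHub so that it can be updated as new cases come up.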