2016-01-10 30 views
1

How can I remove all Wiki templates from a string?

I have Wikipedia article content that looks like this:

{{Use mdy dates|date=June 2014}} 
{{Infobox person 
| name  = Richard Matthew Stallman 
| image  = Richard Stallman - Fête de l'Humanité 2014 - 010.jpg 
| caption  = Richard Stallman, 2014 
| birth_date = {{Birth date and age|1953|03|16}} 
| birth_place = New York City 
| nationality = American 
| other_names = RMS, rms 
| known_for = Free software movement, GNU, Emacs, GNU Compiler Collection|GCC 
| alma_mater = Harvard University,<br />Massachusetts Institute of Technology 
| occupation = President of the Free Software Foundation 
| website  = {{URL|https://www.stallman.org/}} 
| awards  = MacArthur Fellowship<br />EFF Pioneer Award<br />''... see #Honors and awards|Honors and awards'' 
}} 

{{Citation needed|date=May 2011}} 

How can I remove them? I can use this regex: /\{\{[^}]+\}\}/g, but it does not work for nested templates such as Infobox.
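
To make the failure concrete, here is a minimal sketch that runs the regex above on a shortened input. The `sample` string is a hypothetical reduction of the infobox, not the real article text.

```javascript
// Hypothetical sample: one template nested inside another.
const sample = 'a {{Infobox | born = {{Birth date|1953|03|16}} | city = NYC }} b';

// The pattern from the question: "{{", one or more non-"}" characters, then "}}".
// [^}] also accepts "{", so the match runs from the OUTER "{{" to the INNER "}}".
const naive = sample.replace(/\{\{[^}]+\}\}/g, '');

// The tail of the outer template, including its closing "}}", is left behind.
console.log(JSON.stringify(naive));
```

The match stops at the first `}}` it meets, which closes the inner template rather than the outer one, so the rest of the infobox survives.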

I tried using this code to remove the nested templates first and then the infobox, but I got the wrong result.

var input = document.getElementById('input'); 
 
input.innerHTML = input.innerHTML.replace(/\{\{[^}]+\}\}/g, '');
<pre id="input"> {{Use mdy dates|date=June 2014}} 
 
    {{Infobox person 
 
    | name  = Richard Matthew Stallman 
 
    | image  =Richard Stallman - Fête de l'Humanité 2014 - 010.jpg 
 
    | caption  = Richard Stallman, 2014 
 
    | birth_date = {{Birth date and age|1953|03|16}} 
 
    | birth_place = New York City 
 
    | nationality = American 
 
    | other_names = RMS, rms 
 
    | known_for = Free software movement, GNU, Emacs, GNU Compiler Collection|GCC 
 
    | alma_mater = Harvard University,<br />Massachusetts Institute of Technology 
 
    | occupation = President of the Free Software Foundation 
 
    | website  = {{URL|https://www.stallman.org/}} 
 
    | awards  = MacArthur Fellowship<br />EFF Pioneer Award<br />''... see #Honors and awards|Honors and awards'' 
 
    }}</pre>

+1

@yurzui This will not work for text that contains {{}} in more than one place https://regex101.com/r/kG7bO0/2 – jcubic

+0

@jcubic Do you mean that 'foo' should not match? – tchelidze

+0

If you can do it in two steps, you can match the inner templates first and then the outer ones; this pattern is for the inner ones: https://regex101.com/r/pG5sS0/1 –

Answers

3

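JavaScript regular expressions lack the features (such as recursion or balancing groups) needed to match nested brackets. One regex-based approach is to process the string several times with a pattern that matches only the innermost brackets, until there is nothing left to replace: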

do {
    var cnt = 0; // number of templates removed in this pass
    txt = txt.replace(/{{[^{}]*(?:{(?!{)[^{}]*|}(?!})[^{}]*)*}}/g, function (_) {
        cnt++; return '';
    });
} while (cnt); // stop once a pass removes nothing

Pattern details:

{{ 
[^{}]* # anything that is not a bracket
(?: # this group is only useful if you need to allow single brackets
    {(?!{)[^{}]* # an opening bracket not followed by another opening bracket
    | # OR
    }(?!})[^{}]* # same thing for closing brackets
)* 
}} 
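
As a rough illustration of how the passes peel away nesting, the sketch below wraps the same pattern in a function and runs it on a shortened input. The function name `stripTemplatesIteratively` and the `wikitext` sample are assumptions made for this example. The braces are escaped, which matches the same text and also stays valid if the `u` flag is ever added.

```javascript
// Hypothetical helper: strip innermost {{...}} repeatedly until the text stops changing.
function stripTemplatesIteratively(txt) {
  // Same pattern as above, with the braces escaped.
  const innermost = /\{\{[^{}]*(?:\{(?!\{)[^{}]*|\}(?!\})[^{}]*)*\}\}/g;
  let previous;
  do {
    previous = txt;
    txt = txt.replace(innermost, ''); // removes every template with no template inside it
  } while (txt !== previous);         // a pass that changes nothing means we are done
  return txt;
}

// Hypothetical sample: an infobox holding two inner templates, plus a flat one.
const wikitext =
  'Intro {{Infobox | born = {{Birth date|1953|03|16}} ' +
  '| site = {{URL|https://www.stallman.org/}} }} text ' +
  '{{Citation needed|date=May 2011}} end';

console.log(JSON.stringify(stripTemplatesIteratively(wikitext)));
```

On this input the first pass removes the three templates that contain no other template, the second pass removes the now-flat infobox, and a third pass confirms nothing is left. Whitespace around the removed templates is kept, so doubled spaces remain in the result.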

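If you don't want to process the string several times, you can also read it character by character, incrementing and decrementing a counter as opening and closing brackets are found.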
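
A minimal sketch of that single-pass idea follows. The function name `stripTemplatesSinglePass` is hypothetical, and the choice to keep an unmatched `}}` as ordinary text is an assumption of this example, not something the answer specifies.

```javascript
// Hypothetical helper: one left-to-right scan with a nesting-depth counter.
function stripTemplatesSinglePass(txt) {
  let depth = 0; // how many "{{" are currently open
  let out = '';
  let i = 0;
  while (i < txt.length) {
    const pair = txt.slice(i, i + 2);
    if (pair === '{{') {
      depth++;            // entering a template
      i += 2;
    } else if (pair === '}}' && depth > 0) {
      depth--;            // leaving a template
      i += 2;
    } else {
      if (depth === 0) {  // copy characters only when outside every template
        out += txt[i];
      }
      i++;
    }
  }
  return out;
}

console.log(JSON.stringify(
  stripTemplatesSinglePass('Intro {{A|{{B|1}}|c}} text {{D}} end')
));
```

This visits each character once regardless of nesting depth. It counts only pairs of braces, so triple-brace sequences such as `{{{x}}}` are not treated specially and would need extra handling.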

Another way, using split and Array.prototype.reduce:

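var stk = 0; // current nesting depth
// Splitting on a capturing group keeps the '{{' and '}}' delimiters in the array.
var result = txt.split(/(\{\{|\}\})/).reduce(function (c, v) {
    if (v === '{{') { stk++; return c; }                   // entering a template
    if (v === '}}') { stk = stk ? stk - 1 : 0; return c; } // leaving one; stray closers are dropped
    return stk ? c : c + v;                                // keep text only outside templates
}, ''); // explicit initial accumulator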