
javascript - How do I remove all Wiki templates from a string?

Reposted. Author: 行者123. Updated: 2023-11-30 12:07:43

I have the content of a Wikipedia article that contains things like this:

{{Use mdy dates|date=June 2014}}
{{Infobox person
| name = Richard Matthew Stallman
| image = Richard Stallman - Fête de l'Humanité 2014 - 010.jpg
| caption = Richard Stallman, 2014
| birth_date = {{Birth date and age|1953|03|16}}
| birth_place = New York City
| nationality = American
| other_names = RMS, rms
| known_for = Free software movement, GNU, Emacs, GNU Compiler Collection|GCC
| alma_mater = Harvard University,<br />Massachusetts Institute of Technology
| occupation = President of the Free Software Foundation
| website = {{URL|https://www.stallman.org/}}
| awards = MacArthur Fellowship<br />EFF Pioneer Award<br />''... see #Honors and awards|Honors and awards''
}}

{{Citation needed|date=May 2011}}

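How can I remove these templates? I can use this regular expression: /\{\{[^}]+\}\}/g, but it does not work for nested templates such as Infobox.

I tried using this code to remove the nested templates first and then the infobox, but I got the wrong result.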

var input = document.getElementById('input');
input.innerHTML = input.innerHTML.replace(/\{\{[^}]+\}\}/g, '');
<pre id="input">    {{Use mdy dates|date=June 2014}}
{{Infobox person
| name = Richard Matthew Stallman
| image =Richard Stallman - Fête de l'Humanité 2014 - 010.jpg
| caption = Richard Stallman, 2014
| birth_date = {{Birth date and age|1953|03|16}}
| birth_place = New York City
| nationality = American
| other_names = RMS, rms
| known_for = Free software movement, GNU, Emacs, GNU Compiler Collection|GCC
| alma_mater = Harvard University,<br />Massachusetts Institute of Technology
| occupation = President of the Free Software Foundation
| website = {{URL|https://www.stallman.org/}}
| awards = MacArthur Fellowship<br />EFF Pioneer Award<br />''... see #Honors and awards|Honors and awards''
}}</pre>

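Best answer

JavaScript regular expressions have no feature for matching nested brackets (such as recursion or balancing groups). One regex-based approach is to process the string several times with a pattern that finds the innermost brackets, until there is nothing left to replace: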

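var cnt;
do {
    cnt = 0;
    // Remove every innermost {{...}} and count the removals made in this pass.
    txt = txt.replace(/{{[^{}]*(?:{(?!{)[^{}]*|}(?!})[^{}]*)*}}/g, function (_) {
        cnt++; return '';
    });
} while (cnt); // stop once a pass removes nothing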

Pattern details:

{{
[^{}]* # all that is not a bracket
(?: # this group is only useful if you need to allow single brackets
{(?!{)[^{}]* # an opening bracket not followed by another opening bracket
| # OR
}(?!})[^{}]* # same thing for closing brackets
)*
}}
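
The loop above can be wrapped in a reusable function. The following is a minimal sketch rather than part of the original answer: the function name stripTemplatesMultiPass and the sample string are hypothetical, and the loop stops when a pass leaves the string unchanged instead of counting replacements.

```javascript
// Hypothetical helper: repeatedly removes innermost {{...}} templates.
function stripTemplatesMultiPass(txt) {
  // Same pattern as in the answer: an innermost template may contain
  // single braces but no "{{" or "}}".
  var innermost = /{{[^{}]*(?:{(?!{)[^{}]*|}(?!})[^{}]*)*}}/g;
  var previous;
  do {
    previous = txt;
    txt = txt.replace(innermost, '');
  } while (txt !== previous); // a pass that changes nothing ends the loop
  return txt;
}

// Illustrative input with one template nested inside another.
var sample = 'a {{Infobox | birth_date = {{Birth date and age|1953|03|16}} | x = y}} b';
console.log(stripTemplatesMultiPass(sample)); // prints "a  b" (both spaces remain)
```

The first pass removes only the inner Birth date template; the second pass then sees the Infobox as an innermost template and removes it. The number of passes therefore grows with the nesting depth.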

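If you do not want to process the string several times, you can also read it character by character, incrementing a counter when opening brackets are found and decrementing it when closing brackets are found.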
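
A character-by-character scan of this kind might look like the sketch below. It is an illustration under stated assumptions, not code from the original answer: the name stripTemplatesSinglePass is hypothetical, only "{{" and "}}" are treated as delimiters, and a stray "}}" outside any template is kept as ordinary text.

```javascript
// Hypothetical helper: single pass with a nesting-depth counter.
function stripTemplatesSinglePass(txt) {
  var depth = 0; // how many templates are currently open
  var out = '';
  var i = 0;
  while (i < txt.length) {
    var pair = txt.slice(i, i + 2);
    if (pair === '{{') {
      depth++;
      i += 2;
    } else if (pair === '}}' && depth > 0) {
      depth--;
      i += 2;
    } else {
      // Copy the character only when it is outside every template.
      if (depth === 0) out += txt[i];
      i++;
    }
  }
  return out;
}

var sample = 'a {{Infobox | birth_date = {{Birth date and age|1953|03|16}} | x = y}} b';
console.log(stripTemplatesSinglePass(sample)); // prints "a  b" (both spaces remain)
```

Note that with this sketch an unclosed "{{" causes everything after it to be dropped, because the depth never returns to zero.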

Another way, using split and Array.prototype.reduce:

var stk = 0; // current nesting depth
var result = txt.split(/({{|}})/).reduce(function (c, v) {
    if (v == '{{') { stk++; return c; }
    if (v == '}}') { stk = stk ? stk - 1 : 0; return c; }
    return stk ? c : c + v; // keep text only when outside every template
}, '');

Regarding "javascript - How do I remove all Wiki templates from a string?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34709748/
