
javascript - Improving a JavaScript regex to match the content inside a tag, with or without a closing tag, excluding the tag itself

Reposted. Author: 搜寻专家. Updated: 2023-11-01 05:09:27

Preface: I'm aware of the general consensus against using regex to parse HTML. I ask in advance that you please avoid recommendations along those lines.


Description.

I have the following regular expression:

/<div class="panel-body">([^]*?)(<\/div>|$)/gi

It matches everything inside the div with the class .panel-body, including the div tags themselves.

Full match:

<div class="panel-body">
<a href="#">Link</a>
Line 1
Line 2
Line 3
</div>

... It also matches the content when the closing div tag is missing.

Full match:

<div class="panel-body">
<a href="#">Link</a>
Line 1
Line 2
Line 3
Don't match after closing `div`...but match this and below in case closing `div` is removed.
Line below 1
Line below 2
Line below 3
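
The behaviour described above can be reproduced with a short script. This is only a sketch: the sample strings are shortened stand-ins for the examples above, and firstMatch is a hypothetical helper name. It shows that the full match (index 0) still carries both tags, while capture group 1 already holds the inner content.

```javascript
// The pattern from the question, unchanged.
const panelRe = /<div class="panel-body">([^]*?)(<\/div>|$)/gi;

// Hypothetical sample inputs, shortened from the examples above.
const closed = '<div class="panel-body">\nLine 1\n</div>\nLine below 1';
const unclosed = '<div class="panel-body">\nLine 1\nLine below 1';

// Hypothetical helper: returns the first match of panelRe in str.
function firstMatch(str) {
  panelRe.lastIndex = 0; // reset, because the g flag makes exec() stateful
  return panelRe.exec(str);
}

const m1 = firstMatch(closed);
const m2 = firstMatch(unclosed);

console.log(JSON.stringify(m1[0])); // full match: opening tag, content, closing tag
console.log(JSON.stringify(m1[1])); // group 1: content only, stops before </div>
console.log(JSON.stringify(m2[1])); // group 1: content up to the end of the string
```

Without the m flag, $ only matches at the very end of the input, which is why the unclosed case runs through to the last line.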

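Question.

How can I improve my regex so that it does the following:

  1. Excludes <div class="panel-body"> and the closing </div> (when a closing div tag is present) from the full match

  2. Puts the content directly into the full match, without using groups (if possible)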

regex101.com example


Edit 1:

The string does not start with <div class="panel-body">. It starts with

<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Webmin 1.851 on centos.centos (CentOS Linux 7.3.1611)</title>
</head>
<body>
<div>
<div>
<div class="panel-body">

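* Note: because the output is progressive, the tag is never closed until the page has fully loaded.

Edit 2:

After the answers were posted, I ran a speed comparison test. It is up to you to decide whose solution suits you best.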

Speed-test Results

Best answer

You can use a DOM parser, which should also cope with incomplete tags:

function divContent(str) {
  // create a new div container
  var div = document.createElement('div');

  // assign your HTML to the div's innerHTML
  div.innerHTML = '<html>' + str + '</html>';

  // find elements by the given class name
  var el = div.getElementsByClassName("panel-body");

  // return the innerHTML of the last matching element, or "" if none was found
  return (el.length > 0 ? el[el.length-1].innerHTML : "");
}

// extract text from a complete tag:
var html = `<div class="panel-body">
<a href="#">Link</a>
Line 1
Line 2
Line 3
</div>`;
console.log(divContent(html));

// extract text from an incomplete tag:
html = `<div class="panel-body">
<a href="#">Link</a>
Line 1
Line 2
Line 3
Don't match after closing 'div'...but match this and below
in case closing 'div' is removed.
Line below 1
Line below 2
Line below 3`;
console.log(divContent(html));

// OP's edited HTML text
html = `<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Webmin 1.851 on centos.centos (CentOS Linux 7.3.1611)</title>
</head>
<body>
<div>
<div>
<div class="panel-body">`;
console.log(divContent(html));

JS Fiddle
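
The answer above avoids regex altogether. For readers who need to stay with a regex, as the question asks, here is a minimal sketch that moves both tags into lookaround assertions so that the full match is the inner content with no groups involved. It assumes an engine with lookbehind support (ES2018 or later). The names innerRe and panelBodyContent are hypothetical, and the pattern still stops at the first </div>, so a div nested inside the panel would cut the match short.

```javascript
// Sketch: the lookbehind and lookahead keep both tags out of the full match.
// Assumes lookbehind support (ES2018+). No g flag, so exec() is stateless.
const innerRe = /(?<=<div class="panel-body">)[^]*?(?=<\/div>|$)/i;

// Hypothetical helper: returns the panel's inner content, or null when the
// opening tag is not present in str.
function panelBodyContent(str) {
  const m = innerRe.exec(str);
  return m === null ? null : m[0];
}

// Closed tag: the match ends just before </div>.
const closedHtml = '<div><div class="panel-body">\n<a href="#">Link</a>\nLine 1\n</div>\nLine below 1';
console.log(JSON.stringify(panelBodyContent(closedHtml)));

// Unclosed tag (progressive output): the match runs to the end of the string.
const unclosedHtml = '<div><div class="panel-body">\nLine 1\nLine below 1';
console.log(JSON.stringify(panelBodyContent(unclosedHtml)));

// Opening tag only, as in Edit 1: the match is the empty string.
const justOpened = '<body>\n<div>\n<div class="panel-body">';
console.log(JSON.stringify(panelBodyContent(justOpened)));

// No panel at all: null.
console.log(panelBodyContent('<div>nothing here</div>'));
```

Returning null for a missing panel and "" for an empty one keeps the two cases apart, which matters while the page is still loading.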

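Regarding "javascript - Improving a JavaScript regex to match the content inside a tag, with or without a closing tag, excluding the tag itself", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45408900/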
