Regex to match HTML tags with specific attributes

Reposted. Author: 行者123. Updated: 2023-12-02 05:44:19

I'm trying to match all HTML tags that do not have a "term" or "range" attribute.

Here is a sample of the HTML:

<span class="inline prewrap strong">DATE:</span>    12/01/10
<span class="inline prewrap strong">MR:</span> 1234567
<span class="inline prewrap strong">DOB:</span> 12/01/65
<span class="inline prewrap strong">HISTORY OF PRESENT ILLNESS:</span> Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum

<span class="inline prewrap strong">MEDICATIONS:</span> <span term="Advil" range="true">Advil </span>and Ibuprofen.

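My regex is: <(.*?)((?!\bterm\b).)>
Unfortunately, this matches every tag. It would be great if the inner text weren't matched either, because I need to filter out all tags except the ones that have that particular attribute.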
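A quick check in Python (an illustrative choice; the question doesn't say which regex engine it targets) shows why every tag matches: `(?!\bterm\b)` is only tested at the single character consumed by `.` right before `>`, and the word `term` can never start at a position that is immediately followed by `>`, so the lookahead never rejects anything. The sample is the MEDICATIONS line from above; `question_pattern` is just an illustrative name.

```python
import re

# Sample taken from the question's MEDICATIONS line.
html = ('<span class="inline prewrap strong">MEDICATIONS:</span> '
        '<span term="Advil" range="true">Advil </span>and Ibuprofen.')

# The question's pattern. (?!\bterm\b) is only checked at the one character
# consumed by "." just before ">", where the word "term" can never begin.
question_pattern = re.compile(r'<(.*?)((?!\bterm\b).)>')

# group(0) is the whole tag; findall() would return the two groups instead.
matches = [m.group(0) for m in question_pattern.finditer(html)]
print(matches)
# ['<span class="inline prewrap strong">', '</span>', '<span term="Advil" range="true">', '</span>']
```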

Best answer

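If regex is your thing, this works for me.
(Note: this doesn't filter out comments, doctypes, or other entities.
Other caveats: tags can be embedded in scripts, comments, and other constructs.)
The multi-line patterns below are written in free-spacing mode, so they need the x (extended) flag.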

Span tag (with attributes), no term|range attribute

'<span
(?=\s)
(?! (?:[^>"\']|(?>".*?"|\'.*?\'))*? (?<=\s) (?:term|range) \s*= )
\s+ (?:".*?"|\'.*?\'|[^>]*?)+
>'

Any tag (with attributes), no term|range attribute
'<[A-Za-z_:][\w:.-]*
(?=\s)
(?! (?:[^>"\']|(?>".*?"|\'.*?\'))*? (?<=\s) (?:term|range) \s*= )
\s+ (?:".*?"|\'.*?\'|[^>]*?)+
>'

Any tag (with or without attributes), no term|range attribute
'<
(?:
[A-Za-z_:][\w:.-]*
(?=\s)
(?! (?:[^>"\']|(?>".*?"|\'.*?\'))*? (?<=\s) (?:term|range) \s*= )
\s+ (?:".*?"|\'.*?\'|[^>]*?)+
|
/?[A-Za-z_:][\w:.-]*\s*/?
)
>'
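For readers on Python, here is a sketch of the same "any tag" pattern compiled with `re.VERBOSE`, with a comment on each piece. Assumptions: the atomic group `(?>".*?"|\'.*?\')` is replaced by the plain `"[^"]*"|'[^']*'` alternation (the same swap the later compact versions make), because Python's `re` only supports atomic groups from 3.11 on; `TAG_WITHOUT_TERM_OR_RANGE` is an illustrative name.

```python
import re

# Free-spacing version of the "any tag (with or without attributes)" pattern.
# Assumption: the atomic group is swapped for "[^"]*"|'[^']*'.
TAG_WITHOUT_TERM_OR_RANGE = re.compile(r"""
    <
    (?:
        [A-Za-z_:][\w:.-]*                # tag name of an opening tag
        (?=\s)                            # ...that is followed by attributes
        (?!                               # reject the tag if, scanning up to '>'
            (?:[^>"']|"[^"]*"|'[^']*')*?  # and skipping quoted values whole,
            (?<=\s) (?:term|range) \s*=   # a term= or range= attribute shows up
        )
        \s+ (?:".*?"|'.*?'|[^>]*?)+       # consume the attribute list
    |
        /?[A-Za-z_:][\w:.-]*\s*/?         # closing, bare, or self-closing tag
    )
    >
""", re.VERBOSE | re.DOTALL)

html = ('<span class="inline prewrap strong">MEDICATIONS:</span> '
        '<span term="Advil" range="true">Advil </span>and Ibuprofen.')

# No capturing groups, so findall() returns whole tags.
print(TAG_WITHOUT_TERM_OR_RANGE.findall(html))
# ['<span class="inline prewrap strong">', '</span>', '</span>']
```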

Update

Alternatives that avoid the (?>) construct.
The regexes below match tags that have no term or range attribute.
Flags = (g)lobal and (s) dotall

Span tag with attributes
Link: http://regexr.com?2vrjr
Regex: <span(?=\s)(?!(?:[^>"\']|"[^"]*"|\'[^\']*\')*?(?<=\s)(?:term|range)\s*=)(?!\s*/?>)\s+(?:".*?"|\'.*?\'|[^>]*?)+>
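Used from Python, the span regex skips the `term`/`range` span and returns only the other opening span. This is a sketch under a few assumptions: the pattern is copied verbatim, `findall()` already scans the whole string (the g flag), so only `re.DOTALL` (the s flag) is passed, and `SPAN_PATTERN` is an illustrative name.

```python
import re

# The span regex, copied verbatim; re.DOTALL plays the role of the s flag.
SPAN_PATTERN = re.compile(
    r'''<span(?=\s)(?!(?:[^>"\']|"[^"]*"|\'[^\']*\')*?(?<=\s)(?:term|range)\s*=)(?!\s*/?>)\s+(?:".*?"|\'.*?\'|[^>]*?)+>''',
    re.DOTALL,
)

html = ('<span class="inline prewrap strong">MEDICATIONS:</span> '
        '<span term="Advil" range="true">Advil </span>and Ibuprofen.')

# The term/range span is rejected by the negative lookahead;
# closing tags never match because the pattern starts with a literal "<span".
print(SPAN_PATTERN.findall(html))
# ['<span class="inline prewrap strong">']
```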
Any tag with attributes
Link: http://regexr.com?2vrju
Regex: <[A-Za-z_:][\w:.-]*(?=\s)(?!(?:[^>"\']|"[^"]*"|\'[^\']*\')*?(?<=\s)(?:term|range)\s*=)(?!\s*/?>)\s+(?:".*?"|\'.*?\'|[^>]*?)+>
Any tag, with or without attributes
Link: http://regexr.com?2vrk1
Regex: <(?:[A-Za-z_:][\w:.-]*(?=\s)(?!(?:[^>"\']|"[^"]*"|\'[^\']*\')*?(?<=\s)(?:term|range)\s*=)(?!\s*/?>)\s+(?:".*?"|\'.*?\'|[^>]*?)+|/?[A-Za-z_:][\w:.-]*\s*/?)>
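Since the question's goal is to strip every tag except those with a term or range attribute, one way to apply the "any tag" regex is with `re.sub()`. A minimal Python sketch, assuming that goal; `ANY_TAG_PATTERN` and `filtered` are illustrative names.

```python
import re

# The "any tag, with or without attributes" regex, copied verbatim.
ANY_TAG_PATTERN = re.compile(
    r'''<(?:[A-Za-z_:][\w:.-]*(?=\s)(?!(?:[^>"\']|"[^"]*"|\'[^\']*\')*?(?<=\s)(?:term|range)\s*=)(?!\s*/?>)\s+(?:".*?"|\'.*?\'|[^>]*?)+|/?[A-Za-z_:][\w:.-]*\s*/?)>''',
    re.DOTALL,
)

html = ('<span class="inline prewrap strong">MEDICATIONS:</span> '
        '<span term="Advil" range="true">Advil </span>and Ibuprofen.')

# Assumption: every tag except the term/range ones should be deleted.
filtered = ANY_TAG_PATTERN.sub('', html)
print(filtered)
# MEDICATIONS: <span term="Advil" range="true">Advil and Ibuprofen.
```

Closing tags carry no attributes, so the `</span>` that closed the Advil span is removed as well; keeping it would take more than this single pattern.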
Match all tags except those with term="occasionally"

Link: http://regexr.com?2vrka
Regex: <(?:[A-Za-z_:][\w:.-]*(?=\s)(?!(?:[^>"\']|"[^"]*"|\'[^\']*\')*?(?<=\s)term\s*=\s*(["'])\s*occasionally\s*\1)(?!\s*/?>)\s+(?:".*?"|\'.*?\'|[^>]*?)+|/?[A-Za-z_:][\w:.-]*\s*/?)>
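In the term="occasionally" regex, `(["'])` captures whichever quote opens the value and `\1` requires the same quote to close it, so both quoting styles are excluded. A Python sketch with a hypothetical sample (not from the question); `NOT_OCCASIONALLY` and `tags` are illustrative names.

```python
import re

# The term="occasionally" regex, copied verbatim.
NOT_OCCASIONALLY = re.compile(
    r'''<(?:[A-Za-z_:][\w:.-]*(?=\s)(?!(?:[^>"\']|"[^"]*"|\'[^\']*\')*?(?<=\s)term\s*=\s*(["'])\s*occasionally\s*\1)(?!\s*/?>)\s+(?:".*?"|\'.*?\'|[^>]*?)+|/?[A-Za-z_:][\w:.-]*\s*/?)>''',
    re.DOTALL,
)

# Hypothetical sample: "occasionally" in double and single quotes, plus another value.
html = '<p term="occasionally">a</p><p term=\'occasionally\'>b</p><p term="often">c</p>'

# The pattern has a capturing group, so findall() would return the group's
# contents rather than the tag; finditer() plus group(0) gives whole tags.
tags = [m.group(0) for m in NOT_OCCASIONALLY.finditer(html)]
print(tags)
# ['</p>', '</p>', '<p term="often">', '</p>']
```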

A similar question about using a regex to match HTML tags with specific attributes can be found on Stack Overflow: https://stackoverflow.com/questions/9008430/
