
php - Notepad++: delete tags that contain specific text

Reposted. Author: 数据小太阳. Updated: 2023-10-29 03:01:02

I have a large XML file of products and I am trying to delete every product that is out of stock. The file is over 20 MB.

<product>
<name>bla1</name>
<price>50$</price>
<stock>yes</stock>
<description>bla</description>
</product>

<product>
<name>bla2</name>
<price>60$</price>
<stock>no</stock>
<description>bla</description>
</product>

...

Can I delete them with a regular expression in Notepad++, or should I use SimpleXML (PHP) or something similar?

My basic PHP code:

$url = 'input/products.xml';
$xml = new SimpleXMLElement(file_get_contents($url));

foreach ($xml->product as $product) {

//finding out of stock products and deleting them

}
$xml->asXml('output/products.xml');
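The question's loop is still only a placeholder. Below is a minimal sketch of one way to fill it in. It makes several assumptions:

- The products sit under a single root element. XML allows only one root, so one must exist; the name products used here is hypothetical.
- The SimpleXML and DOM extensions are enabled.
- The helper name removeOutOfStock is hypothetical.

The sketch selects the out-of-stock products with XPath first and then deletes each one through DOM. Because the selection happens up front, the tree is never modified while it is being iterated.

```php
<?php
// Sketch: assumes a single root element (hypothetically <products>)
// and that the SimpleXML and DOM extensions are available.
function removeOutOfStock(string $xmlString): string
{
    $xml = new SimpleXMLElement($xmlString);

    // Select all out-of-stock products up front, so nothing is removed
    // while the tree is being iterated.
    $outOfStock = $xml->xpath('//product[stock="no"]') ?: [];

    foreach ($outOfStock as $product) {
        // One common idiom: hand the node to DOM and detach it from its parent.
        // SimpleXML and DOM share the same underlying document.
        $node = dom_import_simplexml($product);
        $node->parentNode->removeChild($node);
    }

    return $xml->asXML();
}

$input = <<<'XML'
<products>
<product><name>bla1</name><price>50$</price><stock>yes</stock></product>
<product><name>bla2</name><price>60$</price><stock>no</stock></product>
</products>
XML;

$output = removeOutOfStock($input);
echo $output;
```

For the real file, the same steps would apply to simplexml_load_file('input/products.xml'). The result would then be written with $xml->asXML('output/products.xml').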

Best answer

Foreword

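Pattern matching with regular expressions is not ideal here. If you have access to PHP, I recommend using a proper XML parsing tool instead. That said, here is a solution you can use in Notepad++.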

Description

<product\s*(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s?\/?>(?:(?!</product).)*<stock\s*(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s?\/?>no</stock>(?:(?!</product).)*<\/product>

Replace with: nothing (leave the field empty)
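Since the file is handled with PHP anyway, the same pattern can also be applied with preg_replace. This is a minimal sketch on the sample from the question. The pattern is unchanged, with two additions:

- It is wrapped in ~ delimiters, because it contains unescaped / characters.
- It gets the s modifier, so that . also matches the line breaks inside each product.

```php
<?php
// The answer's pattern, unchanged, wrapped in "~...~s".
// "~" avoids clashing with the unescaped "/" in the pattern; "s" lets "."
// match newlines so one match can cover a multi-line <product> element.
$pattern = <<<'RE'
~<product\s*(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s?\/?>(?:(?!</product).)*<stock\s*(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s?\/?>no</stock>(?:(?!</product).)*<\/product>~s
RE;

$input = <<<'XML'
<product>
<name>bla1</name>
<price>50$</price>
<stock>yes</stock>
<description>bla</description>
</product>

<product>
<name>bla2</name>
<price>60$</price>
<stock>no</stock>
<description>bla</description>
</product>
XML;

// Replace every out-of-stock <product> element with an empty string.
$output = preg_replace($pattern, '', $input);
echo $output;
```

One thing worth checking on the real file concerns the root element. The lazy attribute group right after <product\s* also accepts an ordinary letter, so the opening part of the pattern also matches the start of a root tag such as <products>. If the first product in the file is out of stock, the match then begins at the root tag and deletes it as well. One possible tweak is to change the first <product\s* to <product\b, which rules that out.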


This regular expression does the following:

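  • Finds the entire product section
  • Requires a stock child tag
  • Requires the stock child tag to have the value no
  • Avoids the edge cases that make pattern matching in HTML difficult, such as a '>' inside a quoted attribute value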
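The edge-case handling can be checked with preg_match, running the same pattern (again wrapped in ~...~s) over a few hypothetical one-line records:

- a stock tag whose double-quoted attribute contains '>'
- a product tag with a single-quoted attribute
- an in-stock product

```php
<?php
// The answer's pattern, wrapped in "~...~s" for PHP's PCRE functions.
$pattern = <<<'RE'
~<product\s*(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s?\/?>(?:(?!</product).)*<stock\s*(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s?\/?>no</stock>(?:(?!</product).)*<\/product>~s
RE;

// Hypothetical one-line records; the keys are short labels for each case.
$records = [
    'quoted-gt'     => '<product><name>a</name><stock note="a>b">no</stock></product>',
    'single-quoted' => "<product id='7'><name>b</name><stock>no</stock></product>",
    'in-stock'      => '<product><name>c</name><stock>yes</stock></product>',
];

$results = [];
foreach ($records as $label => $record) {
    // preg_match() returns 1 if the pattern matches, 0 if it does not.
    $results[$label] = preg_match($pattern, $record);
    echo $label, ': ', $results[$label] === 1 ? 'would be removed' : 'kept', "\n";
}
```

The quoted '>' case is the interesting one. A simpler tag pattern such as <stock[^>]*> would stop at the '>' inside the quotes and miss the tag. The ="[^"]*" branch instead consumes the whole quoted value.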

From Notepad++

In Notepad++, use version 6.1 or later, which fixes regular-expression problems present in older versions.

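  1. Press Ctrl+H to open the Find/Replace dialog

  2. Select the "Regular expression" search mode and tick ". matches newline", because each product element spans several lines

  3. Paste the regular expression into the "Find what" field

  4. Leave the "Replace with" field empty

  5. Click Replace All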
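The pattern relies on . to walk across the lines inside a <product> element, but by default . does not match a line break. In Notepad++, the ". matches newline" option changes that, and PHP's s modifier plays the same role. The illustration below uses a simplified version of the pattern, with the attribute handling dropped for brevity.

```php
<?php
// Simplified pattern (attribute handling omitted) to isolate one detail:
// whether "." is allowed to cross line breaks.
$core = '(?:(?!</product).)*';
$body = "<product>{$core}<stock>no</stock>{$core}</product>";

$multiLine = "<product>\n<name>bla2</name>\n<stock>no</stock>\n</product>";

// Without "s", "." stops at "\n", so the element cannot be matched.
$withoutDotAll = preg_match("~{$body}~", $multiLine);
// With "s" (PHP's counterpart of ". matches newline"), it matches as a whole.
$withDotAll = preg_match("~{$body}~s", $multiLine);

printf("without s: %d, with s: %d\n", $withoutDotAll, $withDotAll);
```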

Example

Live demo

https://regex101.com/r/cW9nC5/1

Sample text

<product>
<name>bla1</name>
<price>50$</price>
<stock>yes</stock>
<description>bla</description>
</product>

<product>
<name>bla2</name>
<price>60$</price>
<stock>no</stock>
<description>bla</description>
</product>

After replacement

<product>
<name>bla1</name>
<price>50$</price>
<stock>yes</stock>
<description>bla</description>
</product>

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  <product               '<product'
----------------------------------------------------------------------
  \s*                    whitespace (\n, \r, \t, \f, and " ") (0 or
                         more times (matching the most amount
                         possible))
----------------------------------------------------------------------
  (?:                    group, but do not capture (0 or more times
                         (matching the least amount possible)):
----------------------------------------------------------------------
    [^>=]                  any character except: '>', '='
----------------------------------------------------------------------
   |                       OR
----------------------------------------------------------------------
    ='                     '=\''
----------------------------------------------------------------------
    [^']*                  any character except: ''' (0 or more
                           times (matching the most amount
                           possible))
----------------------------------------------------------------------
    '                      '\''
----------------------------------------------------------------------
   |                       OR
----------------------------------------------------------------------
    ="                     '="'
----------------------------------------------------------------------
    [^"]*                  any character except: '"' (0 or more
                           times (matching the most amount
                           possible))
----------------------------------------------------------------------
    "                      '"'
----------------------------------------------------------------------
   |                       OR
----------------------------------------------------------------------
    =                      '='
----------------------------------------------------------------------
    [^'"]                  any character except: ''', '"'
----------------------------------------------------------------------
    [^\s>]*                any character except: whitespace (\n,
                           \r, \t, \f, and " "), '>' (0 or more
                           times (matching the most amount
                           possible))
----------------------------------------------------------------------
  )*?                    end of grouping
----------------------------------------------------------------------
  \s?                    whitespace (\n, \r, \t, \f, and " ")
                         (optional (matching the most amount
                         possible))
----------------------------------------------------------------------
  \/?                    '/' (optional (matching the most amount
                         possible))
----------------------------------------------------------------------
  >                      '>'
----------------------------------------------------------------------
  (?:                    group, but do not capture (0 or more times
                         (matching the most amount possible)):
----------------------------------------------------------------------
    (?!                    look ahead to see if there is not:
----------------------------------------------------------------------
      </product              '</product'
----------------------------------------------------------------------
    )                      end of look-ahead
----------------------------------------------------------------------
    .                      any character except \n
----------------------------------------------------------------------
  )*                     end of grouping
----------------------------------------------------------------------
  <stock                 '<stock'
----------------------------------------------------------------------
  \s*                    whitespace (\n, \r, \t, \f, and " ") (0 or
                         more times (matching the most amount
                         possible))
----------------------------------------------------------------------
  (?:                    group, but do not capture (0 or more times
                         (matching the least amount possible)):
----------------------------------------------------------------------
    [^>=]                  any character except: '>', '='
----------------------------------------------------------------------
   |                       OR
----------------------------------------------------------------------
    ='                     '=\''
----------------------------------------------------------------------
    [^']*                  any character except: ''' (0 or more
                           times (matching the most amount
                           possible))
----------------------------------------------------------------------
    '                      '\''
----------------------------------------------------------------------
   |                       OR
----------------------------------------------------------------------
    ="                     '="'
----------------------------------------------------------------------
    [^"]*                  any character except: '"' (0 or more
                           times (matching the most amount
                           possible))
----------------------------------------------------------------------
    "                      '"'
----------------------------------------------------------------------
   |                       OR
----------------------------------------------------------------------
    =                      '='
----------------------------------------------------------------------
    [^'"]                  any character except: ''', '"'
----------------------------------------------------------------------
    [^\s>]*                any character except: whitespace (\n,
                           \r, \t, \f, and " "), '>' (0 or more
                           times (matching the most amount
                           possible))
----------------------------------------------------------------------
  )*?                    end of grouping
----------------------------------------------------------------------
  \s?                    whitespace (\n, \r, \t, \f, and " ")
                         (optional (matching the most amount
                         possible))
----------------------------------------------------------------------
  \/?                    '/' (optional (matching the most amount
                         possible))
----------------------------------------------------------------------
  >no</stock>            '>no</stock>'
----------------------------------------------------------------------
  (?:                    group, but do not capture (0 or more times
                         (matching the most amount possible)):
----------------------------------------------------------------------
    (?!                    look ahead to see if there is not:
----------------------------------------------------------------------
      </product              '</product'
----------------------------------------------------------------------
    )                      end of look-ahead
----------------------------------------------------------------------
    .                      any character except \n
----------------------------------------------------------------------
  )*                     end of grouping
----------------------------------------------------------------------
  <                      '<'
----------------------------------------------------------------------
  \/                     '/'
----------------------------------------------------------------------
  product>               'product>'
----------------------------------------------------------------------
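The construct (?:(?!</product).)* appears twice in the explanation. It advances one character at a time but refuses to step onto a '</product' sequence, so a match can never leave the product element where it started.

The comparison below contrasts it with a naive lazy .*? to show why that matters. It uses simplified patterns, with attribute handling omitted, on the answer's two-product sample trimmed to name and stock.

```php
<?php
// Simplified patterns (attribute handling omitted): naive lazy ".*?" versus
// the answer's tempered token "(?:(?!</product).)*".
$naive    = '~<product>.*?<stock>no</stock>.*?</product>~s';
$tempered = '~<product>(?:(?!</product).)*<stock>no</stock>(?:(?!</product).)*</product>~s';

$input = "<product>\n<name>bla1</name>\n<stock>yes</stock>\n</product>\n\n"
       . "<product>\n<name>bla2</name>\n<stock>no</stock>\n</product>";

$naiveOut    = preg_replace($naive, '', $input);
$temperedOut = preg_replace($tempered, '', $input);

// The naive match starts at the first <product> and runs on into the second
// product's <stock>no</stock>, so the in-stock "bla1" is deleted too.
echo $naiveOut === '' ? "naive: everything removed\n" : "naive: kept something\n";
// The tempered version removes only the out-of-stock product.
echo $temperedOut;
```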

Regarding "php - Notepad++: delete tags that contain specific text", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37528281/

Copyright 2021 - 2024 cfsdn All Rights Reserved, Sichuan ICP registration No. 2022000587