
html - Printing a multi-line pattern from an HTML file with a Perl regex

Reposted. Author: 行者123. Updated: 2023-11-28 17:32:44

I have an HTML file. Here is a sample:

      <div class="criteria" style="padding-left:0;font-style:italic">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;You searched for: 
<span title="A*" >Individual: <span><b>A*</b></span></span>
</div>

</td>

</tr>

</table>

<table cellpadding="5" cellspacing="0" border="0" style="border-collapse: collapse; width: 100%">

<tr class="ListItemColorNew">

<td style="width:50%">
<div class="gvListItemStyle">
<span class="LargeText15">JAMES BOND A&#39;MONEYPENNY </span> (LIC# 1111111)
<div class="GrayTextShade"><i>Alternate Names: BOND JAMES</i></div>
<div class="GrayTextShade">
GREY TIDE LLC (LIC# 2222)
</div>
</div>
</td>

<td style="width:50%">
<div class="gvListItemStyle">
<span class="LargeText15">FRANK WHITE A&#39;SMALLS </span> (LIC# 1111111)
<div class="GrayTextShade"><i>Alternate Names: JAMES SMALLS</i></div>
<div class="GrayTextShade">
WEST RIVER CORP LLC (LIC# 3333)
</div>
</div>
</td>


<td style="width: 25%; vertical-align: top">
<div class="gvListItemStyle">
<div><img alt="help" src=\'/Content/images/BrokerCheck/icon-blueCheck.png\' style=\'vertical-align:top;padding-right:5px\' />Broker</div>
</div>
</td>

<td style="width:25%;text-align:right;vertical-align:top">
<div class="gvListItemStyle">
<a class="btn btn-primary" href="/Individual/Summary/5820616">Details &#187;</a> </div>
</td>

</tr>

I am trying to extract everything between <td style="width:50%"> and </td>. The data is stored in the file testFile.txt.

Here is the Perl code I am using:

 system("perl -pi.bak -e '/^<td style=\"width:50%\">.+<\\/td>/mg' testFile.txt");

Best answer

Your code below does not actually do anything:

system("perl -pi.bak -e '/^<td style=\"width:50%\">.+<\\/td>/mg' testFile.txt");
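  1. You are matching m// in void context with no captures, so the result of the match is discarded and the statement accomplishes nothing.

  2. Your pattern will never match your content, because:

    a. You are using the any-character ., but it does not match newlines unless you use the /s modifier.

    b. You are processing the file line by line with -p, but your pattern would need to span lines in order to match.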
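Both problems in point 2 go away once the whole input is held in one string and . is allowed to match newlines. The following is a minimal sketch of that idea using only core Perl; the helper name extract_cells and the sample input are made up for illustration and are not part of the original question or answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the inner content of every
# <td style="width:50%"> cell found in the given HTML string.
sub extract_cells {
    my ($html) = @_;

    # /s lets . match newlines, so a cell may span several lines.
    # /g in list context returns the capture from every match.
    my @cells = $html =~ m{<td style="width:50%">(.*?)</td>}gs;
    return @cells;
}

# Made-up sample input: two cells, the first spanning several lines.
my $html = join "\n",
    '<td style="width:50%">',
    '<div>first</div>',
    '</td>',
    '<td style="width:50%"><div>second</div></td>';

my @cells = extract_cells($html);
print scalar(@cells), " cell(s) found\n";
print "---\n$_\n" for @cells;
```

On the command line, the same idea could be written as perl -0777 -ne 'print "$1\n" while m{<td style="width:50%">(.*?)</td>}gs' testFile.txt, where -0777 reads the whole file as a single record and -n loops over the input without printing it automatically. This remains a regex over HTML, so it is only as reliable as the markup is regular.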

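The following demonstrates both a regex solution (not recommended) and the use of an actual HTML parser, in this case Mojo::DOM. For a helpful 8-minute introductory video, check out Mojocast Episode 5.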

use strict;
use warnings;

use Mojo::DOM;

my $data = do { local $/; <DATA> };

# Regex Solution:
# /s lets . match newlines; .*? stops at the first closing </td>.
if ( $data =~ m{<td style="width:50%">(.*?)</td>}s ) {
    print "Regex Solution:\n$1";
} else {
    warn "No pattern match found";
}

# Parser Solution:
my $dom = Mojo::DOM->new($data);

# at() returns the first element matching the CSS selector.
my $yourtd = $dom->at(q{td[style="width:50%"]})->content;

print "\nMojo::DOM:\n", $yourtd;

__DATA__
<html>
<head>
<title>Hello World</title>
</head>
<body>
<table>
<tr>
<td>
<div class="criteria" style="padding-left:0;font-style:italic">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;You searched for:
<span title="A*" >Individual: <span><b>A*</b></span></span>
</div>

</td>
</tr>
</table>

<table cellpadding="5" cellspacing="0" border="0" style="border-collapse: collapse; width: 100%">

<tr class="ListItemColorNew">
<td style="width:50%">
<div class="gvListItemStyle">
<span class="LargeText15">JAMES BOND A&#39;MONEYPENNY </span> (LIC# 1111111)
<div class="GrayTextShade"><i>Alternate Names: BOND JAMES</i></div>

<div class="GrayTextShade">
GREY TIDE LLC (LIC# 2222)
</div>
</div>
</td>
<td style="width: 25%; vertical-align: top">
<div class="gvListItemStyle">
<div><img alt="help" src=\'/Content/images/BrokerCheck/icon-blueCheck.png\' style=\'vertical-align:top;padding-right:5px\' />Broker</div>
</div>
</td>
<td style="width:25%;text-align:right;vertical-align:top">
<div class="gvListItemStyle">
<a class="btn btn-primary" href="/Individual/Summary/5820616">Details &#187;</a> </div>
</td>
</tr>
</table>
</body>
</html>

Output:

Regex Solution:

<div class="gvListItemStyle">
<span class="LargeText15">JAMES BOND A&#39;MONEYPENNY </span> (LIC# 1111111)
<div class="GrayTextShade"><i>Alternate Names: BOND JAMES</i></div>

<div class="GrayTextShade">
GREY TIDE LLC (LIC# 2222)
</div>
</div>

Mojo::DOM:

<div class="gvListItemStyle">
<span class="LargeText15">JAMES BOND A&#39;MONEYPENNY </span> (LIC# 1111111)
<div class="GrayTextShade"><i>Alternate Names: BOND JAMES</i></div>

<div class="GrayTextShade">
GREY TIDE LLC (LIC# 2222)
</div>
</div>
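
The sample in the question contains two cells with style="width:50%", while at() only returns the first match. A possible way to collect every matching cell is Mojo::DOM's find(), which returns a collection of elements. This is a sketch under the assumption that Mojolicious is installed; the helper name all_half_width_cells and the sample markup are hypothetical.

```perl
use strict;
use warnings;

use Mojo::DOM;

# Hypothetical helper: returns the content of every td whose style
# attribute is exactly "width:50%".
sub all_half_width_cells {
    my ($html) = @_;
    my $dom = Mojo::DOM->new($html);

    # find() returns a Mojo::Collection; each() without a callback
    # returns its elements as a plain list.
    return map { $_->content } $dom->find(q{td[style="width:50%"]})->each;
}

# Made-up sample markup with two matching cells and one non-matching cell.
my $html = <<'HTML';
<table>
<tr>
<td style="width:50%"><span>JAMES BOND</span></td>
<td style="width:50%"><span>FRANK WHITE</span></td>
<td style="width:25%">Broker</td>
</tr>
</table>
HTML

my @cells = all_half_width_cells($html);
print "Cell ", $_ + 1, ":\n$cells[$_]\n" for 0 .. $#cells;
```

Because the selector compares the attribute value exactly, a cell written as style="width: 50%" (with a space) would not be matched by this sketch.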

Regarding html - printing a multi-line pattern from an HTML file with a Perl regex, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/25711629/
