html - Need advice on a good way to find the contents of a div

Reposted. Author: 搜寻专家. Updated: 2023-10-31 22:15:30

<div class="box notranslate" id="venueHours">
<h5 class="translate">Hours</h5>
<div class="status closed">Currently closed</div>
<div class="hours">
<div class="timespan">
<div class="openTime">
<div class="days">Mon,Tue,Wed,Thu,Sat</div>
<span class="hours"> 10:00 AM–6:00 PM</span>
</div>
</div>
<div class="timespan">
<div class="openTime">
<div class="days">Fri</div>
<span class="hours"> 10:00 AM–9:00 PM</span></div>
</div>
<div class="timespan">
<div class="openTime">
<div class="days">Sun</div>
<span class="hours"> 10:00 AM–5:00 PM</span>
</div>
</div>
</div>
</div>

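I am trying to capture the contents of every <div class="days"> and <span class="hours">. I think I could use a regular expression for this task, but I would also like to learn any interesting or professional ways to capture specific div blocks like this one. Thanks.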

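Best answer

Besides the HTML parsing libraries mentioned elsewhere, other modules also have DOM capabilities. See, for example, Web::Query and Mojolicious' Mojo::DOM.

Here is an example using Mojo::DOM and CSS3 selectors: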

#!/usr/bin/env perl

use strict;
use warnings;

use 5.10.0;
use Mojo::DOM;

my $dom = Mojo::DOM->new(<<'HTML');
<div class="box notranslate" id="venueHours">
<h5 class="translate">Hours</h5>
<div class="status closed">Currently closed</div>
<div class="hours">
<div class="timespan">
<div class="openTime">
<div class="days">Mon,Tue,Wed,Thu,Sat</div>
<span class="hours"> 10:00 AM–6:00 PM</span>
</div>
</div>
<div class="timespan">
<div class="openTime">
<div class="days">Fri</div>
<span class="hours"> 10:00 AM–9:00 PM</span></div>
</div>
<div class="timespan">
<div class="openTime">
<div class="days">Sun</div>
<span class="hours"> 10:00 AM–5:00 PM</span>
</div>
</div>
</div>
</div>
HTML

say "div days:";
say $_->text for $dom->find('div.days')->each;

say "\nspan hours:";
say $_->text for $dom->find('span.hours')->each;

Or equivalently:

say "div days:";
say for $dom->find('div.days')->map(sub{$_->text})->each;

say "\nspan hours:";
say for $dom->find('span.hours')->map(sub{$_->text})->each;

Output:

div days:
Mon,Tue,Wed,Thu,Sat
Fri
Sun

span hours:
10:00 AM–6:00 PM
10:00 AM–9:00 PM
10:00 AM–5:00 PM

Or, to get the hours that correspond to each set of days, you can use the child elements of the openTime divs:

say "Open Times:";
say for $dom->find('div.openTime')
->map(sub{$_->children->each})
->map(sub{$_->text})
->each;

Output:

Open Times:
Mon,Tue,Wed,Thu,Sat
10:00 AM–6:00 PM
Fri
10:00 AM–9:00 PM
Sun
10:00 AM–5:00 PM
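If the goal is to keep each day paired with its hours as data rather than as alternating lines of output, the same selectors can be applied inside each openTime div. The following is only a minimal sketch under a few assumptions: trim and opening_hours are hypothetical helper names introduced here, not part of Mojo::DOM; the sample HTML is a shortened copy of the question's markup with a plain ASCII hyphen in the time ranges to sidestep encoding questions; and whitespace is trimmed explicitly because whether text trims it depends on the Mojolicious version.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;
use Mojo::DOM;

# Hypothetical helper (not part of Mojo::DOM): strip surrounding whitespace.
sub trim { my $s = shift // ''; $s =~ s/^\s+|\s+$//g; return $s }

# Hypothetical helper: return a list of [day, hours] pairs, one per day.
sub opening_hours {
    my ($html) = @_;
    my $dom = Mojo::DOM->new($html);
    my @pairs;
    for my $open ($dom->find('div.openTime')->each) {
        # at() returns the first match or undef, so skip incomplete blocks
        my $days  = $open->at('div.days')   or next;
        my $hours = $open->at('span.hours') or next;
        my $span  = trim($hours->text);
        # "Mon,Tue" becomes one pair per individual day
        push @pairs, [$_, $span] for split /\s*,\s*/, trim($days->text);
    }
    return @pairs;
}

# Shortened sample input (assumption: ASCII hyphen instead of an en dash)
my $html = <<'HTML';
<div class="hours">
<div class="timespan">
<div class="openTime">
<div class="days">Mon,Tue</div>
<span class="hours"> 10:00 AM-6:00 PM</span>
</div>
</div>
<div class="timespan">
<div class="openTime">
<div class="days">Fri</div>
<span class="hours"> 10:00 AM-9:00 PM</span></div>
</div>
</div>
HTML

say "$_->[0]: $_->[1]" for opening_hours($html);
```

Scoping the lookup to each openTime element avoids relying on the days and hours lists staying in the same order, which the two separate find calls shown earlier implicitly assume.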

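Edit: Daxim has posted similar Web::Query code as a comment, so I am reposting it here for better formatting. I have not tried it, but I generally trust his code. Assuming the HTML is in the variable $html: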

use Web::Query qw(); 
my $w = Web::Query->new_from_html($html);
say "div days:";
say for $w->find('div.days')->text;
say "\nspan hours:";
say for $w->find('span.hours')->text;
say "Open Times:";
$w->find('div.openTime')->each(sub { say for $_->find('*')->text });

This post is based on a similar question on Stack Overflow, "html - Need advice on a good way to find the contents of a div": https://stackoverflow.com/questions/10674308/
