
html - Perl HTML::Strip->parse not ignoring &nbsp;

Reposted. Author: 行者123. Updated: 2023-11-28 00:03:31

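I am currently using Perl's HTML::Strip to extract the text from my HTML file, but I have run into a small problem with the HTML-specific space, i.e. "&nbsp;". For some reason, HTML::Strip->parse() does not seem to handle it. I know I can run a substitution afterwards, but I wanted to check whether there is another way to achieve this by tweaking the new() constructor. Thanks in advance.

Perl code: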

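use HTML::Strip;

# @htmlSource holds the lines of the HTML file
my $hs = HTML::Strip->new();
my $line = join('', @htmlSource);
my $clean_text = $hs->parse($line);
$hs->eof;

# keep only the lines that contain non-whitespace characters
push @processedLines, grep { /\S/ } split(/\n/, $clean_text);
foreach my $f (@processedLines) {
    print "$f\n";
}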

Sample output:

CBD_UnitTest
MtrTempEst
MtrTempEst_Init1 (C1-Coverage: 100.00 %, 1 out of 1 Testcases passed)
LeadLagFilt (C1-Coverage: 100.00 %, 1 out of 1 Testcases failed)
 
 
AssMechFiltInit (C1-Coverage: 100.00 %, 1 out of 1 Testcases passed)

Sample data set:

<table bgcolor="white" width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td align="center">
<table width="100%" cellspacing="0" cellpadding="1" bgcolor="white" border="0">
<tr bgcolor="#dcdcdc">
<td width="1%" bgcolor="white">
<img border="0" src="pictures/batch_module_notok.jpg"/>
</td>
<td colspan="3" width="1%">
<font face="tahoma" size="-2" color="black">
CBD_UnitTest
</font>
</td>
<td width="1%">
</td>
<td width="1%">
</td>
<td width="1%">
<img border="0" src="pictures/batch_check_notok.gif"/>
</td>
</tr>
<tr bgcolor="white">
<td width="1%" bgcolor="white">
</td>
<td width="1%" bgcolor="white">
<img border="0" src="pictures/batch_module_notok.jpg"/>
</td>
<td colspan="2">
<font face="tahoma" size="-2" color="black">
MtrTempEst
</font>
</td>
<td width="1%">
</td>
<td width="1%">
</td>
<td width="1%">
<img border="0" src="pictures/batch_check_notok.gif"/>
</td>
</tr>
<tr bgcolor="#dcdcdc">
<td width="1%" bgcolor="white">
</td>
<td width="1%" bgcolor="white">
</td>
<td width="1%" bgcolor="white">
<img border="0" src="pictures/batch_ok.jpg"/>
</td>
<td>
<a href="#CBD_UnitTest:MtrTempEst:ts_MtrTempEst_Init1"><font face="tahoma" size="-2" color="black">
MtrTempEst_Init1 (C1-Coverage: 100.00 %, 1 out of 1 Testcases passed)
</font></a>
</td>
<td width="1%">
</td>
<td width="1%">
</td>
<td width="1%">
<img border="0" src="pictures/batch_check_ok.gif"/>
</td>
</tr>
<tr bgcolor="#FF0000">
<td width="1%" bgcolor="white">
</td>
<td width="1%" bgcolor="white">
</td>
<td width="1%" bgcolor="white">
<img border="0" src="pictures/batch_notok.jpg"/>
</td>
<td>
<a href="#CBD_UnitTest:MtrTempEst:ts_LeadLagFilt"><font face="tahoma" size="-2" color="white">
<b>LeadLagFilt (C1-Coverage: 100.00 %, 1 out of 1 Testcases failed)</b>
</font></a>
</td>
<td width="1%">
<a name="LeadLagFilt_0"></a>
&nbsp; </td>
<td width="1%">
&nbsp; </td>
<td width="1%">
<img border="0" src="pictures/batch_check_notok.gif"/>
</td>
</tr>
<tr bgcolor="#dcdcdc">
<td width="1%" bgcolor="white">
</td>
<td width="1%" bgcolor="white">
</td>
<td width="1%" bgcolor="white">
<img border="0" src="pictures/batch_ok.jpg"/>
</td>
<td>
<a href="#CBD_UnitTest:MtrTempEst:ts_AssMechFiltInit"><font face="tahoma" size="-2" color="black">
AssMechFiltInit (C1-Coverage: 100.00 %, 1 out of 1 Testcases passed)
</font></a>
</td>
<td width="1%">
</td>
<td width="1%">
</td>
<td width="1%">
<img border="0" src="pictures/batch_check_ok.gif"/>
</td>
</tr>
</table>
</td>
</tr>
</table>
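
For reference, the fallback mentioned in the question (a substitution after parsing) could look roughly like the sketch below. The helper name clean_lines and the sample string are made up for illustration and are not part of HTML::Strip. It assumes the stripped text is a character string in which a non-breaking space shows up either as the literal entity &nbsp; or as the decoded character U+00A0.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Strip): turn leftover non-breaking
# spaces into plain spaces, then drop lines that contain only whitespace.
sub clean_lines {
    my ($text) = @_;
    $text =~ s/&nbsp;/ /g;    # entity that was left undecoded
    $text =~ s/\xA0/ /g;      # entity that was decoded to U+00A0
    return grep { /\S/ } split /\n/, $text;
}

# Made-up sample standing in for the output of $hs->parse(...)
my $clean_text = "LeadLagFilt (failed)\n&nbsp; \n\xA0 \nAssMechFiltInit (passed)\n";
print "$_\n" for clean_lines($clean_text);
```

Replacing U+00A0 explicitly matters because, on a plain byte string, Perl's \S can still match that character, so a line holding only a non-breaking space would survive the grep.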

Best answer

Do you have HTML::Entities installed? The docs for HTML::Strip state:

"HTML::Strip will only attempt decoding of HTML entities if HTML::Entities is installed."
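
A quick way to check this is to try loading the module. The snippet below is only a sketch: the helper name have_html_entities is made up, while require, decode_entities and VERSION are standard Perl and HTML::Entities calls. If the module loads, it also shows which character &nbsp; decodes to.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if HTML::Entities can be loaded, else 0.
sub have_html_entities {
    return eval { require HTML::Entities; 1 } ? 1 : 0;
}

if ( have_html_entities() ) {
    my $raw     = 'a&nbsp;b';
    my $decoded = HTML::Entities::decode_entities($raw);
    # Report the module version and the code point that replaced &nbsp;
    printf "HTML::Entities %s found; &nbsp; decodes to U+%04X\n",
        HTML::Entities->VERSION, ord( substr $decoded, 1, 1 );
}
else {
    print "HTML::Entities not found; it can be installed from CPAN\n";
}
```

Note that even with decoding enabled, &nbsp; becomes a non-breaking space character rather than an ordinary space, so a follow-up substitution may still be needed if those lines should be dropped. As far as I know, the HTML::Strip docs also list a decode_entities option for new(), which is on by default.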

Regarding "html - Perl HTML::Strip->parse not ignoring &nbsp;", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/19865538/
