
perl - Getting content from HTML tags using MyParser in Perl

Reposted. Author: 行者123. Updated: 2023-12-04 20:36:23

I have an HTML document like this:

<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body bgcolor="white">

<h1>foo.c</h1>

<form method="post" action=""
enctype="application/x-www-form-urlencoded">
Compare this file to the similar file:
<select name="file2">

<option value="...">...</option>


</select>
<input type="hidden" name="file1" value="foo.c" /><br>
Show the results in this format:
</form>
<hr>

<p>
<pre>
some code
</pre>

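I need to get the value of the input whose name is 'file1' and the contents of the HTML pre tag. I don't know Perl; with some googling I wrote this small program (which I don't think is "elegant"):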

#!/usr/bin/perl

package MyParser;
use base qw(HTML::Parser);

#Store the file name and contents obtained from the HTML tags
my($filename, $file_contents);

#These values are set on each start() call
#and used in the text() routine.
my($g_tagname, $g_attr);


#Process tag itself and its attributes
sub start {
my ($self, $tagname, $attr, $attrseq, $origtext) = @_;

$g_tagname = $tagname;
$g_attr = $attr;
}

#Process HTML tag body
sub text {
my ($self, $text) = @_;

#Gets the filename
if($g_tagname eq "input" and $g_attr->{'name'} eq "file1") {
$filename = $attr->{'value'};
}

#Gets the filecontents
if($g_tagname eq "pre") {
$file_contents = $text;
}
}

package main;

#Read the contents of the file $filename and return them.
#Note: it works only for text/plain files.
sub read_file {
my($filename) = @_;
open FILE, $filename or die $!;
my ($buf, $data, $n);
while((read FILE, $data, 256) != 0) {
$buf .= $data;
}
return ($buf);
}


my $curr_filename = $ARGV[0];
my $curr_file_contents = read_file($curr_filename);

my $parser = MyParser->new;
$parser->parse($curr_file_contents);

print "filename: ",$filename,"file contents: ",$file_contents;

Then I run ./foo.pl html.html, but the $filename and $file_contents variables come out empty.

How can I fix this?

Best answer

As always, there is more than one way to do it. Here is how to use the DOM parser from Mojolicious for this task:

#!/usr/bin/env perl

use strict;
use warnings;
use Mojo::DOM;

# slurp all lines at once into the DOM parser
my $dom = Mojo::DOM->new(do { local $/; <> });

print $dom->at('input[name=file1]')->attr('value');
print $dom->at('pre')->text;

Output:

foo.c
some code
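For readers who would rather stay with HTML::Parser, as in the question, here is a minimal sketch of one way to restructure the handlers. It is not part of the original answer. Judging by the posted code, a few things look likely to contribute to the empty values: text() reads `$attr`, which is never declared in that scope (start() stores the hash in `$g_attr`); `<input>` has no text body, so its attributes are easiest to read in the start handler; and the tag tracker is never reset at an end tag, so later text can overwrite `$file_contents`. The helper name `extract_fields` and the inline sample HTML are hypothetical, introduced only for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper (not in the original post): returns the value of
# <input name="file1"> and the text found inside <pre> elements.
sub extract_fields {
    my ($html) = @_;
    my ($filename, $contents, $in_pre) = (undef, '', 0);

    my $parser = HTML::Parser->new(
        api_version => 3,
        # <input> has no body, so read its attributes in the start handler.
        start_h => [ sub {
            my ($tagname, $attr) = @_;
            if ($tagname eq 'input' && ($attr->{name} // '') eq 'file1') {
                $filename = $attr->{value};
            }
            $in_pre = 1 if $tagname eq 'pre';
        }, 'tagname, attr' ],
        # Stop collecting at </pre> so later text cannot overwrite the result.
        end_h  => [ sub { $in_pre = 0 if $_[0] eq 'pre' }, 'tagname' ],
        # Text may arrive in several chunks, so append instead of assigning.
        text_h => [ sub { $contents .= $_[0] if $in_pre }, 'dtext' ],
    );

    $parser->parse($html);
    $parser->eof;    # flush any text the parser is still buffering
    return ($filename, $contents);
}

# Inline sample modelled on the question's HTML (an assumption for the demo).
my $html = <<'HTML';
<form><input type="hidden" name="file1" value="foo.c" /><br></form>
<hr>
<pre>
some code
</pre>
HTML

my ($filename, $contents) = extract_fields($html);
print "filename: ", $filename // '(not found)', "\n";
print "file contents: $contents\n";
```

To process a file given on the command line instead of the inline sample, the same handlers can be driven with `$parser->parse_file($ARGV[0])`, which reads the file and signals end of input itself.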

Regarding "perl - Getting content from HTML tags using MyParser in Perl", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13438026/
