
linux - Trying to use Perl on Linux to read a PDF, parse the data, and write the desired data to a spreadsheet

Reposted. Author: IT王子. Updated: 2023-10-29 01:26:34

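I am trying to extract data from credit card statements and enter it into a spreadsheet for tax purposes. What I have so far involves several steps, but I am fairly new to Perl and am working from what I know. Here are the two separate scripts I have written so far: one reads all of the data from the PDF and writes it to a text file, and the other parses that text (imperfectly) and writes the result to another text file. I would then like to either create a CSV file to import into a spreadsheet or write directly to a spreadsheet. I would prefer to do this in one script, but two or three would be fine.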

First script:

#!/usr/bin/perl
use strict;
use warnings;
use CAM::PDF;

my $file = "/home/cd/Documents/Jan14.pdf";
my $pdf = CAM::PDF->new($file) or die "Could not open '$file': $CAM::PDF::errstr\n";
my $doc = "";
my $filename = 'report.txt';
open(my $fh, '>', $filename) or die "Could not open file '$filename' $!";
# Concatenate the text of every page (pages are numbered from 1)
for my $i (1 .. $pdf->numPages()) {
    $doc .= $pdf->getPageText($i);
}
print $fh " $doc\n";
close $fh;
print "done\n";

Second script:

#!/usr/bin/perl
use strict;
use warnings;

undef $/; # Enable 'slurp' mode
open (FILE, '<', 'report.txt') or die "Could not open report.txt: $!";

my $file = <FILE>; # Whole file here now...
my ($stuff_that_interests_me) =
($file =~ m/.*?(Date of Transaction.*?CONTINUED).*/s);
print "$stuff_that_interests_me\n";

my $filename = 'data.txt';
open(my $fh, '>>', $filename) or die "Could not open file '$filename' $!";

print $fh " $stuff_that_interests_me\n";
close $fh;
print "done\n";

close (FILE) or die "Could not close report.txt: $!";

open (FILE2, '<', 'report.txt') or die "Could not open report.txt: $!";

my $file2 = <FILE2>; # Whole file here now...
my ($other_stuff_that_interests_me) =
($file2 =~ m/.*?(Page 2 .*?TRANSACTIONS THIS CYCLE).*/s);
print "$other_stuff_that_interests_me\n";
$filename = 'data.txt';
open($fh, '>>', $filename) or die "Could not open file '$filename' $!";

print $fh " $other_stuff_that_interests_me\n";
close $fh;
print "done\n";

close (FILE2) or die "Could not close report.txt: $!";
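
The second script reads report.txt twice and clears $/ for the whole program. A minimal sketch of the same extraction in a single pass is shown here, reusing the two patterns from that script. The sample text is made up for illustration and stands in for the contents of report.txt; the real statement layout may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up stand-in for the contents of report.txt (an assumption, not real data).
my $text = join "\n",
    'Account summary',
    'Date of Transaction  Description  Amount',
    '01/03  EXAMPLE STORE  42.17',
    'CONTINUED',
    'Page 2 of 3',
    '01/09  EXAMPLE FUEL  30.00',
    'TRANSACTIONS THIS CYCLE',
    'Footer';

# The same two regions the second script extracts; /s lets . match newlines.
my @patterns = (
    qr/(Date of Transaction.*?CONTINUED)/s,
    qr/(Page 2 .*?TRANSACTIONS THIS CYCLE)/s,
);

# Collect every region that matches; warn instead of printing undef.
my @sections;
for my $re (@patterns) {
    if ($text =~ $re) {
        push @sections, $1;
    }
    else {
        warn "No match for pattern $re\n";
    }
}

print "$_\n---\n" for @sections;
```

To run this against the real file, $text could be filled with a scoped slurp such as `my $text = do { local $/; <$in> };` after opening report.txt on a lexical handle $in, which keeps the change to $/ local to that block.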

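Update: I found a module on CPAN (CAM::PDF) that works great for what I am trying to do. It even renders the data in a format I can use more easily for my spreadsheet. However, I have not figured out how to get it to print to a .txt file. Any suggestions?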

#!/usr/bin/perl -w

package main;

use warnings;
use strict;
use CAM::PDF;
use Getopt::Long;
use Pod::Usage;
use English qw(-no_match_vars);

our $VERSION = '1.60';

my %opts = (
density => undef,
xdensity => undef,
ydensity => undef,
check => 0,
renderer => 'CAM::PDF::Renderer::Dump',
verbose => 0,
help => 0,
version => 0,
);

Getopt::Long::Configure('bundling');
GetOptions('r|renderer=s' => \$opts{renderer},
'd|density=f' => \$opts{density},
'x|xdensity=f' => \$opts{xdensity},
'y|ydensity=f' => \$opts{ydensity},
'c|check' => \$opts{check},
'v|verbose' => \$opts{verbose},
'h|help' => \$opts{help},
'V|version' => \$opts{version},
) or pod2usage(1);
if ($opts{help})
{
pod2usage(-exitstatus => 0, -verbose => 2);
}
if ($opts{version})
{
print "CAM::PDF v$CAM::PDF::VERSION\n";
exit 0;
}

if (defined $opts{density})
{
$opts{xdensity} = $opts{ydensity} = $opts{density};
}
if (defined $opts{xdensity} || defined $opts{ydensity})
{
if (!eval "require $opts{renderer}") ## no critic (StringyEval)
{
die $EVAL_ERROR;
}
if (defined $opts{xdensity})
{
no strict 'refs'; ## no critic(ProhibitNoStrict)
my $varname = $opts{renderer}.'::xdensity';
${$varname} = $opts{xdensity};
}
if (defined $opts{ydensity})
{
no strict 'refs'; ## no critic(ProhibitNoStrict)
my $varname = $opts{renderer}.'::ydensity';
${$varname} = $opts{ydensity};
}
}

if (@ARGV < 1)
{
pod2usage(1);
}

my $file = shift;
my $pagelist = shift;

my $doc = CAM::PDF->new($file) || die "$CAM::PDF::errstr\n";

foreach my $p ($doc->rangeToArray(1, $doc->numPages(), $pagelist))
{
my $tree = $doc->getPageContentTree($p, $opts{verbose});
if ($opts{check})
{
print "Checking page $p\n";
if (!$tree->validate())
{
print " Failed\n";
}
}
$tree->render($opts{renderer});
}
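
One possible way to get the script above to write to a .txt file is to point STDOUT at a file before the rendering loop and restore it afterwards. The sketch below assumes the renderer emits its text with plain print calls on STDOUT; the file name rendered.txt and the printed sample line are hypothetical stand-ins, not actual renderer output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical output file name, chosen for illustration.
my $outfile = 'rendered.txt';

# Duplicate the current STDOUT so it can be restored later.
open(my $saved_stdout, '>&', \*STDOUT) or die "Could not dup STDOUT: $!";

# Reopen STDOUT on the file; every plain print now lands there.
open(STDOUT, '>', $outfile) or die "Could not open '$outfile': $!";

# Stand-in for the $tree->render(...) call in the loop above (assumed to print).
print "Date of Transaction  Description  Amount\n";

# Flush the file and reconnect STDOUT to where it pointed before.
close(STDOUT) or die "Could not close '$outfile': $!";
open(STDOUT, '>&', $saved_stdout) or die "Could not restore STDOUT: $!";

print "done\n";
```

Redirecting inside the script keeps everything in one file; the alternative of redirecting on the command line with the shell's > operator needs no code change at all.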

Best answer

I'd like to either create a csv file to import into a spreadsheet or write directly to a spreadsheet.

You can write directly to a spreadsheet; take a look at Excel::Writer::XLSX.
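
A minimal sketch of writing rows with Excel::Writer::XLSX follows. The rows, the worksheet name, and the file name transactions.xlsx are made up for illustration; in practice the rows would come from the parsed statement text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Excel::Writer::XLSX;

# Made-up sample rows (an assumption, not real statement data).
my @xlsx_rows = (
    ['Date',  'Description',   'Amount'],
    ['01/03', 'EXAMPLE STORE', 42.17],
    ['01/09', 'EXAMPLE FUEL',  30.00],
);

# Create the workbook; new() returns undef if the file cannot be created.
my $workbook = Excel::Writer::XLSX->new('transactions.xlsx')
    or die "Could not create transactions.xlsx: $!";
my $worksheet = $workbook->add_worksheet('Jan');

# write_row(row, col, array_ref) fills one spreadsheet row starting at the given cell.
my $row_index = 0;
for my $row (@xlsx_rows) {
    $worksheet->write_row($row_index++, 0, $row);
}

# Closing the workbook is what actually writes the file to disk.
$workbook->close();
```

Writing the amounts as numbers rather than strings lets the spreadsheet sum them directly.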

If you want to create a CSV file, you can try Text::CSV or Text::CSV_XS.
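
A minimal sketch of the CSV route with Text::CSV follows. The rows and the file name transactions.csv are made up for illustration. One row deliberately contains a comma to show why a CSV module is safer than joining fields by hand.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# Made-up sample rows (an assumption, not real statement data).
my @csv_rows = (
    ['Date',  'Description',        'Amount'],
    ['01/03', 'EXAMPLE STORE, INC', '42.17'],
    ['01/09', 'EXAMPLE FUEL',       '30.00'],
);

# binary allows arbitrary characters in fields; eol appends a newline to each record.
my $csv = Text::CSV->new({ binary => 1, eol => "\n", auto_diag => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag();

open(my $csv_fh, '>', 'transactions.csv')
    or die "Could not open transactions.csv: $!";

# print() quotes fields as needed, for example ones containing the separator.
$csv->print($csv_fh, $_) for @csv_rows;

close($csv_fh) or die "Could not close transactions.csv: $!";
```

Text::CSV uses Text::CSV_XS automatically when it is installed, so the same code works with either module.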

Regarding "linux - Trying to use Perl on Linux to read a PDF, parse the data, and write the desired data to a spreadsheet", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/27682297/
