javascript - Local PDF file scraping in node.js

Reposted. Author: 行者123. Updated: 2023-11-29 21:44:30

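I have uploaded a PDF using fs through a MEAN stack web application. I want to extract certain fields from the PDF and display them in the web app. I have looked at a few npm packages, such as pdf.js and pdf2json, but I can't make sense of their documentation or of the JavaScript callbacks used in the available examples. Please help!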

Best answer

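I hope I can help answer your question. pdf2json can be used to parse a PDF and extract its text, but a few steps are needed to get it working. I adapted the example from https://github.com/modesty/pdf2json.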

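For the setup, install pdf2json, and also underscore, in your Node app. The example page does not explain that you need to define your own callback functions, and it registers them with `self` instead of `this`. With the appropriate changes, the code to extract all the text from the PDF looks like this: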

// Get the dependencies that have already been installed
// to ./node_modules with `npm install <dep>` in the root directory
// of your app

var _ = require('underscore'),
    PDFParser = require('pdf2json');

var pdfParser = new PDFParser();

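// Create a function to handle the PDF once it has been parsed.
// In this case we cycle through all the pages, extract
// all the text blocks and print them to the console.
// If you do `console.log(JSON.stringify(pdf))` you will
// see how the parsed PDF is composed. Drill down into it
// to find the data you are looking for.
// (This matches the pdf2json version of the time, where the parsed
// JSON is on `pdf.data`; newer versions may hand you the JSON directly,
// so use the tip above to check where `Pages` lives in yours.)
var _onPDFBinDataReady = function (pdf) {
    console.log('Loaded pdf:\n');
    pdf.data.Pages.forEach(function (page) {
        page.Texts.forEach(function (text) {
            // pdf2json stores text URI-encoded ("%20" for a space, etc.),
            // so decode it before printing.
            console.log(decodeURIComponent(text.R[0].T));
        });
    });
};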

// Create an error handling function
var _onPDFBinDataError = function (error) {
    console.log(error);
};

// Use underscore to bind the data ready function to the pdfParser
// so that when the data ready event is emitted your function will
// be called. As opposed to the example, I have used `this` instead
// of `self` since self had no meaning in this context
pdfParser.on('pdfParser_dataReady', _.bind(_onPDFBinDataReady, this));

// Register error handling function
pdfParser.on('pdfParser_dataError', _.bind(_onPDFBinDataError, this));

// Construct the file path of the pdf
var pdfFilePath = 'test3.pdf';

// Load the pdf. When it is loaded your data ready function will be called.
pdfParser.loadPDF(pdfFilePath);
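The handler above assumes the parsed output lives at `pdf.data.Pages` and reads only the first run of each text item. Below is a minimal sketch of a more defensive walker. The names `getPages` and `extractTexts` are hypothetical, and the three output shapes it checks (`data.Pages`, `formImage.Pages`, top-level `Pages`) are assumptions about different pdf2json releases, so confirm them against your installed version with `JSON.stringify`. It also keeps each item's `x`/`y` position, assumed to be present on every text item.

```javascript
// Hypothetical helpers for walking pdf2json output.
// Assumption: depending on the pdf2json release, the page array is at
// pdf.data.Pages (as in the answer's code), pdf.formImage.Pages, or pdf.Pages.
function getPages(pdf) {
  if (pdf && pdf.data && pdf.data.Pages) return pdf.data.Pages;
  if (pdf && pdf.formImage && pdf.formImage.Pages) return pdf.formImage.Pages;
  return (pdf && pdf.Pages) || [];
}

// Return one entry per text item: page index, position and decoded text.
function extractTexts(pdf) {
  var result = [];
  getPages(pdf).forEach(function (page, pageIndex) {
    (page.Texts || []).forEach(function (text) {
      // An item may contain several runs in R; join all of them.
      var str = (text.R || []).map(function (run) {
        // Run text is assumed to be URI-encoded, e.g. "Hello%20World".
        return decodeURIComponent(run.T);
      }).join('');
      result.push({ page: pageIndex, x: text.x, y: text.y, text: str });
    });
  });
  return result;
}

// Usage inside the answer's data-ready handler (requires pdf2json):
//   pdfParser.on('pdfParser_dataReady', function (pdf) {
//     console.log(extractTexts(pdf));
//   });
```

Returning plain objects instead of printing makes it easy to send the result to the Angular front end as JSON.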
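To pull out specific fields, as the question asks, one option is to rebuild lines of text from the items' positions and then look for a label. This is a hypothetical sketch: `findField`, the label strings and the `0.3` vertical tolerance are illustrative choices, not part of pdf2json. It assumes each entry in `page.Texts` carries `x`, `y` and URI-encoded runs in `R[].T`, as in the answer's code, and it joins items on a line with single spaces, which is also an assumption.

```javascript
// Hypothetical helper: find the value after a "Label" on one pdf2json page.
// yTolerance (assumed default 0.3 page units) decides which items share a line.
function findField(page, label, yTolerance) {
  var tol = yTolerance === undefined ? 0.3 : yTolerance;
  // Decode each text item and keep its position.
  var items = (page.Texts || []).map(function (t) {
    return {
      x: t.x,
      y: t.y,
      text: (t.R || []).map(function (r) { return decodeURIComponent(r.T); }).join('')
    };
  });
  // Sort top-to-bottom, then left-to-right.
  items.sort(function (a, b) { return a.y - b.y || a.x - b.x; });
  // Group items whose y is within `tol` of the first item of the current line.
  var lines = [];
  items.forEach(function (item) {
    var line = lines[lines.length - 1];
    if (line && Math.abs(item.y - line.y) <= tol) {
      line.parts.push(item);
    } else {
      lines.push({ y: item.y, parts: [item] });
    }
  });
  for (var i = 0; i < lines.length; i++) {
    // Order each line left-to-right and join its items with single spaces.
    var text = lines[i].parts
      .sort(function (a, b) { return a.x - b.x; })
      .map(function (p) { return p.text; })
      .join(' ');
    var idx = text.indexOf(label);
    if (idx !== -1) {
      return text.slice(idx + label.length).trim();
    }
  }
  return null; // label not found on this page
}
```

A fixed tolerance is a simplification; documents with tight line spacing or rotated text may need a different grouping rule. If the PDF has interactive form fields, check whether your pdf2json version reports them separately, which would avoid positional guessing.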
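Since the question mentions trouble with the callbacks, and the file arrives as an upload in a MEAN app, it may help to wrap the two events in a Promise so the result can be awaited, for example inside an Express route. This is a minimal sketch: `parsePdf` is a hypothetical name, and it assumes the parser emits `pdfParser_dataReady`/`pdfParser_dataError` as in the answer and offers `parseBuffer` for in-memory data (shown in the pdf2json README; verify it for your version). The parser is passed in as an argument so the helper can be tested with a plain EventEmitter.

```javascript
// Hypothetical helper: turn pdf2json's event-based API into a Promise.
// `parser` is assumed to be a pdf2json PDFParser (or anything with the same
// on/loadPDF/parseBuffer methods); `source` is a file path or a Buffer.
function parsePdf(parser, source) {
  return new Promise(function (resolve, reject) {
    // Register both listeners before starting, so no event is missed.
    parser.on('pdfParser_dataReady', resolve);
    parser.on('pdfParser_dataError', reject);
    if (Buffer.isBuffer(source)) {
      parser.parseBuffer(source); // e.g. the bytes of an uploaded file
    } else {
      parser.loadPDF(source); // a path on the local file system
    }
  });
}

// Usage (requires pdf2json to be installed):
//   var PDFParser = require('pdf2json');
//   parsePdf(new PDFParser(), 'test3.pdf').then(console.log, console.error);
```

Create a new parser for each file: the listeners are added with `on`, so reusing one parser would accumulate them.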

This is a repost of the Stack Overflow question "javascript - Local PDF file scraping in node.js": https://stackoverflow.com/questions/31688728/
