.net - Programmatically search for text in a PDF file and report the page number?

Reposted. Author: 行者123. Updated: 2023-12-01 21:11:36

There are tools that extract the entire text of a PDF file so that the PDF can be full-text indexed.

What I need is a way to search for certain strings and, if they are found in the PDF file, get back the page numbers on which they occur.

Best answer

This example uses the library that ships with Adobe Reader, taken from http://www.dotnetspider.com/resources/5040-Get-PDF-Page-Number.aspx :

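    using System.IO;
    using System.Text;
    using System.Windows.Forms;
    using Acrobat;
    using AFORMAUTLib;

    // Place this method inside a class, for example a Windows Forms form.
    private void pdfRandD(string fPath)
    {
        // Open the file at the PD layer only to read the page count.
        AcroPDDocClass objPages = new AcroPDDocClass();
        objPages.Open(fPath);
        long totalPdfPages = objPages.GetNumPages();
        objPages.Close();

        // Open the file in the viewer so that Acrobat JavaScript can run against it.
        AcroAVDocClass avDoc = new AcroAVDocClass();
        avDoc.Open(fPath, "Title");
        IAFormApp formApp = new AFormAppClass();
        IFields myFields = (IFields)formApp.Fields;

        string searchWord = "Search String";

        // The using block closes the report file even if an exception is thrown.
        using (StreamWriter sw = new StreamWriter(@"D:\KCG_FileChecker_Inputs\MAC\pdf\0230_525490_23_cha17.txt", false))
        {
            for (int p = 0; p < totalPdfPages; p++)
            {
                int numWords = int.Parse(myFields.ExecuteThisJavascript("event.value=this.getPageNumWords(" + p + ");"));

                // Rebuild the page text by joining its words with single spaces.
                StringBuilder pageText = new StringBuilder();
                for (int i = 0; i < numWords; i++)
                {
                    string chkWord = myFields.ExecuteThisJavascript("event.value=this.getPageNthWord(" + p + "," + i + ", true);");
                    pageText.Append(' ').Append(chkWord);
                }

                if (pageText.ToString().Trim().Contains(searchWord))
                {
                    // Page labels can be non-numeric (for example "iv"), so keep the label as a string.
                    string pageLabel = myFields.ExecuteThisJavascript("event.value=this.getPageLabel(" + p + ");");
                    sw.WriteLine("The word \"" + searchWord + "\" exists on page " + pageLabel);
                }
            }
        }

        avDoc.Close(1); // Close the viewer document without saving.
        MessageBox.Show("Process completed");
    }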
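
The per-page matching logic in the answer can be tried out without Acrobat installed. The following is only a rough sketch in plain JavaScript, not part of the original answer: findPagesContaining and makeMockDoc are hypothetical names, and the mock object merely imitates the Acrobat JavaScript members numPages, getPageNumWords, getPageNthWord and getPageLabel. It joins the words of each page with single spaces and records every page whose text contains the search string.

```javascript
// Sketch: list the pages of a document-like object that contain a search string.
// `doc` is assumed to expose the Acrobat JavaScript members used in the answer:
// numPages, getPageNumWords(page), getPageNthWord(page, n, strip), getPageLabel(page).
function findPagesContaining(doc, searchText) {
  const needle = searchText.trim();
  const matches = [];
  if (needle === "") {
    return matches; // An empty search string would match every page, so report nothing.
  }
  for (let page = 0; page < doc.numPages; page++) {
    // Rebuild the page text by joining its words with single spaces.
    const words = [];
    const wordCount = doc.getPageNumWords(page);
    for (let i = 0; i < wordCount; i++) {
      words.push(doc.getPageNthWord(page, i, true));
    }
    if (words.join(" ").includes(needle)) {
      // Keep the label as a string because labels such as "iv" are not numeric.
      matches.push({ pageIndex: page, label: doc.getPageLabel(page) });
    }
  }
  return matches;
}

// Hypothetical stand-in for an Acrobat document, built from arrays of words.
function makeMockDoc(pages, labels) {
  return {
    numPages: pages.length,
    getPageNumWords: (page) => pages[page].length,
    getPageNthWord: (page, n, strip) => pages[page][n],
    getPageLabel: (page) => labels[page],
  };
}

const mockDoc = makeMockDoc(
  [
    ["Table", "of", "contents"],
    ["Search", "String", "appears", "here"],
    ["Nothing", "relevant"],
  ],
  ["i", "1", "2"]
);

const hits = findPagesContaining(mockDoc, "Search String");
console.log(hits);
```

With the mock data above, only the second page (index 1, label "1") is reported. The match is case-sensitive and cannot span two pages, which is also true of the C# version.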

Regarding ".net - Programmatically search for text in a PDF file and report the page number?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/709606/
