gpt4 book ai didi

java - 如何使用java将文本插入到扫描的pdf文档中

转载 作者:行者123 更新时间:2023-12-01 15:33:42 27 4
gpt4 key购买 nike

我必须将文本添加到有许多扫描的 pdf 文档的 pdf 文档中,以便将插入的文本插入回扫描的图像中,而不是插入到图像上。如何在 pdf 内的扫描图像上添加文本。

package editExistingPDF;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

import jxl.Cell;
import jxl.Sheet;
import jxl.Workbook;
import jxl.read.biff.BiffException;

import org.apache.commons.io.FilenameUtils;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Font;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;

public class AddPragraphToPdf {



public static void main(String[] args) throws IOException, DocumentException, BiffException {

String tan = "no tan";
File inputWorkbook = new File("lars.xls");
Workbook w;

w = Workbook.getWorkbook(inputWorkbook);
// Get the first sheet
Sheet sheet = w.getSheet(0);

Cell[] tnas =sheet.getColumn(0);


File ArticleFolder = new File("C:\\Documents and Settings\\sathishkumarkk\\My Documents\\article");
File[] listOfArticles = ArticleFolder.listFiles();

for (int ArticleInList = 0; ArticleInList < listOfArticles.length; ArticleInList++)
{
Document document = new Document(PageSize.A4);

// System.out.println(listOfArticles[ArticleInList].toString());
PdfReader pdfArticle = new PdfReader(listOfArticles[ArticleInList].toString());
if(listOfArticles[ArticleInList].getName().contains(".si."))
{continue;}
int noPgs=pdfArticle.getNumberOfPages();
String ArticleNoWithOutExt = FilenameUtils.removeExtension(listOfArticles[ArticleInList].getName());
String TanNo=ArticleNoWithOutExt.substring(0,ArticleNoWithOutExt.indexOf('.'));

// Create output PDF
PdfWriter writer = PdfWriter.getInstance(document,new FileOutputStream("C:\\Documents and Settings\\sathishkumarkk\\My Documents\\toPrint\\"+ArticleNoWithOutExt+".pdf"));
document.open();
PdfContentByte cb = writer.getDirectContent();
//get tan form excel sheet
System.out.println(TanNo);
for(Cell content : tnas){
if(content.getContents().contains(TanNo)){
tan=content.getContents();
System.out.println(tan);
}else{
continue;
}
}
// Load existing PDF
//PdfReader reader = new PdfReader(new FileInputStream("1.pdf"));

for (int i = 1; i <= noPgs; i++) {
PdfImportedPage page = writer.getImportedPage(pdfArticle, i);

// Copy first page of existing PDF into output PDF
document.newPage();
cb.addTemplate(page, 0, 0);
// Add your TAN here
Paragraph p= new Paragraph(tan);
Font font = new Font();
font.setSize(1.0f);
p.setLeading(12.0f, 1.0f);
p.setFont(font);

document.add(p);
}
document.close();
}
}

}

注意:问题是,当创建仅包含文本的 pdf 时,我没有问题,但是当 pdf 充满扫描文档并且当我尝试添加文本时;它会添加到扫描文档的背面。因此,当我打印这些 pdf 时,我不会收到我添加的文本。

最佳答案

来自this iText Example (这与您想要的相反,但是将 getUnderContentgetOverContent 切换,就可以了):

Blockquote Each PDF page has two extra layers; one that sits on top of all text / graphics and one that goes to the bottom. All user added content gets in-between these two. If we get into this bottommost content, we can write anything under that we want. To get into this bottommost layer, we can use the " getUnderContent" method of PdfStamper object.
This is documented in iText API Reference as shown below:

public PdfContentByte getUnderContent(int pageNum)
Gets a PdfContentByte to write under the page of the original document.
Parameters:
pageNum - the page number where the extra content is written
Returns:
a PdfContentByte to write under the page of the original document

关于java - 如何使用java将文本插入到扫描的pdf文档中,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/9226061/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com