c# - Extract images from a specific page of a PDF

Reposted. Author: 太空狗. Updated: 2023-10-30 00:15:53

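I want to extract images from a PDF file. I tried the following code, and it extracts a JPEG image from the PDF perfectly. The question is how to extract images from a specific page, for example page 1 or any other page. I don't want to read through the whole PDF to search for images.

Any suggestions?

Code to extract the images: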

private List<System.Drawing.Image> ExtractImages(String PDFSourcePath)
{
    List<System.Drawing.Image> ImgList = new List<System.Drawing.Image>();

    iTextSharp.text.pdf.RandomAccessFileOrArray RAFObj = null;
    iTextSharp.text.pdf.PdfReader PDFReaderObj = null;
    iTextSharp.text.pdf.PdfObject PDFObj = null;
    iTextSharp.text.pdf.PdfStream PDFStremObj = null;

    try
    {
        RAFObj = new iTextSharp.text.pdf.RandomAccessFileOrArray(PDFSourcePath);
        PDFReaderObj = new iTextSharp.text.pdf.PdfReader(RAFObj, null);

        // Walk every object in the cross-reference table, i.e. the whole document.
        for (int i = 0; i <= PDFReaderObj.XrefSize - 1; i++)
        {
            PDFObj = PDFReaderObj.GetPdfObject(i);

            if ((PDFObj != null) && PDFObj.IsStream())
            {
                PDFStremObj = (iTextSharp.text.pdf.PdfStream)PDFObj;
                iTextSharp.text.pdf.PdfObject subtype = PDFStremObj.Get(iTextSharp.text.pdf.PdfName.SUBTYPE);

                if ((subtype != null) && subtype.ToString() == iTextSharp.text.pdf.PdfName.IMAGE.ToString())
                {
                    byte[] bytes = iTextSharp.text.pdf.PdfReader.GetStreamBytesRaw((iTextSharp.text.pdf.PRStream)PDFStremObj);

                    if ((bytes != null))
                    {
                        try
                        {
                            using (System.IO.MemoryStream MS = new System.IO.MemoryStream(bytes))
                            using (System.Drawing.Image decoded = System.Drawing.Image.FromStream(MS))
                            {
                                // Copy the bitmap so it stays valid after the stream is disposed.
                                System.Drawing.Image ImgPDF = new System.Drawing.Bitmap(decoded);
                                ImgList.Add(ImgPDF);
                                pictureBox1.Image = ImgPDF;
                            }
                        }
                        catch (Exception)
                        {
                            // The stream could not be decoded as an image (for example, it is not a JPEG); skip it.
                        }
                    }
                }
            }
        }
    }
    finally
    {
        // Close the reader even if extraction fails; exceptions propagate with their stack trace intact.
        if (PDFReaderObj != null) PDFReaderObj.Close();
    }

    return ImgList;
}

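Best answer

I don't have iTextSharp 4.0 available at the moment, so this code targets 5.2, but it should work with older versions as well. This code is pretty much a direct lift from this post here, so please see that post as well as the responses to the other question. As I said in the comments above, your code looks at all of the images from the document's perspective, whereas the code I linked to goes page by page.

Please also read all of the comments in the other post, especially this one, which explains that this ONLY works for JPG images. PDF supports many different types of images, so unless you know that you are only dealing with JPGs, you will need to add more code. See this post and this post for some hints.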

// Assumes: using System; using System.IO; using System.Linq;
// using System.Drawing.Imaging; using iTextSharp.text.pdf;
string testFile = System.IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Doc1.pdf");
string outputPath = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
int pageNum = 1;

PdfReader pdf = new PdfReader(testFile);
try {
    // Start from the page dictionary so only this page's resources are inspected.
    PdfDictionary pg = pdf.GetPageN(pageNum);
    PdfDictionary res = (PdfDictionary)PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES));
    if (res == null) { return; }
    PdfDictionary xobj = (PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT));
    if (xobj == null) { return; }
    if (!Directory.Exists(outputPath))
        Directory.CreateDirectory(outputPath);

    int imageIndex = 0;
    foreach (PdfName name in xobj.Keys) {
        PdfObject obj = xobj.Get(name);
        if (!obj.IsIndirect()) { continue; }
        PdfDictionary tg = (PdfDictionary)PdfReader.GetPdfObject(obj);
        PdfName type = (PdfName)PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE));
        // Skip XObjects that are not images (for example, form XObjects).
        if (type == null || !type.Equals(PdfName.IMAGE)) { continue; }
        int XrefIndex = ((PRIndirectReference)obj).Number;
        PdfObject pdfObj = pdf.GetPdfObject(XrefIndex);
        PdfStream pdfStrem = (PdfStream)pdfObj;
        byte[] bytes = PdfReader.GetStreamBytesRaw((PRStream)pdfStrem);
        if (bytes == null) { continue; }
        using (System.IO.MemoryStream memStream = new System.IO.MemoryStream(bytes))
        using (System.Drawing.Image img = System.Drawing.Image.FromStream(memStream)) {
            // Include an index so several images on one page do not overwrite each other.
            string path = Path.Combine(outputPath, String.Format(@"{0}_{1}.jpg", pageNum, imageIndex++));
            System.Drawing.Imaging.EncoderParameters parms = new System.Drawing.Imaging.EncoderParameters(1);
            parms.Param[0] = new System.Drawing.Imaging.EncoderParameter(System.Drawing.Imaging.Encoder.Compression, 0);
            var jpegEncoder = ImageCodecInfo.GetImageEncoders().ToList().Find(x => x.FormatID == ImageFormat.Jpeg.Guid);
            img.Save(path, jpegEncoder, parms);
        }
    }
} finally {
    // Release the file handle whether or not extraction succeeded.
    pdf.Close();
}
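
Since the approach above only works when the raw stream bytes are already JPEG data, one cheap guard is to check for the JPEG signature before handing the bytes to System.Drawing. The following is a minimal sketch under that assumption, not part of the original answer. `JpegSniffer` and `LooksLikeJpeg` are hypothetical names introduced here for illustration, and they are not iTextSharp APIs. The check relies only on the fact that JPEG data begins with the bytes FF D8 followed by another FF marker byte.

```csharp
using System;

// Hypothetical helper (not part of iTextSharp): inspects raw stream bytes
// for the JPEG signature so non-JPEG image streams can be skipped early.
public static class JpegSniffer
{
    // JPEG data starts with the SOI marker 0xFF 0xD8, and the next
    // segment marker begins with 0xFF as well.
    public static bool LooksLikeJpeg(byte[] bytes)
    {
        return bytes != null
            && bytes.Length >= 3
            && bytes[0] == 0xFF
            && bytes[1] == 0xD8
            && bytes[2] == 0xFF;
    }
}
```

In the extraction loop, this could be called right after `GetStreamBytesRaw`, with a `continue` when it returns false. Streams that fail the check (for example, compressed raw pixel data) would still need their own decoding logic, which this sketch does not attempt.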

Regarding "c# - Extract images from a specific page of a PDF", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/10689382/
