pdf - Batch converting and cropping PostScript to PDF

Reposted by: 行者123  Last updated: 2023-12-04 14:19:39

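I barely have the skills to survive in this digital world.

I have many one-page PostScript files (graphs/images) that I want to convert to PDF and crop automatically to a tight bounding box. I am on Windows right now (I also use Linux, so please do not hesitate to post code for Linux).

In the past I have had success combining Ghostscript gswin32c.exe with Calibre pdfmanipulate.exe. This is probably a familiar approach to many.

However, this approach is fraught with problems, for several reasons.

One problem arose after I "upgraded" to the 64 bit gswin64c.exe. The 32-bit version gswin32c.exe still works on my system, so I cannot complain too much.

Another problem arises with PostScript files that may not be properly encoded. There seem to be at least two issues, but I am not sure which one (if either) is responsible, or whether both are. One issue is that the bounding box line, e.g.
%%BoundingBox: 135 179 484 587
is not always placed on the second line from the top. I understand this may be a problem. The other issue is that the bounding box above corresponds to "portrait" orientation in Ghostscript, but the cropping follows "landscape" orientation. Yet another problem, which I have not figured out, is that for some files the cropping seems quite random.

So here is my 32-bit approach (which works for high-quality files), followed by the 64-bit adaptation, which does not work (perhaps for the reason described at https://bugs.launchpad.net/ubuntu/+source/calibre/+bug/800551 and http://www.mobileread.com/forums/archive/index.php/t-103097.html, if I understand them correctly, but I am only guessing and know of no workaround):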

@echo off
echo batch processing with Latex ps2pdf followed by Ghostscript gswin64c.exe and Calibre2 pdfmanipulate.exe
for %%I in (*.ps,*.eps) do (
    "C:\Program Files\MiKTeX 2.9\miktex\bin\x64\ps2pdf" %%I
)
for %%I in (*.pdf) do (
    "C:\Program Files (x86)\Ghostscript\gs9.00\bin\gswin32c.exe" -dSAFER -dNOPAUSE -dBATCH -sDEVICE#bbox "%%I" 2> bounding
    "C:\Program Files (x86)\Calibre2\pdfmanipulate.exe" crop -o "%%~nICropped32.pdf" -b bounding "%%I"
    pause
    "C:\Program Files\Ghostscript\gs9.04\bin\gswin64c.exe" -dSAFER -dNOPAUSE -dBATCH -sDEVICE#bbox "%%I" 2> bounding
    "C:\Program Files (x86)\Calibre2\pdfmanipulate.exe" crop -o "%%~nICropped64.pdf" -b bounding "%%I"
    pause
)

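The 32-bit approach above works for high-quality files, e.g. PostScript Level 3 produced by PSTricks or by Maple's standard 2D plot driver, but not for older files, e.g. PostScript Level 2 (if that) produced by Maple's classic plot driver.

I found a workaround for some of these files. It consists of using epstopdf from the (MiKTeX) LaTeX distribution. It works for those Maple classic files. Unfortunately, it does not work for other PostScript files that I generated years ago with PSTricks and other software such as Matlab.

So I have to run several conversions and pick the one that works. I wonder whether you have suggestions that would make my life easier. If I could solve the BoundingBox and portrait/landscape problems, I would be quite happy.

Thank you in advance for any suggestions. Linux suggestions are welcome. I would prefer a solution that can handle this variety of files with a single press of the Return key.

Of course, I am looking for lossless cropping that merely involves interpreting the bounding box, not cropping that converts to a (possibly) lower-quality PDF.

Edit: I forgot to say. When I apply gswin32c/pdfmanipulate to a high-quality Level 3 PostScript file, the file named "bounding" is filled with information like this:

%%BoundingBox: 34 128 567 667
%%HiResBoundingBox: 34.364390 128.875004 566.054069 666.071980

In the example above, the file was already cropped. Note how closely %%BoundingBox and %%HiResBoundingBox agree.

But when applied to a lower-quality Level 2 (or supposedly custom) PostScript file, the "bounding" file is filled with:

%%BoundingBox: 189 137 574 467
%%HiResBoundingBox: 189.485994 137.843996 573.299983 466.668478

But the bounding box really should be
%%BoundingBox: 135 179 484 587
The above (135 179 484 587) is the bounding box given in the PostScript file itself (I moved it to the second line by copy and paste), and it agrees with the bounding box as interpreted by Ghostview/Ghostscript in "portrait".

But it is completely ignored by Ghostscript...

I have no idea where 189 137 574 467 comes from; it is quite wrong...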

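Edit 2. In response to Ken's questions, I would like to clarify a few points:

Hi Ken, thank you for your answer,

Sorry if my question was unclear, but you seem to have grasped the gist. Let me answer your questions in turn: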

I'm unsure why you are using 2 applications, it should be possible to perform the entire transformation with just Ghostscript.

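I did not find a way to do everything with Ghostscript, so I used another approach. I found the Ghostscript/Calibre suggestion at http://www.mobileread.com/forums/archive/index.php/t-72885.html and elsewhere, tried it, and it worked until recently.

I am not saying that it is impossible to do everything with Ghostscript, only that I have not found a way.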

"One problem arose after I "upgraded" to the 64 bit gswin64c.exe" You haven't said what the problem was, have you reported it as a bug ? If people don't report bugs, they don't get fixed......

I gave links describing the problem and a bug report here:
https://bugs.launchpad.net/ubuntu/+source/calibre/+bug/800551
http://www.mobileread.com/forums/archive/index.php/t-103097.html
My problem is exactly the same.

You seem to have some confusion between PostScript programs and comments. Any line in a PostScript program beginning '%' is a comment, and has no effect on the operation of the program. So BoundingBox comments won't do anything at all.

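I beg to differ, if I may. Take a PostScript file, delete the %%BoundingBox line, save it, and open it in Ghostview. Ghostview throws an error message and then displays the file without using the bounding box information, e.g. a figure surrounded by a lot of white space rather than tightly enclosed by the bounding box. So yes, at least in Ghostview, this comment does something. With the %%BoundingBox deleted, if you then crop the PDF with Calibre/pdfmanipulate, it is cropped incorrectly, whereas with the %%BoundingBox in place it works. So this "comment" is very useful in the context of display and cropping.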

Note there is no requirement for it to be the second line of the file.....

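This is what Adobe recommends. To quote Adobe,

"The second required DSC header comment provides information about the size of the EPS file, and must be present so the including application can transform and clip the EPS file properly. This is the bounding box comment."

http://partners.adobe.com/public/developer/en/ps/5002.EPSF_Spec.pdf

Adobe says "must". Personally, I do not care whether it is mandatory, as long as I can produce a PDF with the proper extent from the EPS.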
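
Whether the comment sits on the second line only matters to a reader that looks for it there. As a minimal sketch (the helper name find_bounding_box is hypothetical and not part of Ghostscript, Calibre, or any other tool mentioned here), the following Python scans every line for the comment and skips the deferred form "%%BoundingBox: (atend)" that DSC permits in the header:

```python
import re

# "%%BoundingBox:" followed by four integers, as DSC prescribes.
_BBOX_RE = re.compile(r"^%%BoundingBox:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)")


def find_bounding_box(ps_text):
    """Return (llx, lly, urx, ury) from the first usable %%BoundingBox line, or None."""
    for line in ps_text.splitlines():
        match = _BBOX_RE.match(line)
        if match:  # "(atend)" does not match the pattern and is skipped
            return tuple(int(group) for group in match.groups())
    return None


# Illustrative header in which the comment is on the third line.
sample = "%!PS-Adobe-3.0 EPSF-3.0\n%%Creator: example\n%%BoundingBox: 135 179 484 587\n"
print(find_bounding_box(sample))
```

For a real file, the text could be obtained by opening the file with a permissive encoding such as latin-1, since PostScript files may contain binary sections.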

In general Ghostscript ignores DSC comments, however if you set ProcessDSC to true, then it will make very limited use of it (primarily the BoundingBox comment to set the page size).

With pdfmanipulate, it makes all the difference between a correctly cropped PDF and an incorrectly cropped one.

Moving on. You say you are using LaTeX ps2pdf, if you already have a PostScript file, you can send that to Ghostscript for conversion to PDF. Its not clear to me what exactly you are using Ghostscript for in this case, simply to find the real bounding box of the page ?

Yes.

Its not clear to me what you mean by 'lossless' cropping, if you crop the content you must be losing something clearly, even if its just white space.....

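What I mean is that I do not want the cropping process to "rasterize" the whole image (or whatever the proper term is). The parts of the file that are cropped away are of no use to me, so nothing much is lost. The parts of the file inside the crop should have the same quality as the original. That is the general idea.

You can find comments about this here, which is where I found the useful information,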
http://www.charlietanksley.net/philtex/reading-pdfs-on-portables/

Its easy enough to do the conversion in one pass if you know the size you want to crop to,

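No, I do not know the size, which is why I go to such lengths to have software compute it for me. Evidently this is not a simple matter, because Ghostscript and epstopdf do not always agree on the best crop: one gets it right for some files but not others, and the other gets it right for those other files but not the first ones...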

if you don't know the size then you can do it in 2 passes using only Ghostscript by first extracting the BoundingBox as you have done. That will get you 4 numbers, the bottom left and top right of the bounding box (if I remember correctly). You then create a 'translate' PostScript operation to move the content of the page down and left (so that it starts at 0,0, the bottom left corner). You also create a page device request to set the page size, the size being given by width = right - left and height = top - bottom. Feed the original file, along with the PostScript operators, to Ghostscript and select the pdfwrite device and you will get a PDF file.

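If you have one handy, a batch file example would be great. I have seen several examples based on pdfwrite, but none of those I tried worked. The devil is in the details.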
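
The second pass Ken describes can be sketched in Python as follows. This is only an illustration under assumptions: the function names crop_prologue and pdfwrite_command are hypothetical, the command is built but not run, and a source file that calls setpagedevice itself would override the page size set here.

```python
def crop_prologue(llx, lly, urx, ury):
    """PostScript that sets the page to the box size and moves the box to (0, 0)."""
    width = urx - llx    # width  = right - left
    height = ury - lly   # height = top - bottom
    # setpagedevice resets the graphics state, so the translate must follow it.
    return "<< /PageSize [{:g} {:g}] >> setpagedevice {:g} {:g} translate".format(
        width, height, -llx, -lly)


def pdfwrite_command(gs, source, output, bbox):
    """Build (not run) a Ghostscript command line for the second pass."""
    return [gs, "-sDEVICE=pdfwrite", "-o", output,
            "-c", crop_prologue(*bbox), "-f", source]


print(crop_prologue(145, 189, 475, 574))
```

Passing the resulting list to subprocess.run would invoke Ghostscript; "gs" on Linux or the full path to gswin64c.exe on Windows would be supplied as the first argument.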

As far as the bounding box goes, it may be a bug, or it may be that the file makes a mark, potentially using a white ink at the outside location. In this case the bounding box device will still regard it as part of the page content. You may be able to see that it isn't, but the device cannot. Consider if the page was first filled with a dark background, and the content outlined using white ink.

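These files were all created with software such as Matlab, Maple, PSTricks, and so on; invisible white marks outside the area specified by %%BoundingBox are unlikely (though evidently not impossible).

In many cases the %%BoundingBox comment contains all the information needed, and I would like Ghostscript or Calibre or pdfwrite or whatever else to make use of that information.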

I cannot offer a comprehensive solution without understanding more about what you want to do, and ideally seeing one or more of your problematic files.

That would be very easy. How can I post a PostScript file for you to look at? It is 420 KB.

Thanks Ken, I hope we can find a workable solution.

Edit 3. I have identified a large part of the problem.

My PostScript file has the following bounding box, which is very close to the optimal crop:
%%BoundingBox: 135 179 484 587

When I run Ghostscript gswin64c/gswin32c to compute the bounding box, i.e.
for %%I in (*.ps,*.eps) do ("C:\Program Files\Ghostscript\gs9.04\bin\gswin64c.exe" -dSAFER -dNOPAUSE -dBATCH -dAutoRotatePages=/None -sDEVICE#bbox "%%I" 2> bounding)
I get:

%%BoundingBox: 145 189 475 574
%%HiResBoundingBox: 145.331574 189.485994 474.155986 573.299983
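
The text that the bbox device writes to stderr (redirected to the file "bounding" above) is simple to read back in a script. The following is a small sketch, assuming the two-comment format shown in this post; parse_bbox_output is a hypothetical helper name, and for a multi-page file the values of the last page win.

```python
import re

# Matches either comment, whether the two appear on separate lines or on one line.
_LINE_RE = re.compile(r"%%(HiRes)?BoundingBox:\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+)")


def parse_bbox_output(text):
    """Return a dict with 'BoundingBox' (ints) and 'HiResBoundingBox' (floats) if present."""
    boxes = {}
    for match in _LINE_RE.finditer(text):
        hires = match.group(1) is not None
        convert = float if hires else int
        key = "HiResBoundingBox" if hires else "BoundingBox"
        boxes[key] = tuple(convert(value) for value in match.groups()[1:])
    return boxes


output = ("%%BoundingBox: 145 189 475 574\n"
          "%%HiResBoundingBox: 145.331574 189.485994 474.155986 573.299983\n")
print(parse_bbox_output(output))
```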

When I run ps2pdf followed by Ghostscript gswin64c, i.e.
for %%I in (*.ps,*.eps) do ("C:\Program Files\MiKTeX 2.9\miktex\bin\x64\ps2pdf" %%I)
for %%I in (*.pdf) do ("C:\Program Files\Ghostscript\gs9.04\bin\gswin64c.exe" -dSAFER -dNOPAUSE -dBATCH -dAutoRotatePages=/None -sDEVICE#bbox "%%I" 2> bounding)
I get the following bounding box:

%%BoundingBox: 189 137 574 467
%%HiResBoundingBox: 189.395994 137.843996 573.299983 466.668478

So the problem is that converting from PS to PDF with ps2pdf changes the bounding box information, which leads to incorrect cropping. Replacing ps2pdf with something else, such as epstopdf, therefore solves the problem here. Of course, there are other solutions. A solution involving only Ghostscript, as suggested by Ken and luser droog, is particularly valuable. Their very valuable suggestions (better than my quick fix) are below. Something like this has worked:
for %%I in (*.eps,*.ps) do ("C:\Program Files\MiKTeX 2.9\miktex\bin\x64\epstopdf" %%I)
for %%I in (*.pdf) do (
    "C:\Program Files\Ghostscript\gs9.04\bin\gswin64c.exe" -dSAFER -dNOPAUSE -dBATCH -dAutoRotatePages=/None -sDEVICE#bbox "%%I" 2> bounding
    "C:\Program Files (x86)\Calibre2\pdfmanipulate.exe" crop -o "%%~nICropped.pdf" -b bounding "%%I"
)

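Best answer

There was not enough room in a comment to add this, so I am afraid I have to post another answer....

The BoundingBox looks bogus for the PDF file because of a feature of the PDF conversion process. By default it rotates pages until the majority of the text is horizontal; for this file (and, I think, the others that show the same problem) this results in a 90 degree clockwise rotation.

Of course, this means that the bounding box is also rotated, and checking the values shows that this is what has happened. So the BoundingBox is correct for the rotated PDF file.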
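
The check can be reproduced with a few lines of arithmetic. The sketch below assumes a US Letter page 612 points wide (the page size is not stated in the post) and uses a hypothetical helper, rotate_bbox_clockwise, to map the box measured on the PostScript file through a 90 degree clockwise rotation:

```python
def rotate_bbox_clockwise(bbox, page_width):
    """Map a bounding box through a 90-degree clockwise page rotation.

    A point (x, y) on the unrotated page lands at (y, page_width - x).
    """
    llx, lly, urx, ury = bbox
    return (lly, page_width - urx, ury, page_width - llx)


# HiResBoundingBox reported by the bbox device for the PostScript file.
before = (145.331574, 189.485994, 474.155986, 573.299983)
after = rotate_bbox_clockwise(before, 612)  # 612 pt page width is an assumption
print(tuple(round(value, 3) for value in after))
```

Under that assumption the result agrees, to within rounding, with the 189 137 574 467 box that the bbox device reports for the PDF produced by ps2pdf.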

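Now, I offered a couple of PostScript programs by private email; here is what I meant:

1pass.ps

This reads the BoundingBox line from the source PostScript file and uses it to set the page size and offset. You pass the name of the file to use by setting 'SourceFileName'; for example, using the file you supplied:

gs -sDEVICE=pdfwrite -sSourceFileName=classic.ps -o out.pdf 1pass.ps

will produce a file called out.pdf, which is the result of reading the BoundingBox and converting the file to a PDF whose page is cropped to that size.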

%!PS
%% redefine setpagedevice to prevent changes by the PostScript program
%% But keep a copy under a different name, so we can use it.
/Oldsetpagedevice /setpagedevice load def
/setpagedevice {pop} bind def

(File to process is ) print SourceFileName ==

/SourceFile SourceFileName (r) file def
/BoxString 65535 string def
/LLx 0 def
/LLy 0 def
/URx 0 def
/URy 0 def
/FoundBox false def

/GetValues {
  token {                     % read a PostScript token
    /LLx exch def             % Assume its a number for now
    token {
      /LLy exch def
      token {
        /URx exch def
        token {
          /URy exch def
          pop                 % Get rid of any remaining string data
          true                % return success code
        }{
          (Failed to read a number from the string) ==
          false               % return failure code
        } ifelse
      }{
        (Failed to read a number from the string) ==
        false                 % return failure code
      } ifelse
    }{
      (Failed to read a number from the string) ==
      false                   % return failure code
    } ifelse
  } {
    (Failed to read a number from the string) ==
    false                     % return failure code
  } ifelse
} bind def

{
  SourceFile BoxString readline {
    (%%BoundingBox:) anchorsearch {
      pop                     %% discard matching string
      GetValues               %% extract BBox
      /FoundBox exch def      %% Note success/failure
      exit                    %% exit this loop
    } {
      pop                     %% discard string, no match
    } ifelse
  } {
    (Failed to find a %%BoundingBox comment) ==
    exit                      %% No more data, exit loop
  } ifelse
} loop

SourceFile closefile          %% close the file

FoundBox {
  (LLx = ) print LLx ==
  (LLy = ) print LLy ==
  (URx = ) print URx ==
  (URy = ) print URy ==
  %% page size: width = URx - LLx, height = URy - LLy
  << /PageSize [URx LLx sub URy LLy sub] >> Oldsetpagedevice
  LLx neg LLy neg translate
  SourceFileName run
} if

2pass.ps

This is intended to be used the way you are currently working, it has two advantages over 1pass.ps:

It works with PDF files as well as PostScript files, and with files which do not contain a %%BoundingBox comment.
The BoundingBox is accurate.

It has the disadvantage that you have to process each file twice, once to get the bounding box and once to create the PDF file.

This takes two parameters, the name of the file containing the output of the bbox device, and the name of the file to be converted. Again, using the file you sent, you would use it like this:

First command:

  gs \
   -dBATCH -dNOPAUSE \
   -sDEVICE=bbox \
    classic.ps 2> bounding.txt

Second command:

  gs \
   -sDEVICE=pdfwrite \
   -sBoxFileName=bounding.txt \
   -sPostScriptFileName=classic.ps \
   -o out.pdf \
    2pass.ps
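
For a whole folder of files, the two commands can be driven from a short script. The following Python is a sketch under stated assumptions rather than a tested tool: the function names, the "Cropped.pdf" output suffix (borrowed from the batch files in the question), and the per-file ".bbox.txt" name are all hypothetical; 2pass.ps is assumed to be in the current directory; and -dBATCH and -dNOPAUSE (the flags used in the question's batch files) are included so that Ghostscript exits without prompting.

```python
import subprocess
from pathlib import Path


def two_pass_commands(gs, source, box_file, output, helper="2pass.ps"):
    """Return the argv lists for the bbox pass and the pdfwrite pass."""
    first = [gs, "-dBATCH", "-dNOPAUSE", "-sDEVICE=bbox", str(source)]
    second = [gs, "-sDEVICE=pdfwrite",
              "-sBoxFileName={}".format(box_file),
              "-sPostScriptFileName={}".format(source),
              "-o", str(output), helper]
    return first, second


def crop_all(directory, gs="gs"):
    """Run both passes for every .ps/.eps file in directory (needs Ghostscript)."""
    for source in sorted(Path(directory).iterdir()):
        if source.suffix.lower() not in (".ps", ".eps"):
            continue
        box_file = source.with_name(source.stem + ".bbox.txt")
        output = source.with_name(source.stem + "Cropped.pdf")
        first, second = two_pass_commands(gs, source, box_file, output)
        # The bbox device reports on stderr, so that stream is captured.
        result = subprocess.run(first, stderr=subprocess.PIPE, text=True, check=True)
        box_file.write_text(result.stderr)
        subprocess.run(second, check=True)
```

Calling crop_all(".") on Linux, or crop_all(".", gs=r"C:\Program Files\Ghostscript\gs9.04\bin\gswin64c.exe") on Windows, would process the current folder. On recent Ghostscript releases that restrict file access by default, the files read by 2pass.ps may need to be permitted explicitly.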

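PostScript code for 2pass.ps:

%!PS

%% redefine setpagedevice to prevent changes by the PostScript program
%% But keep a copy under a different name, so we can use it.
/Oldsetpagedevice /setpagedevice load def
/setpagedevice {pop} bind def

(File containing the bounding box is ) print BoxFileName ==
(File to process is ) print PostScriptFileName ==

/BoxFile BoxFileName (r) file def
/BoxString 256 string def
/HiResBoxString 256 string def
/LLx 0 def
/LLy 0 def
/URx 0 def
/URy 0 def

BoxFile BoxString readline        % Read first line from file
{
  /BoxString exch def             % Redefine the string to be the one we read
} {
  (Encountered EOF before newline reading %%BoundingBox) == flush
} ifelse

BoxFile HiResBoxString readline   % Read second line from file
{
  /HiResBoxString exch def        % Redefine the string to be the one we read
} {
  (Encountered EOF before newline reading %%HiResBoundingBox) == flush
} ifelse

BoxFile closefile                 % close the file

BoxString (%%BoundingBox:) anchorsearch
{
  pop                             % get rid of the matching string
  token {                         % read a PostScript token
    /LLx exch def                 % Assume it is a number
    token {
      /LLy exch def
      token {
        /URx exch def
        token {
          /URy exch def
          pop                     % Get rid of any remaining string data
        } {
          (Failed to read a number from the string) ==
        } ifelse
      } {
        (Failed to read a number from the string) ==
      } ifelse
    } {
      (Failed to read a number from the string) ==
    } ifelse
  } {
    (Failed to read a number from the string) ==
  } ifelse
} {
  (File does not contain a BoundingBox) ==
} ifelse

(LLx = ) print LLx ==
(LLy = ) print LLy ==
(URx = ) print URx ==
(URy = ) print URy ==

%% page size: width = URx - LLx, height = URy - LLy
<< /PageSize [URx LLx sub URy LLy sub] >> Oldsetpagedevice
LLx neg LLy neg translate

PostScriptFileName run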

Regarding "pdf - Batch converting and cropping PostScript to PDF", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/8711969/
