java - Exception in thread "main" org.pdfclown.util.parsers.ParseException: 'name' table does NOT exist

Reposted by: 塔克拉玛干 | Updated: 2023-11-02 19:24:34

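I am trying to run Java code written by Stefano Chizzolini (awesome guy: the creator of PDF Clown) that parses a PDF with the PDF Clown library. I get the following error and do not know how to resolve it.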

Exception in thread "main" org.pdfclown.util.parsers.ParseException: 'name' table does NOT exist.
at org.pdfclown.documents.contents.fonts.OpenFontParser.getName(OpenFontParser.java:570)
at org.pdfclown.documents.contents.fonts.OpenFontParser.load(OpenFontParser.java:221)
at org.pdfclown.documents.contents.fonts.OpenFontParser.<init>(OpenFontParser.java:205)
at org.pdfclown.documents.contents.fonts.TrueTypeFont.loadEncoding(TrueTypeFont.java:91)
at org.pdfclown.documents.contents.fonts.SimpleFont.onLoad(SimpleFont.java:118)
at org.pdfclown.documents.contents.fonts.Font.load(Font.java:738)
at org.pdfclown.documents.contents.fonts.Font.<init>(Font.java:351)
at org.pdfclown.documents.contents.fonts.SimpleFont.<init>(SimpleFont.java:62)
at org.pdfclown.documents.contents.fonts.TrueTypeFont.<init>(TrueTypeFont.java:68)
at org.pdfclown.documents.contents.fonts.Font.wrap(Font.java:253)
at org.pdfclown.documents.contents.FontResources.wrap(FontResources.java:72)
at org.pdfclown.documents.contents.FontResources.wrap(FontResources.java:1)
at org.pdfclown.documents.contents.ResourceItems.get(ResourceItems.java:119)
at org.pdfclown.documents.contents.objects.SetFont.getResource(SetFont.java:119)
at org.pdfclown.documents.contents.objects.SetFont.getFont(SetFont.java:83)
at org.pdfclown.documents.contents.objects.SetFont.scan(SetFont.java:97)
at org.pdfclown.documents.contents.ContentScanner.moveNext(ContentScanner.java:1330)
at org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:626)
at org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:296)
at PDFReader.FullExtract.run(FullExtract.java:71)
at PDFReader.FullExtract.main(FullExtract.java:142)

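I know that the class OpenFontParser in the library package throws this error. Is there anything I can do to fix this?

This code works for most PDFs. There is one PDF I cannot parse. My guess is that the cause is this symbol at the bottom of the PDF.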

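import java.util.HashMap;
import java.util.Map;

import org.pdfclown.documents.Document;
import org.pdfclown.documents.Page;
import org.pdfclown.documents.contents.ContentScanner;
import org.pdfclown.documents.contents.fonts.Font;
import org.pdfclown.documents.contents.objects.ContainerObject;
import org.pdfclown.documents.contents.objects.ContentObject;
import org.pdfclown.documents.contents.objects.ShowText;
import org.pdfclown.documents.contents.objects.Text;
import org.pdfclown.files.File;

// NOTE: Sample (which provides promptChoice) is the base class from the PDF Clown samples project.
public class PDFReader extends Sample {

@Override
public void run()
{
    String filePath = "C:\\Users\\XYZ\\Desktop\\SomeSamplePDF.pdf";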

    // 1. Open the PDF file!
    File file;
    try
    {file = new File(filePath);}
    catch(Exception e)
    {throw new RuntimeException(filePath + " file access error.",e);}

    // 2. Get the PDF document!
    Document document = file.getDocument();

    // 3. Extracting text from the document pages...
    for(Page page : document.getPages())
    {
    extract(new ContentScanner(page)); // Wraps the page contents into a scanner.

    }
    close(file);
}

private void close(File file) {
    // TODO Auto-generated method stub

}

/**
Scans a content level looking for text.
 */
/*
NOTE: Page contents are represented by a sequence of content objects,
possibly nested into multiple levels.
 */
private void extract(
        ContentScanner level
        )
{
    if(level == null)
        return;

    while(level.moveNext())
    {
        ContentObject content = level.getCurrent();
        if(content instanceof ShowText)
        {
            Font font = level.getState().getFont();
            // Extract the current text chunk, decoding it!
            System.out.println(font.decode(((ShowText)content).getText()));
        }
        else if(content instanceof Text
                || content instanceof ContainerObject)
        {
            // Scan the inner level!
            extract(level.getChildLevel());
        }
    }
}

private boolean prompt(Page page)
{
    int pageIndex = page.getIndex();
    if(pageIndex > 0)
    {
        Map<String,String> options = new HashMap<String,String>();
        options.put("", "Scan next page");
        options.put("Q", "End scanning");
        if(!promptChoice(options).equals(""))
            return false;
    }

    System.out.println("\nScanning page " + (pageIndex+1) + "...\n");
    return true;
}

public static void main(String[] args)
{
    new PDFReader().run();
}

}

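Best answer

The problem

As the stack trace shows, the problem is that some TrueType font embedded in the PDF does not contain a name table, even though that table is required: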

org.pdfclown.util.parsers.ParseException: 'name' table does NOT exist.
...
at org.pdfclown.documents.contents.fonts.TrueTypeFont.loadEncoding(TrueTypeFont.java:91)

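Strictly speaking, therefore, the embedded font is invalid, and so is the PDF that embeds it. PDF Clown ran into the exception because of this validity issue.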

Some background

A TrueType font file consists of a sequence of concatenated tables. ...

The first of the tables is the font directory, a special table that facilitates access to the other tables in the font. The directory is followed by a sequence of tables containing the font data. These tables can appear in any order. Certain tables are required for all fonts. Others are optional depending upon the functionality expected of a particular font.

Tables that are required must appear in any valid TrueType font file. The required tables and their tag names are shown in Table 2.

Table 2: The required tables
Tag     Table 
'cmap'  character to glyph mapping 
'glyf'  glyph data 
'head'  font header 
'hhea'  horizontal header 
'hmtx'  horizontal metrics 
'loca'  index to location 
'maxp'  maximum profile 
'name'  naming 
'post'  PostScript 
(Section TrueType Font files: an overview in chapter 6 The TrueType Font File in the TrueType Reference Manual)
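
To see which required tables a font program lacks, one can read the font directory described above. The following is only a sketch using the Java standard library: the class TrueTypeTableCheck and its methods are hypothetical helpers, not part of PDF Clown, and it assumes you have already obtained the raw bytes of the embedded font program (extracting them from the PDF is not shown). The fakeDirectory method merely builds a directory without table data for the demonstration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not PDF Clown API): lists required TrueType tables missing from a font. */
public class TrueTypeTableCheck {
    // Required tables as listed in Table 2 of the TrueType Reference Manual.
    public static final List<String> REQUIRED = Arrays.asList(
        "cmap", "glyf", "head", "hhea", "hmtx", "loca", "maxp", "name", "post");

    /** Reads the table tags from the font directory at the start of the font data. */
    public static Set<String> readTableTags(byte[] font) {
        ByteBuffer buf = ByteBuffer.wrap(font);  // big-endian by default, as in TrueType
        buf.getInt();                            // sfnt version
        int numTables = buf.getShort() & 0xFFFF; // unsigned 16-bit table count
        buf.position(12);                        // skip searchRange, entrySelector, rangeShift
        Set<String> tags = new HashSet<>();
        byte[] tag = new byte[4];
        for (int i = 0; i < numTables; i++) {
            buf.get(tag);                        // 4-byte table tag
            buf.position(buf.position() + 12);   // skip checksum, offset, length
            tags.add(new String(tag, StandardCharsets.US_ASCII));
        }
        return tags;
    }

    /** Returns the required tags that do not appear in the font directory. */
    public static List<String> missingRequiredTables(byte[] font) {
        Set<String> present = readTableTags(font);
        List<String> missing = new ArrayList<>();
        for (String tag : REQUIRED)
            if (!present.contains(tag)) missing.add(tag);
        return missing;
    }

    /** Builds a font directory only (no table data), purely for demonstration. */
    public static byte[] fakeDirectory(String... tags) {
        ByteBuffer buf = ByteBuffer.allocate(12 + 16 * tags.length);
        buf.putInt(0x00010000).putShort((short) tags.length);
        buf.putShort((short) 0).putShort((short) 0).putShort((short) 0);
        for (String tag : tags) {
            buf.put(tag.getBytes(StandardCharsets.US_ASCII));
            buf.putInt(0).putInt(0).putInt(0);   // dummy checksum, offset, length
        }
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] font = fakeDirectory("cmap", "glyf", "head", "hhea", "hmtx", "loca", "maxp", "post");
        System.out.println(missingRequiredTables(font)); // prints [name]
    }
}
```

A font like the one in the question would show up here with name in the list of missing tables.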

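On the other hand, many PDF generators reduce embedded TrueType fonts to the bare essentials required by PDF viewers (first and foremost Adobe Reader), and the name table does not seem to be strictly required by them.

Furthermore, PDF Clown uses the name table for one purpose only, namely to determine the name of the font in question, even though the font name can also be determined from the BaseFont entry of the associated font dictionary. In fact, the latter is required by the PDF specification, while the PostScript font name in the name table is optional according to the TTF manual.

Thus, using the BaseFont entry of the PDF font dictionary would be a better alternative to this name table access.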
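
The idea of preferring BaseFont could look roughly like the sketch below. It is a standalone illustration, not PDF Clown code: FontNameFallback and resolve are hypothetical names, and the two strings are assumed to have been read elsewhere (the BaseFont value from the font dictionary, the other one from the name table if it exists). Stripping the subset tag (six uppercase letters plus a plus sign in front of subset font names) is an extra design choice of this sketch.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not PDF Clown API): picks a font name without requiring the 'name' table. */
public class FontNameFallback {
    // Subset fonts carry a tag of six uppercase letters and a plus sign, e.g. "ABCDEF+Arial".
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+");

    /**
     * @param baseFont      value of the BaseFont entry in the PDF font dictionary, or null
     * @param nameTableName name read from the 'name' table, or null if the table is missing
     * @return the BaseFont value without subset tag if present, else the name table value (may be null)
     */
    public static String resolve(String baseFont, String nameTableName) {
        if (baseFont != null && !baseFont.isEmpty())
            return SUBSET_TAG.matcher(baseFont).replaceFirst("");
        return nameTableName;
    }

    public static void main(String[] args) {
        System.out.println(resolve("ABCDEF+Arial-BoldMT", null)); // prints Arial-BoldMT
        System.out.println(resolve(null, "Arial Bold"));          // prints Arial Bold
    }
}
```

With this order of preference a missing name table never matters as long as the font dictionary carries its BaseFont entry.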

Fixing it

Is there anything I can do to fix this?

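You can either fix your not entirely valid PDF by adding name tables to the embedded TTFs in question, or you can patch PDF Clown to ignore the missing table. For the latter, edit the method getName in the class org.pdfclown.documents.contents.fonts.OpenFontParser: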

private String getName(
  int id
  ) throws EOFException, UnsupportedEncodingException
{
  // Naming Table ('name' table).
  Integer tableOffset = tableOffsets.get("name");
  if(tableOffset == null)
    throw new ParseException("'name' table does NOT exist.");

Replace the throw new ParseException("'name' table does NOT exist.") with return null.
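
To illustrate the effect of that change in isolation, here is a small stand-in. It is not the library's class: LenientNameLookup is a hypothetical name, it only mimics the first lines of the method quoted above, and the string it returns when the table exists is a placeholder for whatever the real parser would read from the table.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for illustration only; it mimics just the start of the method quoted above. */
public class LenientNameLookup {
    private final Map<String, Integer> tableOffsets = new HashMap<>();

    public LenientNameLookup(Map<String, Integer> offsets) {
        tableOffsets.putAll(offsets);
    }

    /** Returns null instead of throwing when the 'name' table is absent. */
    public String getName(int id) {
        // Naming Table ('name' table).
        Integer tableOffset = tableOffsets.get("name");
        if (tableOffset == null)
            return null; // patched: the original code throws a ParseException here
        // Placeholder: the real parser would read the name record at tableOffset.
        return "name record " + id + " at offset " + tableOffset;
    }

    public static void main(String[] args) {
        Map<String, Integer> offsets = new HashMap<>();
        offsets.put("cmap", 300);
        // No 'name' entry: the lookup yields null instead of an exception.
        System.out.println(new LenientNameLookup(offsets).getName(6)); // prints null
    }
}
```

Any caller of such a lenient lookup has to be prepared for a null font name.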

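PS

While it was possible to analyze the issue with merely the information provided by the OP, the sample file that @akarshad provided in his now deleted answer made it more motivating to start analyzing.

This question about java - Exception in thread "main" org.pdfclown.util.parsers.ParseException: 'name' table does NOT exist is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/23544547/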
