PySpark Tabula-Py Read_PDF (ERROR: No module named 'org.apache.commons')

Reposted by: bug小助手 | Updated: 2023-10-24 23:00:23

I've been running a pipeline in Azure for 4 months, and it suddenly broke last night. I have the following code:

!pip install tabula-py
from io import BytesIO
from tabula.io import read_pdf
# pdf_content: raw PDF bytes loaded earlier in the pipeline (not shown)
df = read_pdf(BytesIO(pdf_content), pandas_options={'header': None}, pages=3, stream=True)[0]

All of a sudden, I'm now getting this error:

~/cluster-env/env/lib/python3.8/site-packages/tabula/io.py in __init__(self, java_options, silent)
     92 
     93         from java import lang
---> 94         from org.apache.commons import cli
     95         from technology import tabula
     96 

ModuleNotFoundError: No module named 'org.apache.commons'

Any help would be appreciated.

Recommended answers

The same thing happened to me today in a Databricks environment, after tabula had been running smoothly for 6 months. My hotfix was to pip install version 2.7.0, since I suspect the error is caused by the latest version, 2.8.1, which was published today.
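
When you pin a version as a hotfix like this, it can help to fail fast if the runtime is not actually using the pinned release (for example, if a cluster or notebook still has the newer version installed). The sketch below assumes Python 3.8+, which matches the python3.8 path in the traceback, and uses the standard library's importlib.metadata. The helper name assert_pinned is hypothetical. The get_version parameter exists only so the check can be tested without tabula-py installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable


def assert_pinned(
    dist_name: str,
    expected: str,
    get_version: Callable[[str], str] = version,
) -> str:
    """Raise RuntimeError unless the installed distribution matches `expected`.

    Hypothetical helper. get_version defaults to importlib.metadata.version,
    which reads the installed package's metadata; it is a parameter only so
    the check can be exercised without tabula-py installed.
    """
    try:
        installed = get_version(dist_name)
    except PackageNotFoundError:
        raise RuntimeError(f"{dist_name} is not installed") from None
    if installed != expected:
        raise RuntimeError(
            f"{dist_name}=={installed} is installed, but {expected} is pinned"
        )
    return installed


# Hypothetical placement: once, before the first read_pdf call.
# assert_pinned("tabula-py", "2.7.0")
```

This check reads the version metadata on disk. That can differ from a module the kernel already imported before the reinstall.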

tabula-py author here.

I released v2.8.2, which falls back to a subprocess if jpype has an import issue.
https://pypi.org/project/tabula-py/2.8.2/
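
The fallback the author describes follows a general pattern. The code first tries the in-process backend. If importing that backend fails, it does the same work in a separate process. Below is a minimal plain-Python sketch of that pattern. It is not tabula-py's implementation, and run_with_fallback, in_process and cmd are hypothetical names. For tabula-py, cmd would presumably be a java command that runs the bundled tabula-java jar, but that detail is an assumption.

```python
import subprocess
from typing import Callable, List


def run_with_fallback(in_process: Callable[[], str], cmd: List[str]) -> str:
    """Hypothetical helper: prefer the in-process backend, else use a subprocess.

    ModuleNotFoundError (e.g. "No module named 'org.apache.commons'") is a
    subclass of ImportError, so it is caught here as well. The sketch assumes
    the backend's imports happen inside in_process().
    """
    try:
        return in_process()
    except ImportError:
        # The in-process bridge could not be imported: run the work as a
        # separate process and return its standard output instead.
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return result.stdout
```

The helper catches only ImportError. As a result, genuine parsing errors raised by the in-process backend still surface instead of being silently retried in a subprocess.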

Installing version 2.7.0 with the command pip install tabula-py==2.7.0 worked for me as well.

It didn't work for me for some reason. I thought !pip install tabula-py==2.7.0 would do the trick, but I'm still getting the same error.

It ended up working for me; I just had to create a new notebook and copy my code over. It must have been some cached-version issue with the old notebook. Thanks!
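
A likely explanation for the "cached version" behaviour is that Python keeps every imported module in sys.modules. Running pip install inside a live kernel changes the files on disk, but any module the kernel has already imported stays loaded until the kernel restarts. A new notebook typically starts a fresh interpreter, which would explain why it picked up the pinned version. The small check below is a sketch of how to detect that situation. already_imported is a hypothetical helper, not part of tabula-py.

```python
import sys


def already_imported(module_name: str) -> bool:
    """Return True if module_name is already loaded in this interpreter.

    Python caches imported modules in sys.modules, so a later
    "pip install pkg==X" changes the files on disk but not the copy the
    running kernel already imported. If this returns True right after a
    reinstall, restart the kernel (or use a fresh notebook/session).
    """
    return module_name in sys.modules


# Hypothetical usage in a notebook cell, right after reinstalling:
# if already_imported("tabula"):
#     print("tabula was imported before the reinstall; restart the kernel.")
```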
