I've been running a pipeline in Azure for 4 months and it suddenly broke last night. I have the following code:
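!pip install tabula-py
from io import BytesIO  # wraps the raw PDF bytes in a file-like object
import tabula
from tabula.io import read_pdf
# pdf_content holds the PDF as bytes; read page 3 in stream mode with no header row
df = tabula.io.read_pdf(BytesIO(pdf_content), pandas_options={'header': None}, pages=3, stream=True)[0]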
Now I suddenly get this error:
~/cluster-env/env/lib/python3.8/site-packages/tabula/io.py in __init__(self, java_options, silent)
92
93 from java import lang
---> 94 from org.apache.commons import cli
95 from technology import tabula
96
ModuleNotFoundError: No module named 'org.apache.commons'
Any help would be appreciated.
The same happened to me today in a Databricks environment, after tabula had been running smoothly for 6 months. My hotfix was to pip install version 2.7.0, as I suspect the error is caused by the latest version, 2.8.1, which was published today.
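To check whether an environment picked up the release blamed above, something like the following might help. It is only a sketch: KNOWN_BAD contains just the version named in this thread (not an official list of broken releases), and the helper names are made up for illustration.

```python
from importlib import metadata

# Assumption: only the version reported in this thread, not an official list.
KNOWN_BAD = {"2.8.1"}


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_suspect(version):
    """True if the given version string is one reported as broken in this thread."""
    return version in KNOWN_BAD


current = installed_version("tabula-py")
print("tabula-py version:", current, "| suspect:", is_suspect(current))
```

If it reports a suspect version, pinning with pip install tabula-py==2.7.0 is the workaround described here.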
tabula-py author here.
I released v2.8.2, which adds a fallback to subprocess if jpype has an import issue.
https://pypi.org/project/tabula-py/2.8.2/
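For readers wondering what "fallback on an import issue" looks like in general, here is a minimal sketch of the pattern. It is not tabula-py's actual code; choose_backend and the backend labels are hypothetical names used only for illustration.

```python
import importlib


def choose_backend(module_name):
    """Return "in-process" if the module imports cleanly, else "subprocess"."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError (the error in the question) is a subclass of ImportError.
        return "subprocess"
    return "in-process"


print(choose_backend("json"))                 # a module that exists
print(choose_backend("no_such_module_xyz"))   # a module that does not
```

The point is that a failed import no longer aborts the call; the slower but more robust path is used instead.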
Installing version 2.7.0 with the command pip install tabula-py==2.7.0 worked for me as well.
It didn't work for me for some reason, though I thought it would do the trick. I ran !pip install tabula-py==2.7.0 and I'm still getting the same error.
It ended up working for me. I just had to create a new notebook and copy my code over. It must have been some cached version issue with the old notebook. Thanks!!
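One possible explanation for the stale notebook: a running Python session keeps using a module it has already imported, even after pip installs a different version. A rough way to spot that is sketched below. It assumes the module exposes a __version__ attribute, and loaded_version_mismatch is a made-up helper name.

```python
import sys
from importlib import metadata


def loaded_version_mismatch(module_name, dist_name):
    """Return (loaded, installed) if the imported module is stale, else None."""
    module = sys.modules.get(module_name)
    if module is None:
        return None  # not imported yet, so the next import uses what is installed
    loaded = getattr(module, "__version__", None)
    if loaded is None:
        return None  # assumption failed: no __version__ to compare against
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        installed = None
    return (loaded, installed) if loaded != installed else None


print(loaded_version_mismatch("tabula", "tabula-py"))
```

If this prints a pair of differing versions, restarting the kernel (or starting a fresh notebook, as described above) should load the newly installed one.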