
python - stat: path should be string, bytes, os.PathLike or integer, not NoneType - refextract

Reposted. Author: 行者123. Updated: 2023-12-05 05:01:46

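In my Python project I am trying to use refextract to parse some data from a PDF file, but I cannot get its extract_references_from_file function to work.

I am using the example code given on the website: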

from refextract import extract_references_from_file
references = extract_references_from_file('C02-1025.pdf')
print(references[0])

This error occurs:

TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

I have already tried passing the file path in different ways, like this:

references = extract_references_from_file(r"F:\project\python\C02-1025.pdf")

references = extract_references_from_file("F:\\project\\python\\C02-1025.pdf")

But nothing worked.

I am using Python 3.7.2, 64-bit.

Here is the full traceback of the error:

Traceback (most recent call last):
File "refext.py", line 16, in <module>
references = extract_references_from_file(r"F:\project\python\C02-1025.pdf")
File "C:\Users\Username\AppData\Local\Programs\Python\Python37\lib\site-packages\refextract\references\api.py", line 128, in extract_references_from_file
docbody = get_plaintext_document_body(path)
File "C:\Users\Username\AppData\Local\Programs\Python\Python37\lib\site-packages\refextract\references\engine.py", line 1412, in get_plaintext_document_body
textbody = convert_PDF_to_plaintext(fpath, keep_layout)
File "C:\Users\Username\AppData\Local\Programs\Python\Python37\lib\site-packages\refextract\documents\pdf.py", line 457, in convert_PDF_to_plaintext
if not os.path.isfile(CFG_PATH_PDFTOTEXT):
File "C:\Users\Username\AppData\Local\Programs\Python\Python37\lib\genericpath.py", line 30, in isfile
st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

The refextract library depends on the pdftotext command-line utility. But when I try to install it with

pip install pdftotext

it gives me this error:

 ERROR: Command errored out with exit status 1:
command: 'c:\users\usernamem\appdata\local\programs\python\python37\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\USER~1\\AppData\\Local\\Temp\\pip-install-l_9a5zt6\\pdftotext\\setup.py'"'"'; __file__='"'"'C:\\Users\\USER~1\\AppData\\Local\\Temp\\pip-install-l_9a5zt6\\pdftotext\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\USER~1\AppData\Local\Temp\pip-record-gpha3woc\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\username\appdata\local\programs\python\python37\Include\pdftotext'
cwd: C:\Users\USER~1\AppData\Local\Temp\pip-install-l_9a5zt6\pdftotext\
Complete output (11 lines):
WARNING: pkg-config not found--guessing at poppler version.
If the build fails, install pkg-config and try again.
running install
running build
running build_ext
building 'pdftotext' extension
creating build
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MT -DPOPPLER_CPP_AT_LEAST_0_30_0=1 "-Ic:\users\username\appdata\local\programs\python\python37\include" "-Ic:\users\username\appdata\local\programs\python\python37\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" /EHsc /Tppdftotext.cpp /Fobuild\temp.win-amd64-3.7\Release\pdftotext.obj -Wall
error: command 'cl.exe' failed: No such file or directory
----------------------------------------
ERROR: Command errored out with exit status 1: 'c:\users\username\appdata\local\programs\python\python37\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\USER~1\\AppData\\Local\\Temp\\pip-install-l_9a5zt6\\pdftotext\\setup.py'"'"'; __file__='"'"'C:\\Users\\USER~1\\AppData\\Local\\Temp\\pip-install-l_9a5zt6\\pdftotext\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\USER~1\AppData\Local\Temp\pip-record-gpha3woc\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\username\appdata\local\programs\python\python37\Include\pdftotext' Check the logs for full command output.

Best answer

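The refextract library you are using depends on the pdftotext command-line utility. That program currently cannot be found on your system, which causes the error you describe: the traceback shows CFG_PATH_PDFTOTEXT being passed to os.path.isfile while it is None. It is unfortunate that the error is so obscure. There is some code that tries to provide a better error message, but it does not work in this case.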
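
To make the link between the missing program and the obscure message concrete, here is a minimal diagnostic sketch. It uses only the standard library; the helper name stat_error_for is hypothetical and is not part of refextract. It checks whether pdftotext is visible on PATH and reproduces the os.stat(None) failure that appears at the bottom of the traceback.

```python
import os
import shutil


def stat_error_for(path):
    """Return the TypeError message os.stat() raises for `path`, or None.

    Hypothetical helper for illustration only: it repeats the os.stat()
    call seen at the bottom of the traceback.
    """
    try:
        os.stat(path)
    except TypeError as exc:
        return str(exc)
    except OSError:
        # The path has a valid type but does not exist or is unreadable.
        return None
    return None


# None here means no pdftotext executable is visible on PATH.
print("pdftotext on PATH:", shutil.which("pdftotext"))

# Passing None (a path that was never configured) triggers the TypeError.
print("os.stat(None) says:", stat_error_for(None))
```

If the first line prints None, the program is not on PATH, which is consistent with the diagnosis above.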

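On Linux, pdftotext is usually provided by your distribution. On Windows, you usually need to install it yourself. It comes from the Xpdf tools package. You need to either install the executable somewhere on your system's PATH, or point refextract to the location of the program by setting the environment variable CFG_PATH_PDFTOTEXT.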
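
The second option can be sketched in Python as follows. This is an illustration under assumptions, not refextract's documented setup procedure: it assumes refextract reads CFG_PATH_PDFTOTEXT when it is first imported, so the variable has to be set before the import. The helper names locate_pdftotext and configure_refextract are hypothetical.

```python
import os
import shutil


def locate_pdftotext(explicit_path=None):
    """Return a path to the pdftotext executable, or None if not found.

    Hypothetical helper: an explicitly given file wins, otherwise PATH
    is searched.
    """
    if explicit_path and os.path.isfile(explicit_path):
        return explicit_path
    return shutil.which("pdftotext")


def configure_refextract(explicit_path=None):
    """Set CFG_PATH_PDFTOTEXT for this process and return the path used.

    Assumption: refextract reads this variable at import time, so call
    this function before importing refextract.
    """
    path = locate_pdftotext(explicit_path)
    if path is None:
        raise FileNotFoundError(
            "pdftotext was not found; install the Xpdf tools and either "
            "add them to PATH or pass the full path to the executable."
        )
    os.environ["CFG_PATH_PDFTOTEXT"] = path
    return path
```

A possible use, where the Windows path is a made-up example of an install location: call configure_refextract(r"C:\tools\xpdf\pdftotext.exe") first, then run from refextract import extract_references_from_file and call it as in the question. Setting the variable in the system's environment settings instead has the same effect without any code change.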

Regarding python - stat: path should be string, bytes, os.PathLike or integer, not NoneType - refextract, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/62603602/
