
python - Docker python Tika

Reposted. Author: 行者123. Last updated: 2023-12-02 18:59:51

I would like to create a Dockerfile that installs everything needed to run python-tika in a Docker container.

This is my Dockerfile so far:

### Get Python
FROM python:3

RUN pip3 install --upgrade pip requests
RUN pip3 install python-docx tika numpy pandas

RUN mkdir scripts

ADD runner.py /scripts/

CMD [ "python", "./scripts/runner.py" ]

I build the image and run it:

docker build -t docker-tika .

docker run docker-tika

But it fails with the following errors:

[~/Documents/BERT_DV/Docker_Parser] $ docker run docker-tika
2020-05-08 13:49:52,528 [MainThread ] [INFO ] Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.24/tika-server-1.24.jar to /tmp/tika-server.jar.
2020-05-08 13:50:09,742 [MainThread ] [INFO ] Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.24/tika-server-1.24.jar.md5 to /tmp/tika-server.jar.md5.
2020-05-08 13:50:10,133 [MainThread ] [ERROR] Unable to run java; is it installed?
2020-05-08 13:50:10,134 [MainThread ] [ERROR] Failed to receive startup confirmation from startServer.
2020-05-08 13:50:10,271 [MainThread ] [ERROR] Unable to run java; is it installed?
2020-05-08 13:50:10,271 [MainThread ] [ERROR] Failed to receive startup confirmation from startServer.

The runner.py script is as follows:

import tika
tika.initVM()

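From what I have read, two things are needed: 1. download the tika-server jar; 2. call initVM() in the Python script, which starts tika-server in the background.

I don't know what is missing in my Dockerfile. Thanks for any help!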

I have updated the Dockerfile to install Java as well, and it is still complaining about Java:

### 1. Get Linux
FROM alpine:3.7

### 2. Get Java via the package manager
RUN apk update \
&& apk upgrade \
&& apk add --no-cache bash \
&& apk add --no-cache --virtual=build-dependencies unzip \
&& apk add --no-cache curl \
&& apk add --no-cache openjdk8-jre

ENV JAVA_HOME=/opt/java/openjdk \
PATH="/opt/java/openjdk/bin:$PATH"

### 3. Get Python
FROM python:3

RUN pip3 install --upgrade pip requests
RUN pip3 install python-docx tika numpy pandas

RUN mkdir scripts
RUN mkdir pdfs
RUN mkdir output

ADD runner2.py /scripts/
ADD sample.pdf .

CMD [ "python", "./scripts/runner2.py" ]

cat runner2.py:

#!/usr/bin/env python
import tika
from tika import parser
parsed = parser.from_file('sample.pdf')
print(parsed["metadata"])
print(parsed["content"])

[~/Documents/BERT_DV/Docker_Parser] $ docker run docker-tika

2020-05-08 14:40:23,183 [MainThread  ] [INFO ]  Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.24/tika-server-1.24.jar to /tmp/tika-server.jar.
2020-05-08 14:41:00,480 [MainThread ] [INFO ] Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.24/tika-server-1.24.jar.md5 to /tmp/tika-server.jar.md5.
2020-05-08 14:41:02,324 [MainThread ] [ERROR] Unable to run java; is it installed?
2020-05-08 14:41:02,324 [MainThread ] [ERROR] Failed to receive startup confirmation from startServer.

Best answer

I don't have enough reputation to comment, so I am posting here.

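It looks like your Dockerfile now produces a multi-stage build, so Java is no longer present in the final stage: only the last FROM stage ends up in the image, and the earlier Alpine stage is discarded.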
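To make the multi-stage point concrete, here is a small Python sketch that splits a Dockerfile into stages at each FROM line and checks whether the Java install lands in the last one. It is only a simplified model for illustration, not Docker's real parser: the helper name `split_stages` is made up, the Dockerfile text is a shortened version of the one in the question, and line continuations are not handled.

```python
def split_stages(dockerfile_text):
    """Group Dockerfile instructions into stages; each FROM starts a new stage.

    Simplified model: ignores blank lines, comments, and anything before
    the first FROM, and does not join backslash-continued lines.
    """
    stages = []
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.upper().startswith("FROM "):
            stages.append([line])
        elif stages:
            stages[-1].append(line)
    return stages


# Shortened version of the Dockerfile from the question (assumption: the
# multi-line apk command is collapsed into one line for this sketch).
DOCKERFILE = """\
FROM alpine:3.7
RUN apk add --no-cache openjdk8-jre
FROM python:3
RUN pip3 install python-docx tika numpy pandas
CMD [ "python", "./scripts/runner2.py" ]
"""

stages = split_stages(DOCKERFILE)
final_stage = stages[-1]  # only this stage becomes the resulting image
java_in_final = any("openjdk" in line for line in final_stage)

print(len(stages), "stages; final base:", final_stage[0])
print("Java installed in final stage:", java_in_final)
```

With the Dockerfile from the question, the Java install sits in the first stage and the final stage starts again from python:3, which matches the "Unable to run java" error in the log.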

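As Giga Kokaia and others said earlier, Java is the problem. It looks like you want to do this with a single Dockerfile. One way is to use Alpine as the base image, but you then need some extra dependencies to install Python and the required pip packages. Alpine may not be the best base for Python when many libraries are involved, because it uses musl rather than glibc, so packages such as numpy and pandas may have to be compiled from source. Still, here is a very roughly updated Dockerfile: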

### 1. Get Linux
FROM alpine:3.7

### 2. Get Java via the package manager
RUN apk update \
&& apk upgrade \
&& apk add --no-cache bash \
&& apk add --no-cache --virtual=build-dependencies unzip \
&& apk add --no-cache curl \
&& apk add --no-cache openjdk8-jre \
&& apk add python3 python3-dev gcc g++ gfortran musl-dev libxml2-dev libxslt-dev

ENV JAVA_HOME=/opt/java/openjdk \
PATH="/opt/java/openjdk/bin:$PATH"


RUN pip3 install --upgrade pip requests
RUN pip3 install python-docx wheel tika numpy
RUN pip3 install pandas

RUN mkdir scripts
RUN mkdir pdfs
RUN mkdir output

ADD runner2.py /scripts/
ADD sample.pdf .

CMD [ "python3", "./scripts/runner2.py" ]
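To confirm that Java is actually visible in the final image, a check like the following could be run inside the container before calling tika. This is a minimal sketch using only the standard library; the helper names `find_java` and `java_version` are made up for illustration, and the keyword arguments are chosen to also work on the older Python 3 that Alpine 3.7 ships.

```python
import shutil
import subprocess


def find_java():
    """Return the full path of the `java` executable on PATH, or None."""
    return shutil.which("java")


def java_version(java_path):
    """Return the first line printed by `java -version`.

    The JVM usually writes its version banner to stderr, so stderr is
    checked first and stdout is used as a fallback.
    """
    result = subprocess.run(
        [java_path, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    lines = (result.stderr or result.stdout).splitlines()
    return lines[0] if lines else ""


java_path = find_java()
if java_path is None:
    # Same situation as the "Unable to run java; is it installed?" log line.
    print("java not found on PATH")
else:
    print("java found at", java_path)
    print(java_version(java_path))
```

One way to use it is to save the snippet as a script (for example a hypothetical check_java.py), add it to the image, and run it with `docker run docker-tika python3 check_java.py`.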

Regarding python - Docker python Tika, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61681495/
