kubernetes - How to run NLTK on App Engine or Kubernetes?

Reposted. Author: 行者123. Updated: 2023-12-02 12:05:49

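I am writing a model that predicts the type of a piece of text in a PDF document, for example a name or a date.

The model uses nltk.word_tokenize and nltk.pos_tag.

When I try to use it on Kubernetes on Google Cloud Platform, I get the following error: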

    from nltk.tag import pos_tag
    from nltk.tokenize import word_tokenize

    tokenized_word = word_tokenize('x')
    tagged_word = pos_tag(['x'])

Stack trace:

    Resource punkt not found.
    Please use the NLTK Downloader to obtain the resource:

    >>> import nltk
    >>> nltk.download('punkt')

    Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/env/nltk_data'
    - '/env/share/nltk_data'
    - '/env/lib/nltk_data'
    - ''

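Obviously, downloading the resource to a local machine will not solve the problem if the code has to run on Kubernetes, and we have not yet set up NFS for the project.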

Best answer

I eventually solved this by adding the download of the nltk packages to the init function:

    import logging

    import nltk
    from nltk import word_tokenize, pos_tag

    LOGGER = logging.getLogger(__name__)

    LOGGER.info('Catching broad nltk errors')
    # /usr/lib/nltk_data is one of the directories NLTK lists under "Searched in".
    DOWNLOAD_DIR = '/usr/lib/nltk_data'
    LOGGER.info(f'Saving files to {DOWNLOAD_DIR}')

    # Probe the tokenizer; if its data is missing, download 'punkt'.
    try:
        tokenized = word_tokenize('x')
        LOGGER.info(f'Tokenized word: {tokenized}')
    except Exception as err:
        LOGGER.info(f'NLTK dependencies not downloaded: {err}')
        try:
            nltk.download('punkt', download_dir=DOWNLOAD_DIR)
        except Exception as e:
            LOGGER.info(f'Error occurred while downloading file: {e}')

    # Probe the tagger; if its data is missing, download the tagger model.
    try:
        tagged_word = pos_tag(['x'])
        LOGGER.info(f'Tagged word: {tagged_word}')
    except Exception as err:
        LOGGER.info(f'NLTK dependencies not downloaded: {err}')
        try:
            nltk.download('averaged_perceptron_tagger', download_dir=DOWNLOAD_DIR)
        except Exception as e:
            LOGGER.info(f'Error occurred while downloading file: {e}')

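I realize that this many try/except blocks are not really needed. I also specify the download directory explicitly because, if you don't, the "tagger" seems to be downloaded and unzipped into /usr/lib, where nltk does not look for the files.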
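
The repeated try/except blocks could be collapsed into a single loop. The following is only a minimal sketch, not the code from the answer: `ensure_nltk_data` and its `find` and `download` parameters are hypothetical names introduced here, and `download_dir` is assumed to be a directory NLTK already searches (such as `/usr/lib/nltk_data` above). It checks for each resource with `nltk.data.find` rather than by calling the tokenizer, and downloads only what is missing. Newer NLTK releases may ask for differently named resources (the error message names the one it wants), so the list is a parameter.

```python
import logging

LOGGER = logging.getLogger(__name__)

# (path that nltk.data.find checks, package name passed to nltk.download)
REQUIRED = (
    ("tokenizers/punkt", "punkt"),
    ("taggers/averaged_perceptron_tagger", "averaged_perceptron_tagger"),
)


def ensure_nltk_data(download_dir, required=REQUIRED, find=None, download=None):
    """Download each missing package into download_dir; return the names fetched.

    Hypothetical helper for illustration. download_dir is assumed to be on
    NLTK's search path, otherwise the downloaded files will not be found.
    """
    if find is None or download is None:
        import nltk  # imported lazily so stand-ins can be injected in tests

        find = find or nltk.data.find
        download = download or nltk.download

    fetched = []
    for resource_path, package in required:
        try:
            find(resource_path)  # raises LookupError when the resource is absent
        except LookupError:
            LOGGER.info("Downloading %s to %s", package, download_dir)
            if download(package, download_dir=download_dir):
                fetched.append(package)
            else:
                LOGGER.error("Could not download %s", package)
    return fetched
```

Calling `ensure_nltk_data('/usr/lib/nltk_data')` once in the init function would then replace both probe-and-download blocks.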

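This downloads the files on the first run of every new Pod, and the files remain available until the Pod dies.

The error is resolved on a Kubernetes stateless set, which means the approach also works for non-persistent applications such as App Engine. It is not the most efficient solution, though, because the files have to be downloaded every time an instance starts.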
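
One way to avoid the repeated download, which the answer itself does not use, would be to fetch the data once while the container image is built, so that new Pods start with the files already in place. The sketch below assumes a build step can run a Python function. `download_all` and its `download` parameter are hypothetical names, and `/usr/local/share/nltk_data` is chosen only because it appears in the "Searched in" list of the stack trace.

```python
import os

PACKAGES = ("punkt", "averaged_perceptron_tagger")
# Assumed target: one of the directories listed under "Searched in".
DEFAULT_DIR = "/usr/local/share/nltk_data"


def download_all(target_dir=DEFAULT_DIR, packages=PACKAGES, download=None):
    """Fetch every package into target_dir; return the names that failed.

    Hypothetical helper meant to run once at image build time.
    """
    if download is None:
        import nltk  # imported lazily so a stand-in can be injected in tests

        download = nltk.download

    os.makedirs(target_dir, exist_ok=True)
    # nltk.download returns True on success and False on failure.
    return [p for p in packages if not download(p, download_dir=target_dir)]
```

A build step would call `download_all()` and fail the build if the returned list is not empty. The running application then needs no download code at all, at the cost of a larger image.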

For "kubernetes - How to run NLTK on App Engine or Kubernetes?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53304958/
