
docker - PartitionedDataSet not found when running a Kedro pipeline in Docker


I have a number of text files in an S3 bucket that I read and process, so I defined a PartitionedDataSet in the Kedro data catalog as follows:

raw_data:
  type: PartitionedDataSet
  path: s3://reads/raw
  dataset: pandas.CSVDataSet
  load_args:
    sep: "\t"
    comment: "#"
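As far as I understand, the partition discovery this entry performs boils down to roughly the following (just a sketch of my mental model, assuming Kedro goes through s3fs/fsspec here; the bucket and prefix are the ones from my catalog):

import pandas as pd
import s3fs

# List every object under the prefix -- this is the discovery step.
fs = s3fs.S3FileSystem()  # picks the AWS_* variables up from the environment
partitions = fs.find("reads/raw")

# Load each partition with the same load_args as in the catalog entry above.
data = {p: pd.read_csv(fs.open(p), sep="\t", comment="#") for p in partitions}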
In addition, I implemented this solution to pull all secrets (including the AWS keys) from the credentials file via environment variables.
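The gist of that, as I understand it, is simply reading the secrets from the environment instead of hard-coding them in conf/local/credentials.yml, roughly like this (the dictionary shape is only illustrative):

import os

# Illustrative only: build the credentials dict from environment variables
# rather than keeping the secrets in the credentials file itself.
credentials = {
    "dev_s3": {
        "key": os.environ["AWS_ACCESS_KEY_ID"],
        "secret": os.environ["AWS_SECRET_ACCESS_KEY"],
    }
}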
Everything works fine when I run locally with kedro run, but when I build the Docker image (with kedro-docker) and run the pipeline in the Docker environment with kedro docker run, supplying all the environment variables via the --docker-args option, I get the following error:
Traceback (most recent call last):
  File "/usr/local/bin/kedro", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/kedro/framework/cli/cli.py", line 724, in main
    cli_collection()
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/kedro/kedro_cli.py", line 230, in run
    pipeline_name=pipeline,
  File "/usr/local/lib/python3.7/site-packages/kedro/framework/context/context.py", line 767, in run
    raise exc
  File "/usr/local/lib/python3.7/site-packages/kedro/framework/context/context.py", line 759, in run
    run_result = runner.run(filtered_pipeline, catalog, run_id)
  File "/usr/local/lib/python3.7/site-packages/kedro/runner/runner.py", line 101, in run
    self._run(pipeline, catalog, run_id)
  File "/usr/local/lib/python3.7/site-packages/kedro/runner/sequential_runner.py", line 90, in _run
    run_node(node, catalog, self._is_async, run_id)
  File "/usr/local/lib/python3.7/site-packages/kedro/runner/runner.py", line 213, in run_node
    node = _run_node_sequential(node, catalog, run_id)
  File "/usr/local/lib/python3.7/site-packages/kedro/runner/runner.py", line 221, in _run_node_sequential
    inputs = {name: catalog.load(name) for name in node.inputs}
  File "/usr/local/lib/python3.7/site-packages/kedro/runner/runner.py", line 221, in <dictcomp>
    inputs = {name: catalog.load(name) for name in node.inputs}
  File "/usr/local/lib/python3.7/site-packages/kedro/io/data_catalog.py", line 392, in load
    result = func()
  File "/usr/local/lib/python3.7/site-packages/kedro/io/core.py", line 213, in load
    return self._load()
  File "/usr/local/lib/python3.7/site-packages/kedro/io/partitioned_data_set.py", line 240, in _load
    raise DataSetError("No partitions found in `{}`".format(self._path))
kedro.io.core.DataSetError: No partitions found in `s3://reads/raw`
Note: the pipeline works perfectly fine in the Docker environment if I move the files to some local directory, point the PartitionedDataSet at it, build the Docker image and pass the environment variables via --docker-args.

Best answer

The solution (at least in my case) was to also provide the AWS_DEFAULT_REGION environment variable in the kedro docker run command.
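For reference, the command ended up looking roughly like this (a sketch, not the exact invocation; --env without a value forwards the variable from the host environment, and the quoting may vary with your shell):

kedro docker run --docker-args="--env AWS_ACCESS_KEY_ID --env AWS_SECRET_ACCESS_KEY --env AWS_DEFAULT_REGION"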

Regarding "docker - PartitionedDataSet not found when running a Kedro pipeline in Docker", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64006314/
