
python - Google Dataflow - Unable to import custom python module

Reposted. Author: 行者123. Updated: 2023-11-30 22:11:58

My Apache Beam pipeline implements custom Transforms and ParDo's in Python modules, which in turn import other modules I wrote. On the local runner this works fine, because all the files are available on the same path. On the Dataflow runner, however, the pipeline fails with a module import error.

How do I make my custom modules available to all the Dataflow workers? Please advise.

Here is an example:

ImportError: No module named DataAggregation

at find_class (/usr/lib/python2.7/pickle.py:1130)
at find_class (/usr/local/lib/python2.7/dist-packages/dill/dill.py:423)
at load_global (/usr/lib/python2.7/pickle.py:1096)
at load (/usr/lib/python2.7/pickle.py:864)
at load (/usr/local/lib/python2.7/dist-packages/dill/dill.py:266)
at loads (/usr/local/lib/python2.7/dist-packages/dill/dill.py:277)
at loads (/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py:232)
at apache_beam.runners.worker.operations.PGBKCVOperation.__init__ (operations.py:508)
at apache_beam.runners.worker.operations.create_pgbk_op (operations.py:452)
at apache_beam.runners.worker.operations.create_operation (operations.py:613)
at create_operation (/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py:104)
at execute (/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py:130)
at do_work (/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py:642)

Best Answer

The problem is probably that you have not grouped your files into a package. The Beam documentation has a section on exactly this.

Multiple File Dependencies

Often, your pipeline code spans multiple files. To run your project remotely, you must group these files as a Python package and specify the package when you run your pipeline. When the remote workers start, they will install your package. To group your files as a Python package and make it available remotely, perform the following steps:

  1. Create a setup.py file for your project. The following is a very basic setup.py file (note the import and the comma after `name`, both missing in the snippet as originally pasted):

    import setuptools

    setuptools.setup(
        name='PACKAGE-NAME',
        version='PACKAGE-VERSION',
        install_requires=[],
        packages=setuptools.find_packages(),
    )
  2. Structure your project so that the root directory contains the setup.py file, the main workflow file, and a directory with the rest of the files.

    root_dir/
    setup.py
    main.py
    other_files_dir/

See Juliaset for an example that follows this required project structure.
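One detail worth checking in this layout: `setuptools.find_packages()` only reports directories that contain an `__init__.py`, so `other_files_dir` needs one or the package installed on the workers will be empty and the imports will still fail. A small sketch demonstrating this (the directory name mirrors the layout above; the temp tree is illustrative):

```python
import os
import tempfile

import setuptools

# Build a throwaway project tree mirroring the layout above.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "other_files_dir")
os.makedirs(pkg)

# Without an __init__.py, find_packages() discovers nothing.
assert setuptools.find_packages(root) == []

# Adding an __init__.py makes the directory an importable package.
open(os.path.join(pkg, "__init__.py"), "w").close()
assert setuptools.find_packages(root) == ["other_files_dir"]
```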

  3. Run your pipeline with the following command-line option:

    --setup_file /path/to/setup.py
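A full invocation might look like the following; the project, bucket, and main-file names are placeholders, not values from the original post:

```shell
python main.py \
  --runner DataflowRunner \
  --project YOUR-GCP-PROJECT \
  --temp_location gs://YOUR-BUCKET/tmp \
  --setup_file ./setup.py
```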

Note: If you created a requirements.txt file and your project spans multiple files, you can get rid of the requirements.txt file and instead, add all packages contained in requirements.txt to the install_requires field of the setup call (in step 1).

Regarding "python - Google Dataflow - Unable to import custom python module", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51262031/
