python - Unable to generate a templated Dataflow pipeline in Python

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:14:14

I am trying to convert the Cloud Dataflow "Wordcount" Python example into a templated version that uses runtime parameters, by modifying the pipeline options as instructed in the docs:

def run(argv=None):
    """Main entry point; defines and runs the wordcount pipeline."""

    class WordcountTemplatedOptions(PipelineOptions):
        @classmethod
        def _add_argparse_args(cls, parser):
            # Use add_value_provider_argument for arguments to be templatable
            # Use add_argument as usual for non-templatable arguments
            parser.add_value_provider_argument(
                '--input',
                default='gs://dataflow-samples/shakespeare/kinglear.txt',
                help='Path of the file to read from')
            parser.add_argument(
                '--output',
                required=True,
                help='Output file to write results to.')

    pipeline_options = PipelineOptions(['--output', 'some/output_path'])
    p = beam.Pipeline(options=pipeline_options)
    wordcount_options = pipeline_options.view_as(WordcountTemplatedOptions)

    # Read the text file[pattern] into a PCollection.
etc. etc.

The problem is that when I run the command to create and stage the template..., the output is:

INFO:root:Starting the size estimation of the input
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:root:Finished the size estimation of the input at 1 files. Estimation took 0.288088083267 seconds
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:root:Starting finalize_write threads with num_shards: 1, batches: 1, num_threads: 1
INFO:root:Renamed 1 shards in 0.13 seconds.
INFO:root:number of empty lines: 1663
INFO:root:average word length: 4

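and no file is generated under the template_location (gs://[YOUR_BUCKET_NAME]/templates/mytemplate)...

I suspected that the command was trying to run the pipeline from my desktop using the "default" input file, so I removed the "default" line from the --input argument, but then I got this error: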

raise BeamIOError('Unable to get the Filesystem', {path: e})
apache_beam.io.filesystem.BeamIOError: Unable to get the Filesystem with exceptions {None: AttributeError("'NoneType' object has no attribute 'strip'",)}

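There is no official templated Dataflow example for Python (the only snippet I could find is this one, which looks very similar to the code above).

Am I missing something?

Thanks!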

Best answer

Thanks to Google Cloud Support, I was able to resolve the issue. To summarize:

  1. Clone the latest wordcount.py example (I had been using an older version):

    git clone https://github.com/apache/beam.git

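  2. The Google team updated the tutorial, so just follow the code instructions there. Make sure to include the @classmethod _add_argparse_args so that parameters can be received at runtime, and use the new options when reading from the text file: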

    wordcount_options = pipeline_options.view_as(WordcountTemplatedOptions)
    lines = p | 'read' >> ReadFromText(wordcount_options.input)

  3. Generate the template as instructed

You should now see the template under the template_location directory

Thanks!

Regarding "python - Unable to generate a templated Dataflow pipeline in Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48308693/
