python - ValueError in Dataflow: Invalid GCS location: None


I am trying to load data from a GCS bucket and publish the contents to Pub/Sub and BigQuery. These are my pipeline options:

options = PipelineOptions(
    project=project,
    temp_location="gs://dataflow-example-bucket6721/temp21/",
    region='us-east1',
    job_name="dataflow2-pubsub-09072021",
    machine_type='e2-standard-2',
)
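
As a sanity check (not part of the original question), the value can be confirmed on the GoogleCloudOptions view of the options object, which is where the BigQuery file-loads fallback reads it; a minimal sketch assuming the options object defined above:

# Hypothetical sanity check: confirm that the temp_location keyword
# argument actually reached the GoogleCloudOptions view.
from apache_beam.options.pipeline_options import GoogleCloudOptions

print(options.view_as(GoogleCloudOptions).temp_location)
# expected: gs://dataflow-example-bucket6721/temp21/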

This is my pipeline:

data = p | 'CreateData' >> beam.Create(sum([fileName()], []))

jsonFile = data | "filterJson" >> beam.Filter(filterJsonfile)

JsonData = jsonFile | "JsonData" >> beam.Map(readFromJson)

split_data = JsonData | 'Split Data' >> ParDo(CheckForValidData()).with_outputs("ValidData", "InvalidData")

ValidData = split_data.ValidData
InvalidData = split_data.InvalidData
data_ = split_data[None]


publish_data = ValidData | "Publish msg" >> ParDo(publishMsg())

ToBQ = ValidData | "To BQ" >> beam.io.WriteToBigQuery(
    table_spec,
    #schema=table_schema,
    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)

The pipeline runs fine with InteractiveRunner, but with DataflowRunner it fails with:

ValueError: Invalid GCS location: None. Writing to BigQuery with FILE_LOADS method requires a GCS location to be provided to write files to be loaded into BigQuery. Please provide a GCS bucket through custom_gcs_temp_location in the constructor of WriteToBigQuery or the fallback option --temp_location, or pass method="STREAMING_INSERTS" to WriteToBigQuery. [while running '[15]: To BQ/BigQueryBatchFileLoads/GenerateFilePrefix']

The error points at the GCS location and suggests adding a temp_location, but I have already set temp_location.
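
For reference, the error message itself names two in-code workarounds. A minimal sketch of both, assuming the table_spec and ValidData collection from the pipeline above (the staging path in Option A is a hypothetical subfolder):

# Option A: give the file-loads step its own staging location,
# independent of --temp_location (path below is hypothetical).
ToBQ = ValidData | "To BQ" >> beam.io.WriteToBigQuery(
    table_spec,
    custom_gcs_temp_location="gs://dataflow-example-bucket6721/bq_temp/",
    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)

# Option B: stream rows into BigQuery instead of loading files,
# which needs no GCS staging location at all.
ToBQ = ValidData | "To BQ" >> beam.io.WriteToBigQuery(
    table_spec,
    method=beam.io.WriteToBigQuery.Method.STREAMING_INSERTS,
    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)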

Best Answer

Pass the parameter --temp_location gs://bucket/subfolder/ when running the Dataflow pipeline (in exactly this format, creating a subfolder inside the bucket), and it should work.
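
A minimal sketch of what that looks like, assuming a hypothetical entry script pipeline.py and reusing the bucket from the question:

# From the command line (pipeline.py is a hypothetical script that
# builds its PipelineOptions from sys.argv):
#   python pipeline.py \
#       --runner=DataflowRunner \
#       --project=<your-project> \
#       --region=us-east1 \
#       --temp_location=gs://dataflow-example-bucket6721/temp21/

# Or equivalently in code, pass the flags as an explicit list:
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=" + project,
    "--region=us-east1",
    "--temp_location=gs://dataflow-example-bucket6721/temp21/",
])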

Regarding python - ValueError in Dataflow: Invalid GCS location: None, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/68325195/
