
python-3.x - "pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[ms] would lose data" when sending data to BigQuery without a schema

Reposted. Author: 行者123. Updated: 2023-12-03 23:09:21

I am writing a script that sends a DataFrame to BigQuery:

load_job = bq_client.load_table_from_dataframe(
    df, '.'.join([PROJECT, DATASET, PROGRAMS_TABLE])
)

# Wait for the load job to complete
return load_job.result()

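This works fine, but only if the schema is already defined in BigQuery or if I define the job's schema in my script. If no schema is defined, I get the following error: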
Traceback (most recent call last):
  File "/env/local/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 1661, in load_table_from_dataframe
    dataframe.to_parquet(tmppath, compression=parquet_compression)
  File "/env/local/lib/python3.7/site-packages/pandas/core/frame.py", line 2237, in to_parquet
    **kwargs
  File "/env/local/lib/python3.7/site-packages/pandas/io/parquet.py", line 254, in to_parquet
    **kwargs
  File "/env/local/lib/python3.7/site-packages/pandas/io/parquet.py", line 117, in write
    **kwargs
  File "/env/local/lib/python3.7/site-packages/pyarrow/parquet.py", line 1270, in write_table
    writer.write_table(table, row_group_size=row_group_size)
  File "/env/local/lib/python3.7/site-packages/pyarrow/parquet.py", line 426, in write_table
    self.writer.write_table(table, row_group_size=row_group_size)
  File "pyarrow/_parquet.pyx", line 1311, in pyarrow._parquet.ParquetWriter.write_table
  File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[ms] would lose data: 1578661876547574000

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 383, in run_background_function
    _function_handler.invoke_user_function(event_object)
  File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 217, in invoke_user_function
    return call_user_function(request_or_event)
  File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 214, in call_user_function
    event_context.Context(**request_or_event.context))
  File "/user_code/main.py", line 151, in main
    df = df(param1, param2)
  File "/user_code/main.py", line 141, in get_df
    df, '.'.join([PROJECT, DATASET, PROGRAMS_TABLE])
  File "/env/local/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 1677, in load_table_from_dataframe
    os.remove(tmppath)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp_ps5xji9_job_634ff274.parquet'

Why does pyarrow raise this error? How can I fix it other than by predefining the schema?

Best answer

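The default behavior when converting from pandas to Arrow or Parquet is to not allow silent data loss. There are options you can set during the conversion to allow unsafe casts, which cause loss of timestamp precision or other forms of data loss. The BigQuery Python API would need to set these options itself, so this may be a bug in the BigQuery library. I suggest reporting it on their issue tracker: https://github.com/googleapis/google-cloud-python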

Regarding python-3.x - "pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[ms] would lose data" when sending data to BigQuery without a schema, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59682833/
