
python - 400 error when creating a table in Google BigQuery with pandas to_gbq

Reposted. Author: 太空宇宙. Updated: 2023-11-03 10:51:26

I am trying to query data from a MySQL server and write it to Google BigQuery using the pandas .to_gbq API.

import pandas as pd

def production_to_gbq(table_name_prod, prefix, table_name_gbq, dataset, project):
    # Extract data from production
    q = """
        SELECT *
        FROM {}
        """.format(table_name_prod)

    # `con` is an existing MySQL connection created elsewhere
    df = pd.read_sql(q, con)

    # Write to GBQ
    df.to_gbq(dataset + table_name_gbq, project, chunksize=1000, verbose=True,
              reauth=False, if_exists='replace', private_key=None)

    return df

I keep getting a 400 error indicating invalid input.

Load is 100.0% Complete
---------------------------------------------------------------------------
BadRequest Traceback (most recent call last)
/usr/local/lib/python3.6/site-packages/pandas_gbq/gbq.py in load_data(self, dataframe, dataset_id, table_id, chunksize, schema)
569 self.client, dataframe, dataset_id, table_id,
--> 570 chunksize=chunksize):
571 self._print("\rLoad is {0}% Complete".format(

/usr/local/lib/python3.6/site-packages/pandas_gbq/_load.py in load_chunks(client, dataframe, dataset_id, table_id, chunksize, schema)
73 destination_table,
---> 74 job_config=job_config).result()

/usr/local/lib/python3.6/site-packages/google/cloud/bigquery/job.py in result(self, timeout)
527 # TODO: modify PollingFuture so it can pass a retry argument to done().
--> 528 return super(_AsyncJob, self).result(timeout=timeout)
529

/usr/local/lib/python3.6/site-packages/google/api_core/future/polling.py in result(self, timeout)
110 # Pylint doesn't recognize that this is valid in this case.
--> 111 raise self._exception
112

BadRequest: 400 Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 10; errors: 1. Please look into the error stream for more details.

During handling of the above exception, another exception occurred:

GenericGBQException Traceback (most recent call last)
<ipython-input-73-ef9c7cec0104> in <module>()
----> 1 departments.to_gbq(dataset + table_name_gbq, project, chunksize=1000, verbose=True, reauth=False, if_exists='replace', private_key=None)
2

/usr/local/lib/python3.6/site-packages/pandas/core/frame.py in to_gbq(self, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key)
1058 return gbq.to_gbq(self, destination_table, project_id=project_id,
1059 chunksize=chunksize, verbose=verbose, reauth=reauth,
-> 1060 if_exists=if_exists, private_key=private_key)
1061
1062 @classmethod

/usr/local/lib/python3.6/site-packages/pandas/io/gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key)
107 chunksize=chunksize,
108 verbose=verbose, reauth=reauth,
--> 109 if_exists=if_exists, private_key=private_key)

/usr/local/lib/python3.6/site-packages/pandas_gbq/gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key, auth_local_webserver, table_schema)
980 connector.load_data(
981 dataframe, dataset_id, table_id, chunksize=chunksize,
--> 982 schema=table_schema)
983
984

/usr/local/lib/python3.6/site-packages/pandas_gbq/gbq.py in load_data(self, dataframe, dataset_id, table_id, chunksize, schema)
572 ((total_rows - remaining_rows) * 100) / total_rows))
573 except self.http_error as ex:
--> 574 self.process_http_error(ex)
575
576 self._print("\n")

/usr/local/lib/python3.6/site-packages/pandas_gbq/gbq.py in process_http_error(ex)
453 # <https://cloud.google.com/bigquery/troubleshooting-errors>`__
454
--> 455 raise GenericGBQException("Reason: {0}".format(ex))
456
457 def run_query(self, query, **kwargs):

GenericGBQException: Reason: 400 Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 10; errors: 1. Please look into the error stream for more details.

I inspected the table schema:

id          INTEGER  NULLABLE
name        STRING   NULLABLE
description STRING   NULLABLE
created_at  INTEGER  NULLABLE
modified_at FLOAT    NULLABLE

It matches the dataframe's dtypes:

id             int64
name           object
description    object
created_at     int64
modified_at    float64
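As a quick way to confirm this kind of match, the pandas dtypes can be checked against the BigQuery column types with a small helper. This is only a sketch: the dtype-to-type mapping below mirrors the default conversion pandas-gbq applies, but it is written out here as an assumption, and the `check_schema` helper and its names are illustrative, not part of any library.

```python
import pandas as pd

# Assumed default pandas dtype -> BigQuery type mapping (illustrative).
DTYPE_TO_BQ = {
    "int64": "INTEGER",
    "float64": "FLOAT",
    "object": "STRING",
    "bool": "BOOLEAN",
    "datetime64[ns]": "TIMESTAMP",
}

def check_schema(df, bq_schema):
    """Return a list of (column, pandas_dtype, bq_type) mismatches."""
    mismatches = []
    for col, bq_type in bq_schema.items():
        expected = DTYPE_TO_BQ.get(str(df[col].dtype))
        if expected != bq_type:
            mismatches.append((col, str(df[col].dtype), bq_type))
    return mismatches

df = pd.DataFrame({
    "id": [1, 2],
    "name": ["a", "b"],
    "modified_at": [1.0, 2.0],
})
schema = {"id": "INTEGER", "name": "STRING", "modified_at": "FLOAT"}
print(check_schema(df, schema))  # []
```

An empty list means the dtypes line up with the table, which is exactly the situation in the question: the schema is not the problem here.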

The table is created in GBQ but remains empty.

I read around a bit but could not find much on the pandas.to_gbq API, apart from this question, which seemed relevant but has no replies:

bigquery table is empty when using pandas to_gbq

I did find one potential solution, for a problem where numbers in object-dtype columns were passed to the GBQ table without quotes; it was fixed by casting those columns to string:

I use to_gbq on pandas for updating Google BigQuery and get GenericGBQException

I tried that fix:

for col in df.columns:
    if df[col].dtype == object:
        df[col] = df[col].fillna('')
        df[col] = df[col].astype(str)

Unfortunately I still get the same error. Likewise, cleaning up the missing data and setting the dtypes for the int and float columns produces the same error.

Is there something I am missing?

Best Answer

I found that BigQuery cannot handle \r correctly (and sometimes \n as well). I had the same problem, and after narrowing it down I was genuinely surprised that replacing \r with a space fixed it:

for col in list(df.columns):
    # In Python 3 (as in the traceback above) isinstance(x, str) is sufficient;
    # the original Python 2 answer also checked for unicode.
    df[col] = df[col].apply(lambda x: x.replace(u'\r', u' ') if isinstance(x, str) else x)
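The per-cell lambda above can also be written with pandas' vectorized string methods, which handle \r and \n together in one pass. This is a sketch, not the answerer's code: `strip_newlines` is a made-up helper name, and it assumes the object columns hold strings or NaN (the `.str` accessor leaves NaN untouched).

```python
import pandas as pd

def strip_newlines(df):
    """Return a copy of df with carriage returns and newlines in all
    object (string) columns replaced by single spaces."""
    out = df.copy()
    for col in out.select_dtypes(include="object").columns:
        # A regex replace handles \r\n, \r, and \n in one pass;
        # non-string cells such as NaN pass through unchanged.
        out[col] = out[col].str.replace(r"[\r\n]+", " ", regex=True)
    return out

df = pd.DataFrame({"id": [1, 2], "name": ["foo\rbar", "baz\nqux"]})
clean = strip_newlines(df)
print(clean["name"].tolist())  # ['foo bar', 'baz qux']
```

Running the cleaned frame through to_gbq would then avoid the stray line breaks that trip up BigQuery's CSV loader.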

Regarding "python - 400 error when creating a table in Google BigQuery with pandas to_gbq", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49628612/
