
pandas to gbq claims a schema mismatch while the schema is exactly the same. On GitHub, all the issues are claimed to have been solved in 2017

Reposted. Author: 行者123. Updated: 2023-12-02 06:56:43

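I am trying to append one table to another via pandas, pulling data from BigQuery and sending it to a different BigQuery dataset. Although the table schemas are exactly the same, I get the error "pandas_gbq.gbq.InvalidSchema: Please verify that the structure and data types in the DataFrame match the schema of the destination table."

This error occurred earlier, and I worked around it by overwriting the table, but in this case the dataset is too large to do that (and it is not a sustainable solution).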


import pandas as pd

# Read the source data from BigQuery
df = pd.read_gbq(query, project_id="my-project", credentials=bigquery_key,
                 dialect='standard')

# Append it to the destination table with a hand-written schema
pd.io.gbq.to_gbq(df, dataset, projectid,
                 if_exists='append',
                 table_schema=[{'name': 'Date', 'type': 'STRING'},
                               {'name': 'profileId', 'type': 'STRING'},
                               {'name': 'Opco', 'type': 'STRING'},
                               {'name': 'country', 'type': 'STRING'},
                               {'name': 'deviceType', 'type': 'STRING'},
                               {'name': 'userType', 'type': 'STRING'},
                               {'name': 'users', 'type': 'INTEGER'},
                               {'name': 'sessions', 'type': 'INTEGER'},
                               {'name': 'bounceRate', 'type': 'FLOAT'},
                               {'name': 'sessionsPerUser', 'type': 'FLOAT'},
                               {'name': 'avgSessionDuration', 'type': 'FLOAT'},
                               {'name': 'pageviewsPerSession', 'type': 'FLOAT'}
                               ],
                 credentials=bigquery_key)

The schema in BigQuery is as follows:

Date                 STRING
profileId            STRING
Opco                 STRING
country              STRING
deviceType           STRING
userType             STRING
users                INTEGER
sessions             INTEGER
bounceRate           FLOAT
sessionsPerUser      FLOAT
avgSessionDuration   FLOAT
pageviewsPerSession  FLOAT

I then get the following error:

Traceback (most recent call last):
  File "..file.py", line 63, in <module>
    main()
  File "..file.py", line 57, in main
    updating_general_data(bigquery_key)
  File "..file.py", line 46, in updating_general_data
    credentials=bigquery_key)
  File "..\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\io\gbq.py", line 162, in to_gbq
    credentials=credentials, verbose=verbose, private_key=private_key)
  File "..\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas_gbq\gbq.py", line 1141, in to_gbq
    "Please verify that the structure and "
pandas_gbq.gbq.InvalidSchema: Please verify that the structure and data types in the DataFrame match the schema of the destination table.

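To me it looks like a one-to-one match. I have seen other threads about this error, and they mostly discuss date formats, but in this case the date is already a string and the table_schema also declares it as a string.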

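Best answer

The ultimate workaround is not to specify the schema by hand, since a hand-written schema is always prone to type-casting or naming errors. It is better to get the schema from the table itself. So, create a client using the latest version of the API: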

from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    'credentials.json')
project_id = 'your_project_id'  # no trailing comma here; a comma would turn this into a tuple
client = bigquery.Client(credentials=credentials, project=project_id)

Get the table you want to write or append to:

table = client.get_table('your_dataset.your_table')
table

Generate the schema from the table:

generated_schema = [{'name':i.name, 'type':i.field_type} for i in table.schema]
generated_schema
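
To see which columns actually disagree, the table's schema can be compared with the types pandas inferred for the DataFrame. The sketch below is only an illustration and is not the check pandas-gbq itself performs: `diff_schema` and `DTYPE_TO_BQ` are hypothetical names, and the dtype-to-BigQuery mapping is an assumed approximation. It takes plain dicts so it can run without BigQuery access; with a real DataFrame, the first argument could be built with `df.dtypes.astype(str).to_dict()`.

```python
# Assumed, approximate mapping from pandas dtype names to BigQuery types.
DTYPE_TO_BQ = {
    "object": "STRING",
    "int64": "INTEGER",
    "float64": "FLOAT",
    "bool": "BOOLEAN",
    "datetime64[ns]": "TIMESTAMP",
}


def diff_schema(df_dtypes, table_schema):
    """Return readable differences between a DataFrame's dtypes
    (column name -> dtype string) and a BigQuery-style schema
    (list of {'name': ..., 'type': ...} dicts)."""
    problems = []
    expected = {field["name"]: field["type"].upper() for field in table_schema}

    # Columns the table has but the DataFrame lacks
    for name in expected:
        if name not in df_dtypes:
            problems.append(f"missing column: {name}")

    # Columns the DataFrame has: check the name (case-sensitive) and the type
    for name, dtype in df_dtypes.items():
        if name not in expected:
            problems.append(f"unexpected column: {name}")
            continue
        inferred = DTYPE_TO_BQ.get(dtype, "UNKNOWN")
        if inferred != expected[name]:
            problems.append(
                f"{name}: DataFrame has {dtype} ({inferred}), "
                f"table expects {expected[name]}"
            )
    return problems


# Hypothetical case: 'users' was read back as float64, which can happen
# when an integer column contains missing values.
schema = [{"name": "Date", "type": "STRING"}, {"name": "users", "type": "INTEGER"}]
dtypes = {"Date": "object", "users": "float64"}
result = diff_schema(dtypes, schema)
print(result)
```

An empty list means the names and inferred types line up under this mapping. If the table reports its types as `INT64` or `FLOAT64` rather than `INTEGER` or `FLOAT`, the mapping would need to be adjusted accordingly.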

Rename your DataFrame's columns accordingly:

data.columns = [i.name for i in table.schema]
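
Assigning `data.columns` by position silently mislabels the data if the DataFrame's columns are in a different order than the table's. A more defensive variant, sketched here under the assumption that the names differ only in letter case, builds an explicit rename mapping and fails loudly otherwise. `build_rename_map` is a hypothetical helper, not part of pandas or pandas-gbq.

```python
def build_rename_map(df_columns, schema_names):
    """Map each DataFrame column to the table column with the same name,
    ignoring case. Raise ValueError if the two sets do not line up."""
    by_lower = {name.lower(): name for name in schema_names}
    mapping = {}
    for col in df_columns:
        target = by_lower.get(col.lower())
        if target is None:
            raise ValueError(f"no table column matches {col!r}")
        mapping[col] = target

    # Every table column must be covered by some DataFrame column
    missing = set(schema_names) - set(mapping.values())
    if missing:
        raise ValueError(f"DataFrame lacks columns: {sorted(missing)}")
    return mapping


# Hypothetical column names that differ from the table's only by case
rename_map = build_rename_map(
    ["date", "profileid", "opco"],
    ["Date", "profileId", "Opco"],
)
print(rename_map)
```

With a real DataFrame, `schema_names = [i.name for i in table.schema]` followed by `data = data.rename(columns=rename_map)[schema_names]` would rename the columns and put them in the table's order.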

Pass the same schema when pushing the data to BigQuery:

data.to_gbq(project_id='your_project_id',
            destination_table='your_dataset.your_table',
            credentials=service_account.Credentials.from_service_account_file(
                'credentials.json'),
            table_schema=generated_schema,
            progress_bar=True,
            if_exists='replace')  # 'replace' overwrites the table; use 'append' to add rows instead

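Regarding "pandas to gbq claims a schema mismatch while the schema is exactly the same. On GitHub, all the issues are claimed to have been solved in 2017", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56545738/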
