
Python BigQuery allowLargeResults with pandas.io.gbq

Reposted · Author: 太空狗 · Updated: 2023-10-29 20:21:23

I want to use the Pandas library to read BigQuery data. How do I get large results?
For non-Pandas BigQuery interactions, this can be achieved like this.

Current code using Pandas:

sProjectID = "project-id"
sQuery = '''
    SELECT
        column1, column2
    FROM [dataset_name.tablename]
'''
from pandas.io import gbq
df = gbq.read_gbq(sQuery, sProjectID)

Best answer

EDIT: I've posted the correct way to do this in my other answer: first dump the data into Google Storage. That way you never have data that is too large.
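The "other answer" referenced above isn't reproduced here, but the idea is a BigQuery extract job: once the query has materialized its result into a destination table, that table is exported to Google Cloud Storage, from where the shards can be downloaded and read. A minimal sketch of such a job body, assuming the same `jobs().insert` flow as the `run_query()` below (the function name and GCS URI are illustrative, not from the original):

```python
def build_extract_job(sProjectID, sDatasetID, sTableID, sGcsUri):
    """Return a BigQuery v2 extract-job body that dumps a table to
    Google Cloud Storage as sharded, gzipped CSV files."""
    return {
        'configuration': {
            'extract': {
                'sourceTable': {
                    'projectId': sProjectID,
                    'datasetId': sDatasetID,
                    'tableId': sTableID,
                },
                # A wildcard URI lets BigQuery shard large tables
                # across multiple files, e.g. 'gs://bucket/out-*.csv.gz'
                'destinationUris': [sGcsUri],
                'destinationFormat': 'CSV',
                'compression': 'GZIP',
            }
        }
    }
```

The resulting dict would be submitted the same way as `dQuery` below, via `oService.jobs().insert(projectId=sProjectID, body=...).execute()`.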


Well, I didn't find a direct way to do it with pandas, so I had to write a bit extra against the plain API. Here is my fix (which is also most of the work needed to do it natively, without pandas):

sProjectID = "project-id"
sQuery = '''
    SELECT
        column1, column2
    FROM [dataset_name.tablename]
'''

df = create_dataframe(sQuery, sProjectID, bLargeResults=True)


#******* Functions to make the above work *********



def create_dataframe(sQuery, sProjectID, bLargeResults=False):
    """Takes a BigQuery SQL query and returns a pandas DataFrame."""
    if bLargeResults:
        oService = create_service()
        dDestinationTable = run_query(sQuery, oService, sProjectID)
        df = pandas_get_table(dDestinationTable)
    else:
        df = pandas_query(sQuery, sProjectID)

    return df



def pandas_query(sQuery, sProjectID):
    """Run the SQL query against BigQuery and return a DataFrame."""
    from pandas.io import gbq
    df = gbq.read_gbq(sQuery, sProjectID)

    return df



def pandas_get_table(dTable):
    """Fetch a whole table and return it as a DataFrame."""
    from pandas.io import gbq

    sProjectID = dTable['projectId']
    sDatasetID = dTable['datasetId']
    sTableID = dTable['tableId']
    sQuery = "SELECT * FROM [{}.{}]".format(sDatasetID, sTableID)

    df = gbq.read_gbq(sQuery, sProjectID)

    return df




def create_service():
    """Build an authenticated BigQuery v2 service object."""
    from oauth2client.client import GoogleCredentials
    from apiclient.discovery import build  # 'apiclient' is an alias for 'googleapiclient'
    credentials = GoogleCredentials.get_application_default()
    oService = build('bigquery', 'v2', credentials=credentials)
    return oService



def run_query(sQuery, oService, sProjectID):
    """Runs the BigQuery query and returns the destination table it wrote to."""
    dQuery = {
        'configuration': {
            'query': {
                # the API expects WRITE_TRUNCATE/WRITE_APPEND/WRITE_EMPTY;
                # WRITE_TRUNCATE overwrites the destination table on each run
                'writeDisposition': 'WRITE_TRUNCATE',
                'useQueryCache': False,
                'allowLargeResults': True,
                'query': sQuery,
                'destinationTable': {
                    'projectId': sProjectID,
                    'datasetId': 'sandbox',
                    'tableId': 'api_large_result_dropoff',
                },
            }
        }
    }

    job = oService.jobs().insert(projectId=sProjectID, body=dQuery).execute()

    return job['configuration']['query']['destinationTable']
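One caveat with `run_query()` as written: `jobs().insert()` returns as soon as the job is accepted, not when it finishes, so `pandas_get_table()` may run before the destination table exists. A small polling helper can block until the job is done (a sketch; the function name is mine, while `jobs().get()` is the standard v2 API call for job status):

```python
import time


def wait_for_job(oService, sProjectID, dJob, iPollSeconds=5):
    """Poll the BigQuery jobs API until the inserted job finishes.

    dJob is the dict returned by jobs().insert(...).execute().
    Raises RuntimeError if the job finished with an error."""
    sJobID = dJob['jobReference']['jobId']
    while True:
        dStatus = oService.jobs().get(
            projectId=sProjectID, jobId=sJobID).execute()['status']
        if dStatus['state'] == 'DONE':
            if 'errorResult' in dStatus:
                raise RuntimeError(dStatus['errorResult'])
            return
        time.sleep(iPollSeconds)
```

With this in place, `create_dataframe()` could call `wait_for_job(oService, sProjectID, job)` between `run_query()` and `pandas_get_table()`, which would require `run_query()` to also hand back the inserted `job` dict.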

Regarding Python BigQuery allowLargeResults with pandas.io.gbq, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34201923/
