gpt4 book ai didi

python - 将 BigQuery 脚本的结果返回给 Python 客户端

转载 作者:行者123 更新时间:2023-12-04 16:27:39 29 4
gpt4 key购买 nike

截至 2019 年秋季,BigQuery 支持 scripting ,这很棒。我想不通的是 Python client for BigQuery能够利用这一新功能。

例如,运行以下 Python 代码:

client = bigquery.Client()
QUERY = """
BEGIN
CREATE OR REPLACE TEMP TABLE t0 AS
SELECT * FROM my_dataset.my_table WHERE foo < 1;

SELECT SUM(bar) AS bar_sum FROM t0;

DROP TABLE IF EXISTS t0;
END;
"""

query_job = client.query(QUERY)
rows = query_job.result()

... 返回一个 google.cloud.bigquery.table._EmptyRowIterator 对象,即使我能够看到 SQL 脚本中的语句已从 BigQuery 的 Web UI 成功运行。

如何将这个标准 SQL 脚本中 SELECT 语句的结果返回给 Python 客户端?

最佳答案

支持,但您需要考虑以下 documentation :

Scripts are executed in BigQuery using jobs.insert, similar to any other query, with the multi-statement script specified as the query text. When a script executes, additional jobs, known as child jobs, are created for each statement in the script. You can enumerate the child jobs of a script by calling jobs.list, passing in the script’s job ID as the parentJobId parameter.

When jobs.getQueryResults is invoked on a script, it will return the query results for the last SELECT, DML, or DDL statement to execute in the script, with no query results if none of the above statements have executed. To obtain the results of all statements in the script, enumerate the child jobs and call jobs.getQueryResults on each of them.

例如,我修改了您的脚本以查询公共(public)表:bigquery-public-data.london_bicycles.cycle_stations .这会运行三个子作业:

enter image description here

最后一个删除表格并且不返回任何行:

enter image description here

这就是为什么,如果我运行 Python 文件,我会得到类似 <google.cloud.bigquery.table._EmptyRowIterator object at 0x7f440aa33c88> 的信息。 .

我们想要的是中间查询输出结果:

enter image description here

快速测试是注释掉 DROP语句,然后遍历行以获得 sum=6676 的结果.那么,如果我们想要中间结果呢?与前面引用的文档一样,答案是调用 jobs.list 。并将脚本作业 ID 作为 parentJobId 传递获取子作业 ID 的参数:

for job in client.list_jobs(parent_job=query_job.job_id):
print("Job ID: {}, Statement Type: {}".format(job.job_id, job.statement_type))

我们使用 list_jobs 方法和检查ID and statement type :

Job ID: script_job_80e...296_2, Statement Type: DROP_TABLE
Job ID: script_job_9a0...7fd_1, Statement Type: SELECT
Job ID: script_job_113...e13_0, Statement Type: CREATE_TABLE_AS_SELECT

请注意,后缀 (0, 1, 2) 表示执行顺序,但我们可以添加双重检查来验证作业实际上是 SELECT检索结果前的声明:

from google.cloud import bigquery

client = bigquery.Client()
QUERY = """
BEGIN
CREATE OR REPLACE TEMP TABLE t0 AS
SELECT name, bikes_count FROM `bigquery-public-data.london_bicycles.cycle_stations` WHERE bikes_count > 10;

SELECT SUM(bikes_count) AS total_bikes FROM t0;

DROP TABLE IF EXISTS t0;
END;
"""

query_job = client.query(QUERY)
query_job.result()

for job in client.list_jobs(parent_job=query_job.job_id): # list all child jobs
# print("Job ID: {}, Statement Type: {}".format(job.job_id, job.statement_type))
if job.statement_type == "SELECT": # print the desired job output only
rows = job.result()
for row in rows:
print("sum={}".format(row["total_bikes"]))

输出:

sum=6676

关于python - 将 BigQuery 脚本的结果返回给 Python 客户端,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/59806304/

29 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com