gpt4 book ai didi

google-bigquery - bigquery网址解码

转载 作者:行者123 更新时间:2023-12-03 19:11:16 33 4
gpt4 key购买 nike

有没有一种简单的方法可以在BigQuery查询语言中进行URL解码?我正在使用一个表,该表的列包含一些值的URL编码的字符串。例如:

http://xyz.com/example.php?url=http%3A%2F%2Fwww.example.com%2Fhello%3Fv%3D12345&foo=bar&abc=xyz

我像这样提取“url”参数:
SELECT REGEXP_EXTRACT(column_name, "url=([^&]+)") as url 
from [mydataset.mytable]

这给了我:
http%3A%2F%2Fwww.example.com%2Fhello%3Fv%3D12345

我想做的是这样的:
SELECT URL_DECODE(REGEXP_EXTRACT(column_name, "url=([^&]+)")) as url 
from [mydataset.mytable]

从而返回:
http://www.example.com/hello?v=12345

如果可能,我想避免使用多个REGEXP_REPLACE()语句(替换%20,%3A等)。

有想法吗?

最佳答案

下面是在@sigpwned答案的基础上构建的,但是使用SQL UDF进行了稍微的重构和包装(没有限制,JS UDF使用起来很安全)



#standardSQL
CREATE TEMP FUNCTION URLDECODE(url STRING) AS ((
SELECT SAFE_CONVERT_BYTES_TO_STRING(
ARRAY_TO_STRING(ARRAY_AGG(
IF(STARTS_WITH(y, '%'), FROM_HEX(SUBSTR(y, 2)), CAST(y AS BYTES)) ORDER BY i
), b''))
FROM UNNEST(REGEXP_EXTRACT_ALL(url, r"%[0-9a-fA-F]{2}|[^%]+")) AS y WITH OFFSET AS i
));
SELECT
column_name,
URLDECODE(REGEXP_EXTRACT(column_name, "url=([^&]+)")) AS url
FROM `project.dataset.table`

可以用下面的问题示例进行测试

#standardSQL
CREATE TEMP FUNCTION URLDECODE(url STRING) AS ((
SELECT SAFE_CONVERT_BYTES_TO_STRING(
ARRAY_TO_STRING(ARRAY_AGG(
IF(STARTS_WITH(y, '%'), FROM_HEX(SUBSTR(y, 2)), CAST(y AS BYTES)) ORDER BY i
), b''))
FROM UNNEST(REGEXP_EXTRACT_ALL(url, r"%[0-9a-fA-F]{2}|[^%]+")) AS y WITH OFFSET AS i
));
WITH `project.dataset.table` AS (
SELECT 'http://example.com/example.php?url=http%3A%2F%2Fwww.example.com%2Fhello%3Fv%3D12345&foo=bar&abc=xyz' column_name
)
SELECT
URLDECODE(REGEXP_EXTRACT(column_name, "url=([^&]+)")) AS url,
column_name
FROM `project.dataset.table`

结果

Row url                                     column_name  
1 http://www.example.com/hello?v=12345 http://example.com/example.php?url=http%3A%2F%2Fwww.example.com%2Fhello%3Fv%3D12345&foo=bar&abc=xyz

Update with further quite optimized SQL UDF



CREATE TEMP FUNCTION URLDECODE(url STRING) AS ((
SELECT STRING_AGG(
IF(REGEXP_CONTAINS(y, r'^%[0-9a-fA-F]{2}'),
SAFE_CONVERT_BYTES_TO_STRING(FROM_HEX(REPLACE(y, '%', ''))), y), ''
ORDER BY i
)
FROM UNNEST(REGEXP_EXTRACT_ALL(url, r"%[0-9a-fA-F]{2}(?:%[0-9a-fA-F]{2})*|[^%]+")) y
WITH OFFSET AS i
));

关于google-bigquery - bigquery网址解码,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/13831391/

33 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com