
google-bigquery - BigQuery: Get size of each row in table

Reposted. Author: 行者123. Updated: 2023-12-05 02:46:38

In BigQuery, I get the following error message for a query:

Cannot query rows larger than 100MB limit.

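I understand the limit, but I would like to debug this further and find the rows that are larger than 100MB.

Does anyone know whether BigQuery has a function (or some other way) to get the size of each row in a table?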

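Best answer

The size of a row is determined by the data types of its values. You can add a constant for each fixed-size column (for example, numbers) and use a function to add the size of dynamically sized data (for example, strings).

Do this on a subset of the columns (I assume the error appears when rows grow because of a JOIN); you can then find the unusually large rows and handle them accordingly.

Data type sizes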

Data type              Size
INT64/INTEGER          8 bytes
FLOAT64/FLOAT          8 bytes
NUMERIC                16 bytes
BIGNUMERIC (Preview)   32 bytes
BOOL/BOOLEAN           1 byte
STRING                 2 bytes + the UTF-8 encoded string size
BYTES                  2 bytes + the number of bytes in the value
DATE                   8 bytes
DATETIME               8 bytes
TIME                   8 bytes
TIMESTAMP              8 bytes
STRUCT/RECORD          0 bytes + the size of the contained fields
GEOGRAPHY              16 bytes + 24 bytes * the number of vertices in the geography type (you can verify the number of vertices using the ST_NumPoints function)

Null values for any data type are calculated as 0 bytes.
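The GEOGRAPHY rule and the NULL rule can be combined in one expression. The following is only a sketch: the inline `sample` rows and the column name `geo` are made up for illustration, and the formula is simply the one from the table above, wrapped in IFNULL so that a NULL value counts as 0 bytes.

```sql
-- Hypothetical inline data; replace `sample` with your own table.
WITH sample AS (
  SELECT ST_GEOGPOINT(-122.35, 47.62) AS geo
  UNION ALL
  SELECT ST_GEOGFROMTEXT('LINESTRING(0 0, 1 1, 2 1)')
  UNION ALL
  -- A NULL geography, produced by parsing a NULL string.
  SELECT ST_GEOGFROMTEXT(CAST(NULL AS STRING))
)
SELECT
  ST_ASTEXT(geo) AS geo_text,
  -- 16 bytes + 24 bytes per vertex; ST_NUMPOINTS returns NULL for a
  -- NULL input, so IFNULL turns the whole term into 0 bytes.
  IFNULL(16 + 24 * ST_NUMPOINTS(geo), 0) AS geography_bytes
FROM sample
```

The same IFNULL pattern applies to any other nullable column whose size you add to the total.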

A repeated column is stored as an array, and the size is calculated based on the number of values. For example, an integer column (INT64) that is repeated (ARRAY) and contains 4 entries is calculated as 32 bytes (4 entries x 8 bytes).
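For repeated columns, the per-value size can be multiplied by the array length, or summed per element when the element size varies. A minimal sketch, assuming two made-up array columns `int_values` and `string_values` defined inline:

```sql
-- Hypothetical inline data; replace `sample` with your own table.
WITH sample AS (
  SELECT [1, 2, 3, 4] AS int_values, ['a', 'bc'] AS string_values
)
SELECT
  -- Fixed-size elements: number of entries x 8 bytes (INT64).
  ARRAY_LENGTH(int_values) * 8 AS int_array_bytes,
  -- Variable-size elements: 2 bytes + UTF-8 length for each string.
  -- SUM over an empty array yields NULL, hence the IFNULL.
  IFNULL(
    (SELECT SUM(2 + BYTE_LENGTH(s)) FROM UNNEST(string_values) AS s),
    0
  ) AS string_array_bytes
FROM sample
```

With this sample row, `int_array_bytes` is 32 (4 entries x 8 bytes, matching the example above) and `string_array_bytes` is 7 (3 bytes for 'a' plus 4 bytes for 'bc').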

Source: Data size calculation - BigQuery Documentation

Similar question: How many bytes in BigQuery types (StackOverflow)

Example

An example for a table with 2 string columns, 1 numeric column, and 1 datetime column:

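SELECT
  IFNULL(2 + BYTE_LENGTH(string_column1), 0)    -- STRING: 2 bytes + UTF-8 length; a NULL value counts as 0 bytes
  + IFNULL(2 + BYTE_LENGTH(string_column2), 0)  -- STRING: same rule
  + 16  -- NUMERIC -> 16 bytes (assumes the column is never NULL)
  + 8   -- DATETIME -> 8 bytes (assumes the column is never NULL)
  AS row_size
FROM `project-name.dataset-name.table-name`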

Source: String Byte Length Calculation
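To actually locate the oversized rows, the computed size can be filtered and sorted. The following is a sketch under stated assumptions: the inline `sample` rows, the column names `numeric_column` and `datetime_column`, and the 100-byte threshold are all made up for illustration. For a real table you would select from it instead of `sample` and use a threshold closer to the 100MB limit.

```sql
-- Hypothetical inline data standing in for the real table.
WITH sample AS (
  SELECT
    'short' AS string_column1,
    'x' AS string_column2,
    NUMERIC '1.5' AS numeric_column,
    DATETIME '2023-01-01 00:00:00' AS datetime_column
  UNION ALL
  SELECT
    REPEAT('a', 1000),
    CAST(NULL AS STRING),
    NUMERIC '2',
    CAST(NULL AS DATETIME)
),
sized AS (
  SELECT
    *,
    IFNULL(2 + BYTE_LENGTH(string_column1), 0)
    + IFNULL(2 + BYTE_LENGTH(string_column2), 0)
    + IF(numeric_column IS NULL, 0, 16)   -- NUMERIC: 16 bytes, 0 if NULL
    + IF(datetime_column IS NULL, 0, 8)   -- DATETIME: 8 bytes, 0 if NULL
    AS row_size
  FROM sample
)
SELECT row_size, numeric_column
FROM sized
WHERE row_size > 100   -- illustrative threshold in bytes
ORDER BY row_size DESC
```

With the sample data, the first row comes to 34 bytes and is filtered out, while the second comes to 1018 bytes and is returned. Selecting only small key columns in the final SELECT keeps the debugging query itself from reading the oversized values back.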

Regarding google-bigquery - BigQuery: Get size of each row in table, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/65405969/