performance - Why is my date dimension table useless? (Confusion about PostgreSQL storage...)

Reposted. Author: 行者123. Updated: 2023-11-29 14:22:20

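I've checked this about 4 times and I'm still confused by these results.

Take a look at the following (which I originally posted here)

Date dimension table --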

-- Some output omitted

DROP TABLE IF EXISTS dim_calendar CASCADE;

CREATE TABLE dim_calendar (
id SMALLSERIAL PRIMARY KEY,
day_id DATE NOT NULL,
year SMALLINT NOT NULL, -- 2000 to 2024
month SMALLINT NOT NULL, -- 1 to 12
day SMALLINT NOT NULL, -- 1 to 31
quarter SMALLINT NOT NULL, -- 1 to 4
day_of_week SMALLINT NOT NULL, -- 0 (Sunday) to 6 (Saturday)
day_of_year SMALLINT NOT NULL, -- 1 to 366
week_of_year SMALLINT NOT NULL, -- 1 to 53
CONSTRAINT con_month CHECK (month >= 1 AND month <= 12),
CONSTRAINT con_day_of_year CHECK (day_of_year >= 1 AND day_of_year <= 366), -- 366 allows for leap years
CONSTRAINT con_week_of_year CHECK (week_of_year >= 1 AND week_of_year <= 53),
UNIQUE(day_id)
);

INSERT INTO dim_calendar (day_id, year, month, day, quarter, day_of_week, day_of_year, week_of_year) (
SELECT ts,
EXTRACT(YEAR FROM ts),
EXTRACT(MONTH FROM ts),
EXTRACT(DAY FROM ts),
EXTRACT(QUARTER FROM ts),
EXTRACT(DOW FROM ts),
EXTRACT(DOY FROM ts),
EXTRACT(WEEK FROM ts)
FROM generate_series('2000-01-01'::timestamp, '2024-01-01', '1day'::interval) AS t(ts)
);

/* ==> [ INSERT 0 8767 ] */

Test tables --

DROP TABLE IF EXISTS just_dates CASCADE;
DROP TABLE IF EXISTS just_date_ids CASCADE;

CREATE TABLE just_dates AS
SELECT a_date AS some_date
FROM some_table;

/* ==> [ SELECT 769411 ] */

CREATE TABLE just_date_ids AS
SELECT d.id
FROM just_dates jd
INNER JOIN dim_calendar d
ON d.day_id = jd.some_date;

/* ==> [ SELECT 769411 ] */

ALTER TABLE just_date_ids ADD CONSTRAINT jdfk FOREIGN KEY (id) REFERENCES dim_calendar (id);

The confusion --

pocket=# SELECT pg_size_pretty(pg_relation_size('dim_calendar'));

pg_size_pretty
----------------
448 kB
(1 row)

pocket=# SELECT pg_size_pretty(pg_relation_size('just_dates'));
pg_size_pretty
----------------
27 MB
(1 row)

pocket=# SELECT pg_size_pretty(pg_relation_size('just_date_ids'));
pg_size_pretty
----------------
27 MB
(1 row)

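Why is a table made up of a bunch of smallints the same size as a table made up of a bunch of dates? I should mention that when dim_calendar.id was a plain SERIAL, it gave the same 27MB result.

Also, and more importantly -- WHY does a table with 769411 records with a single smallint field have a size of 27MB, which is > 32bytes/record???

P.S. Yes, I will have billions (or at least hundreds of millions) of records, and I will be adding every performance and space optimization I can.

EDIT

This might have something to do with it, so I'm throwing it out there --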

pocket=# select count(id) from just_date_ids group by id;
count
--------
409752
359659
(2 rows)

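Best answer

In a table with only one or two columns, the largest part of each row is always the tuple header.

See http://www.postgresql.org/docs/current/interactive/storage-page-layout.html , which explains how the data is stored. I'm quoting the part of that page most relevant to your question: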

All table rows are structured in the same way. There is a fixed-size header (occupying 23 bytes on most machines), followed by an optional null bitmap, an optional object ID field, and the user data.

This mostly answers the question

WHY does a table with 769411 records with a single smallint field have a size of 27MB, which is > 32bytes/record???
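The per-row overhead can be checked with some back-of-the-envelope arithmetic. The sketch below is only an estimate under stated assumptions that are not in the original post: a typical 64-bit build with the default 8192-byte page, a 24-byte page header, a 4-byte line pointer per row, padding to 8-byte boundaries, and no NULLs. The helper names `align_up` and `estimate_table_bytes` are made up for illustration.

```python
import math

# Assumptions: typical 64-bit PostgreSQL build with default settings.
BLOCK_SIZE = 8192    # default page size in bytes
PAGE_HEADER = 24     # fixed header at the start of every page
LINE_POINTER = 4     # per-row item pointer stored in the page
TUPLE_HEADER = 23    # fixed row header, as quoted from the docs
MAXALIGN = 8         # row header and whole row are padded to this boundary


def align_up(n: int, boundary: int) -> int:
    """Round n up to the next multiple of boundary."""
    return (n + boundary - 1) // boundary * boundary


def estimate_table_bytes(row_count: int, data_bytes: int) -> int:
    """Estimate heap size for rows without NULLs holding data_bytes of column data."""
    header = align_up(TUPLE_HEADER, MAXALIGN)  # 23 -> 24
    row = align_up(header + data_bytes, MAXALIGN) + LINE_POINTER
    rows_per_page = (BLOCK_SIZE - PAGE_HEADER) // row
    pages = math.ceil(row_count / rows_per_page)
    return pages * BLOCK_SIZE


# One smallint column (2 bytes) versus one date column (4 bytes).
for name, width in (("smallint", 2), ("date", 4)):
    size = estimate_table_bytes(769411, width)
    print(f"{name}: {size} bytes, about {size / 1024 / 1024:.1f} MiB")
```

Under these assumptions both cases come out to 36 bytes per row and 27,893,760 bytes in total (about 26.6 MiB), which is consistent with the 27 MB reported for both just_dates and just_date_ids. The 2 bytes saved by using smallint are absorbed by the padding.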

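The other part of your question has to do with the byte alignment of Postgres data. smallints are aligned at 2-byte offsets, but ints (and of course dates... after all, a date is an int4) are aligned at 4-byte offsets. So the order in which the table columns are declared plays an important role.

A table declared as smallint, date, smallint needs 12 bytes for user data (not counting overhead), while declaring it as smallint, smallint, date needs only 8 bytes. See a great (and, surprisingly, not accepted) answer here: Calculating and saving space in PostgreSQL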
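The effect of column order can be illustrated with a small layout calculation. This is a rough sketch, not PostgreSQL's actual code: the `TYPES` table and the `layout` helper are hypothetical, the 23-byte header is taken from the quoted documentation, and the maximum alignment (8 on typical 64-bit builds, 4 on some 32-bit ones) is an assumption passed in as a parameter.

```python
# Assumed (size, alignment) in bytes for the types discussed here.
TYPES = {"smallint": (2, 2), "integer": (4, 4), "date": (4, 4)}


def align_up(n, boundary):
    """Round n up to the next multiple of boundary."""
    return (n + boundary - 1) // boundary * boundary


def layout(columns, maxalign=8):
    """Return (column offsets, end of user data, padded row size incl. header)."""
    header = align_up(23, maxalign)  # fixed row header, padded
    pos = 0
    offsets = []
    for col in columns:
        size, alignment = TYPES[col]
        pos = align_up(pos, alignment)  # insert padding before the column
        offsets.append(pos)
        pos += size
    return offsets, pos, align_up(header + pos, maxalign)


for cols in (["smallint", "date", "smallint"], ["smallint", "smallint", "date"]):
    for maxalign in (4, 8):
        offsets, data_end, row = layout(cols, maxalign)
        print(cols, f"maxalign={maxalign}", offsets, data_end, row)
```

In this model the first ordering places the columns at offsets 0, 4 and 8, so the user data ends at byte 10. With a 4-byte maximum alignment that rounds up to 12 bytes after the header, matching the figure above; with an 8-byte maximum alignment the whole row rounds up to 40 bytes. The second ordering packs into 8 bytes and gives a 32-byte row in both cases.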

Regarding "performance - Why is my date dimension table useless? (Confusion about PostgreSQL storage...)", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/20166846/
