sql - Cassandra table design for query with ORDER, LIMIT and IN predicate

Reposted. Author: 行者123. Last updated: 2023-12-04 12:38:57

I have data that looks like this:

select * from test;

department | employee | batch_number | hash
------------+----------+--------------+-------
dep1 | Bart | 1 | hash1
dep1 | Bart | 1 | hash2
dep1 | Lisa | 3 | hash3
dep1 | Lisa | 4 | hash4
dep1 | John | 5 | hash5
dep1 | Lucy | 6 | hash6
dep1 | Bart | 7 | hash7
dep1 | Bart | 7 | hash8

I want to query the data with a where clause on batch_number, an ordering on batch_number, and an in predicate on employee.

In a relational database this would look like:
select * from test 
where department='dep1'
and employee in ('Bart','Lucy','John')
and batch_number >= 2
order by batch_number desc
limit 3;

department | employee | batch_number | hash
------------+----------+--------------+-------
dep1 | Bart | 7 | hash7
dep1 | Bart | 7 | hash8
dep1 | Lucy | 6 | hash6

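I'm having some trouble modelling this query in Cassandra. department will be my partition key, and hash needs to be part of the primary key. But I'm struggling with the clustering keys and/or (SSTable Attached) secondary indexes.

Since I want to order on batch_number, I tried including it as a clustering key: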
CREATE TABLE keyspace.test(
department TEXT,
batch_number INT,
hash TEXT,
employee TEXT,
PRIMARY KEY ((department), batch_number, hash)
) WITH CLUSTERING ORDER BY (batch_number DESC);
CREATE INDEX tst_emp ON keyspace.test (employee);

But this does not allow queries with an in predicate on my indexed column:
select * from keyspace.test where department='dep1' and employee in ('Bart','Lucy','John');
InvalidRequest: Error from server: code=2200 [Invalid query] message="IN predicates on non-primary-key columns (employee) is not yet supported"

So I tried adding the employee column as a clustering key as well:
CREATE TABLE keyspace.test(
department TEXT,
batch_number INT,
hash TEXT,
employee TEXT,
PRIMARY KEY ((department), batch_number, hash, employee)
) WITH CLUSTERING ORDER BY (batch_number DESC);

But this fails, because I cannot restrict employee once batch_number carries a non-EQ relation:
select * from keyspace.test where department='dep1' and batch_number > 1 and employee in ('Bart','Lucy','John');
InvalidRequest: Error from server: code=2200 [Invalid query] message="Clustering column "employee" cannot be restricted (preceding column "batch_number" is restricted by a non-EQ relation)"
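
The error above follows from how restrictions on clustering columns are checked. Below is a rough Python model of that rule, written as a sketch under assumptions: the function name and the EQ/IN/RANGE labels are hypothetical, the model is deliberately simplified, and it is not Cassandra's actual validation code.

```python
def first_invalid_restriction(clustering_columns, restrictions):
    """Return the first clustering column that may not be restricted, or None.

    clustering_columns: column names in PRIMARY KEY order.
    restrictions: maps column name -> "EQ", "IN" or "RANGE" (hypothetical labels).
    Simplified rule: after a column that is unrestricted or restricted by a
    range (non-EQ) relation, no later clustering column may be restricted.
    """
    blocker = None  # earlier column that stops later columns from being restricted
    for column in clustering_columns:
        kind = restrictions.get(column)
        if kind is None:
            blocker = blocker or column  # a gap blocks everything after it
            continue
        if blocker is not None:
            return column
        if kind == "RANGE":
            blocker = column  # a range must be the last restriction
    return None


# Second design from the question: batch_number comes before employee.
second_design = first_invalid_restriction(
    ["batch_number", "hash", "employee"],
    {"batch_number": "RANGE", "employee": "IN"},
)

# Same restrictions with employee placed before batch_number.
employee_first = first_invalid_restriction(
    ["employee", "batch_number", "hash"],
    {"employee": "IN", "batch_number": "RANGE"},
)
```

In this model second_design is "employee", the column named in the error message, while employee_first is None because the range relation sits on the last restricted column.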

But whenever I put employee before batch_number, I lose the ability to order on batch_number:
CREATE TABLE keyspace.test(
department TEXT,
employee TEXT,
batch_number INT,
hash TEXT,
PRIMARY KEY ((department), employee, batch_number, hash)
);

select * from keyspace.test where department='dep1' and employee in ('Bart','Lucy','John') ORDER BY batch_number DESC;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Order by currently only support the ordering of columns following their declared order in the PRIMARY KEY"

So what table design would allow for such a query?
Can this be done in Cassandra?

Edit:

Other queries I would like to be able to run on this table are:
select * from keyspace.test where department='X' and batch_number=Y;

delete from keyspace.test where department='X';

Best answer

Use a materialized view, which lets you rearrange the data:

CREATE MATERIALIZED VIEW mv_test AS 
SELECT
department,
batch_number,
employee,
hash
FROM
test
WHERE
department IS NOT NULL
AND batch_number IS NOT NULL
AND employee IS NOT NULL
AND hash IS NOT NULL
PRIMARY KEY (department, employee, batch_number, hash)
WITH CLUSTERING ORDER BY (batch_number DESC);

I am then able to run the following query:
SELECT * FROM mv_test 
WHERE
department = 'dep1'
AND employee IN
(
'Bart',
'Lisa'
)
AND batch_number > 3;

The results are sorted by the clustering order:
department | employee | batch_number | hash
------------+----------+--------------+-------
dep1 | Bart | 7 | hash7
dep1 | Bart | 7 | hash8
dep1 | Lisa | 4 | hash4

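Although the > clause is an inequality (range) relation, the IN clause, despite taking several values, still resolves to a fixed set of exact values, which is why I believe you can filter on the keys without a problem.
batch_number is the last column you filter on, so any kind of where clause is permitted on it. I'm assuming you always supply department.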
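
To make the reasoning above concrete, here is a small Python model of how the view lays out one partition and how the query walks it. It is only a sketch: it assumes employee and hash cluster in their default ascending order, it uses the sample rows from the question, and the helper names (view_order, query_view) are hypothetical rather than part of any Cassandra API.

```python
# Sample rows from the question: (department, employee, batch_number, hash)
ROWS = [
    ("dep1", "Bart", 1, "hash1"),
    ("dep1", "Bart", 1, "hash2"),
    ("dep1", "Lisa", 3, "hash3"),
    ("dep1", "Lisa", 4, "hash4"),
    ("dep1", "John", 5, "hash5"),
    ("dep1", "Lucy", 6, "hash6"),
    ("dep1", "Bart", 7, "hash7"),
    ("dep1", "Bart", 7, "hash8"),
]


def view_order(rows):
    """Sort rows as assumed for the view's clustering inside one partition:
    employee ascending, batch_number descending, hash ascending."""
    return sorted(rows, key=lambda r: (r[1], -r[2], r[3]))


def query_view(rows, department, employees, min_batch_exclusive):
    """Model of: department = ? AND employee IN (...) AND batch_number > ?"""
    partition = view_order([r for r in rows if r[0] == department])
    result = []
    for employee in sorted(set(employees)):
        # Each IN value selects one contiguous block of the partition;
        # the range on batch_number then trims that block.
        result.extend(
            r for r in partition
            if r[1] == employee and r[2] > min_batch_exclusive
        )
    return result


result = query_view(ROWS, "dep1", ["Bart", "Lisa"], 3)
for row in result:
    print(row)
```

Under these assumptions the model yields the same three rows, in the same order, as the query output shown above.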

Be mindful that materialized views impact performance, specifically write performance. Read performance, however, compares favourably with using ALLOW FILTERING.

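Update:

The order specified at the end of the materialized view is on batch_number; however, rows are ordered first by department, then by employee, and only then by batch_number, so an overall ordering on batch_number is not guaranteed. As far as I know, there is no way around this. A different database solution may be preferable.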
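
One possible client-side workaround, which is not part of the original answer, is to merge the per-employee slices in the application. The sketch below assumes each employee's rows arrive already sorted by batch_number descending (as the view's clustering order would give within one employee) and uses the question's sample data for batch_number >= 2; the names slices and top_n_by_batch are hypothetical.

```python
import heapq
from itertools import islice

# One slice per employee, each already sorted by batch_number descending.
# Rows are (department, employee, batch_number, hash) from the sample data.
slices = {
    "Bart": [("dep1", "Bart", 7, "hash7"), ("dep1", "Bart", 7, "hash8")],
    "John": [("dep1", "John", 5, "hash5")],
    "Lucy": [("dep1", "Lucy", 6, "hash6")],
}


def top_n_by_batch(per_employee, n):
    """Merge descending-sorted slices and keep the n highest batch numbers."""
    merged = heapq.merge(
        *per_employee.values(),
        key=lambda row: row[2],  # batch_number
        reverse=True,            # inputs are sorted largest first
    )
    return list(islice(merged, n))


top3 = top_n_by_batch(slices, 3)
for row in top3:
    print(row)
```

With this data the merge returns the two Bart rows with batch_number 7 followed by the Lucy row with batch_number 6, which is the result the relational query in the question produces. The trade-off is that ordering and the limit are applied in the application rather than by Cassandra.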

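Update 2:

As mentioned in the Apache mailing list thread (see the comments below), materialized views are not considered production-ready. However, DataStax deems them usable, provided they are looked after using the best practices it describes. Personally, I have had no problems with materialized views. Granted, that is on a simple single-datacenter cluster; since the best practices discuss more complicated setups, views may break in those situations.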

Regarding "sql - Cassandra table design for query with ORDER, LIMIT and IN predicate", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53223715/
