The partition key is used to distribute rows across nodes. If I only use one single node, should I worry about the warning of comparing on partition key?
Moreover, in my use case, I need to query on the partition key by both equality and comparison (<, >). Any suggestions on how to design the primary key?
- If you run Cassandra on a single node, you lose almost all of its advantages: resiliency and the ability to scale throughput and volume as far as you need. That is fine for a dev/test server, but for production three nodes is the minimum (to get a proper consistency level).
You seem to need quick tutorials to get you going:
Cédrick makes a good point. Your cluster should be at least 2-3 nodes with the replication factor set to the number of nodes. Then each node will still have all of the data, but the cluster will have some resiliency to hardware failure.
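The node-count advice follows from quorum arithmetic: a QUORUM read or write needs floor(RF / 2) + 1 replicas to respond, where RF is the replication factor. A minimal sketch of that calculation, with function names that are purely illustrative and not part of any driver API:

```python
def quorum(replication_factor):
    """Replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor):
    """Replicas that can be down while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)


for rf in (1, 2, 3):
    print(rf, quorum(rf), tolerated_failures(rf))
# prints:
# 1 1 0
# 2 2 0
# 3 2 1
```

With RF 1 or 2, losing a single replica blocks QUORUM operations. RF 3 is the smallest setting that survives one failure, which is why three nodes is usually quoted as the minimum.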
should I worry about the warning of comparing on partition key?
Not really, no. But if the cluster someday grows to the point where not every node holds all of the data, you'll be glad that you followed the standard Cassandra modeling advice.
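A toy illustration of why the warning exists, assuming a hash partitioner: rows are placed by a hash token of the partition key, so keys that are adjacent in value are not adjacent in storage. The sketch uses MD5 from Python's standard library as a stand-in for Cassandra's Murmur3 partitioner, and the helper names and sample keys are invented for illustration.

```python
import hashlib


def token(partition_key):
    """Stand-in partitioner: hash the key to a signed 64-bit token."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def storage_order(partition_keys):
    """Order in which a token-range scan would encounter the partitions."""
    return sorted(partition_keys, key=token)


# "01" .. "20", listed in value order.
keys = [f"{n:02d}" for n in range(1, 21)]

# Token order scrambles value order, so a predicate such as key > "05"
# does not correspond to one contiguous slice of storage.
print(storage_order(keys))
```

An equality predicate hashes to exactly one token, so it can go straight to the right partition. A comparison on the partition key cannot, and serving it means examining every partition. On a single node that scan is local, so it costs less, but the access pattern is the same one that hurts once the data is spread out.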
in my use case, I need to query on the partition key by both equality and comparison (<, >). Any suggestions on how to design the primary key?
Without seeing the queries that you need to support, that's going to be difficult. I'd say that it's more of a question of whether or not this data set is actually going to grow beyond a size of 1 to 3 nodes.
If it is user data and it will grow, then I would find something to partition the users by. Something where it makes sense according to the business case. For example, if this was for employee data, and I knew that all result sets would be for employees in the same department, then perhaps department makes for a good partition key. Then I'd add user ID and user name to round out the rest of the PK:
PRIMARY KEY (department,user_id,user_name);
This way, you could easily support whatever operators were required on user_id. Or, if user_name was more relevant, then user_id and user_name could be flipped in their order, with user_id there to ensure row uniqueness (two people in the same dept with the same name, but different IDs).
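The layout implied by that key can be sketched as a toy in-memory model: rows are grouped by department (the partition key) and kept sorted by (user_id, user_name) (the clustering columns) inside each group. This only illustrates the access pattern and is not how Cassandra is implemented. The class, method names, and sample data are all invented.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """Rows grouped by partition key, sorted by clustering key in each group."""

    def __init__(self):
        # department -> sorted list of (user_id, user_name)
        self._partitions = defaultdict(list)

    def insert(self, department, user_id, user_name):
        # insort keeps each partition ordered by (user_id, user_name).
        bisect.insort(self._partitions[department], (user_id, user_name))

    def user_id_range(self, department, low, high):
        """Equality on department, then low <= user_id < high by binary search."""
        rows = self._partitions.get(department, [])
        start = bisect.bisect_left(rows, (low,))
        end = bisect.bisect_left(rows, (high,))
        return rows[start:end]


table = ToyTable()
table.insert("engineering", 30, "Carol")
table.insert("engineering", 10, "Alice")
table.insert("engineering", 20, "Bob")
table.insert("sales", 15, "Dave")

print(table.user_id_range("engineering", 10, 30))
# prints [(10, 'Alice'), (20, 'Bob')]
```

Every query names one department with equality and applies the comparison operators to user_id only, which is the shape of query this key design is meant to serve.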
only use one single node
On the other hand, if this dataset will not grow beyond the need for a single node, then my suggestion (and I don't say this often) would be to just use Postgres. You'll save yourself a lot of headaches later on.
I think using a single-node NoSQL database is still valid for some use cases. One such case is when the business data model simply has no logical partition key.
In the above case, what would be your case for high availability?