
string - Cassandra: Difference between text (VARCHAR) and ASCII

Reposted. Author: 行者123. Updated: 2023-12-04 11:00:52

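I know that text and varchar are aliases and store UTF-8 strings.
What about ascii, which the documentation describes as a "US-ASCII string"? Is there any difference besides the encoding?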

Is there any size difference? Is there a preferred choice between these two when I'm storing large strings (~500KB)?

Best answer

Regarding this answer:

If the data is a piece of text, for example a String in Java, it is encoded in UTF-16 at runtime, but when it is serialized in Cassandra with the text type, UTF-8 is used. UTF-16 uses 2 bytes per character and sometimes 4 bytes, whereas UTF-8 is space efficient and, depending on the character, uses 1, 2, 3, or 4 bytes.
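The byte counts mentioned above can be checked with a short sketch. It uses Python's built-in codecs rather than Cassandra or Java, and the helper name `encoded_sizes` is only illustrative.

```python
def encoded_sizes(text: str) -> dict:
    """Return how many bytes `text` occupies in UTF-8 and in UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" is used so the 2-byte byte-order mark that the plain
        # "utf-16" codec prepends is not counted.
        "utf-16": len(text.encode("utf-16-le")),
    }


# One sample per UTF-8 length: Latin letter, accented letter, CJK, emoji.
for sample in ["a", "\u00e9", "\u4e2d", "\U0001F600"]:
    print(repr(sample), encoded_sizes(sample))
```

The samples cover the 1, 2, 3, and 4 byte cases of UTF-8. In UTF-16 the first three take 2 bytes each and the emoji takes 4.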

That means CPU work is needed to encode and decode such data during serialization. Also, depending on the text, for example 158786464563, the data is stored in 12 bytes (one per digit). That means more space is used and more I/O as well.
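A quick way to see the 12 byte figure is to encode the digits and count them. The comparison against a 64-bit integer is an assumption added here for contrast; the quoted text only states the size of the text form.

```python
import struct

value = 158786464563

# Stored as text: one UTF-8 byte per decimal digit.
as_text = str(value).encode("utf-8")

# Assumed alternative for contrast: a 64-bit big-endian signed integer.
as_int64 = struct.pack(">q", value)

print(len(as_text), len(as_int64))
```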

Note that Cassandra offers the ascii type, which follows the US-ASCII character set and always uses 1 byte per character.
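As a rough illustration of what the US-ASCII restriction implies, the sketch below checks whether a string fits in that character set before it would be written to an ascii column. It relies on Python's `ascii` codec, not on Cassandra's own validation, and `fits_us_ascii` is a hypothetical helper name.

```python
def fits_us_ascii(text: str) -> bool:
    """Return True if every character of `text` is in the US-ASCII range."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        # At least one character lies outside code points 0..127.
        return False


print(fits_us_ascii("hello"))      # plain Latin letters
print(fits_us_ascii("caf\u00e9"))  # contains an accented letter
print(len("hello".encode("ascii")))
```

Text that fails this check would need the text (UTF-8) type instead.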



Is there any size difference?



Is there a preferred choice between these two when I'm storing large strings (~500KB)?



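ascii is at least as space efficient as UTF-8: it always uses 1 byte per character, while UTF-8 uses 1 byte for US-ASCII characters and up to 4 bytes for others. UTF-8 is in turn generally more space efficient than UTF-16. Again, it all depends on how you serialize, encode, and decode the data. For more information, see "what-is-the-advantage-of-choosing-ascii-encoding-over-utf-8".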
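For the ~500KB case in the question, a small sketch can compare encoded sizes directly. It assumes a payload made up only of US-ASCII characters and measures Python's codecs, not Cassandra's on-disk format; `payload_sizes` is an illustrative name.

```python
def payload_sizes(text: str) -> dict:
    """Return the encoded size in bytes of `text` for each encoding."""
    return {enc: len(text.encode(enc)) for enc in ("ascii", "utf-8", "utf-16-le")}


# Assumed payload: 500 KiB of a single US-ASCII character.
payload = "a" * (500 * 1024)
sizes = payload_sizes(payload)
print(sizes)
```

For such a payload the ascii and UTF-8 sizes are identical, so the choice between the two column types mainly affects which characters are accepted.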

Regarding string - Cassandra: Difference between text (VARCHAR) and ASCII, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45017699/
