
MySQL support for four-byte Chinese characters

Reposted. Author: 行者123. Updated: 2023-12-02 19:19:47

I cannot execute this SQL script:

INSERT INTO `mabase`.`new_table` (`idnew_table`, `name`) VALUES ('2', '𠼭');

The error is:

ERROR 1366: Incorrect string value: '\xF0\xA0\xBC\xAD' for column 'name' at row 1
SQL Statement: INSERT INTO mabase.new_table (idnew_table, name) VALUES ('2', '𠼭')

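My database and table use the utf8 character set and the utf8_general_ci collation. I have also tried utf8_unicode_ci, utf8mb4_general_ci, big5_chinese_ci, and gbk_chinese_ci.

I tried all of this in MySQL Workbench on Windows.

𠼭 is a four-byte character, and I only have problems with characters like it. Please tell me how to store four-byte characters in MySQL.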

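Best answer

The character you want, U+20F2D, resides in the "CJK Unified Ideographs Extension B" block of Unicode's "Supplementary Ideographic Plane", so it is not available in any MySQL Unicode character set before v5.5. As of v5.5 it is available in the utf8mb4, utf16, and utf32 character sets, and in utf16le from 5.6.1.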
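The link between the bytes in the error message and the code point named above can be checked by hand. The following is a minimal Python sketch, not part of the original answer; the variable names are illustrative. It decodes the four bytes reported by ERROR 1366 and derives the Unicode plane from the code point.

```python
# Bytes copied from the ERROR 1366 message in the question.
raw = b"\xF0\xA0\xBC\xAD"

char = raw.decode("utf-8")        # the bytes are one valid UTF-8 sequence
code_point = ord(char)            # numeric Unicode code point
plane = code_point >> 16          # plane 0 is the BMP, plane 2 is the SIP
utf8_length = len(char.encode("utf-8"))

print(f"U+{code_point:04X}")      # U+20F2D
print(plane)                      # 2
print(utf8_length)                # 4
```

Any character whose plane is greater than 0 needs four bytes in UTF-8, which is why a three-byte utf8 column rejects it.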

It is not available in MySQL's big5 or gbk character sets.

Why the utf8 encoding does not work

As documented under Unicode Support:

The initial implementation of Unicode support (in MySQL 4.1) included two character sets for storing Unicode data:

  • ucs2, the UCS-2 encoding of the Unicode character set using 16 bits per character.

  • utf8, a UTF-8 encoding of the Unicode character set using one to three bytes per character.

These two character sets support the characters from the Basic Multilingual Plane (BMP) of Unicode Version 3.0. BMP characters have these characteristics:

  • Their code values are between 0 and 65535 (or U+0000 .. U+FFFF).

  • They can be encoded with a fixed 16-bit word, as in ucs2.

  • They can be encoded with 8, 16, or 24 bits, as in utf8.

  • They are sufficient for almost all characters in major languages.

Characters not supported by the aforementioned character sets include supplementary characters that lie outside the BMP. Characters outside the BMP compare as REPLACEMENT CHARACTER and convert to '?' when converted to a Unicode character set.
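To see whether a given piece of text would hit this limitation, it can be scanned for characters outside the BMP before it is sent to the server. The sketch below is an illustration only; the helper names `supplementary_chars` and `max_utf8_bytes` are hypothetical and not part of MySQL or any client library.

```python
def supplementary_chars(text: str) -> list:
    """Return the characters of text that lie outside the BMP (above U+FFFF)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def max_utf8_bytes(text: str) -> int:
    """Return the longest UTF-8 encoding, in bytes, of any single character."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


sample = "汉字\U00020F2D"              # two BMP characters plus U+20F2D
print(supplementary_chars(sample))   # the one non-BMP character
print(max_utf8_bytes("汉字"))         # 3: fits the three-byte utf8 character set
print(max_utf8_bytes(sample))        # 4: needs a four-byte character set
```

A non-empty result from `supplementary_chars` means the text cannot be stored losslessly in a ucs2 or three-byte utf8 column.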

In MySQL 5.6, Unicode support includes supplementary characters, which requires new character sets that have a broader range and therefore take more space. The following table shows a brief feature comparison of previous and current Unicode support.

╔══════════════════════════════╦══════════════════════════════════════════════╗
║       Before MySQL 5.5       ║               MySQL 5.5 and up               ║
╠══════════════════════════════╬══════════════════════════════════════════════╣
║ All Unicode 3.0 characters   ║ All Unicode 5.0 and 6.0 characters           ║
╠══════════════════════════════╬══════════════════════════════════════════════╣
║ No supplementary characters  ║ With supplementary characters                ║
╠══════════════════════════════╬══════════════════════════════════════════════╣
║ ucs2 character set, BMP only ║ No change                                    ║
╠══════════════════════════════╬══════════════════════════════════════════════╣
║ utf8 character set for up to ║ No change                                    ║
║ three bytes, BMP only        ║                                              ║
╠══════════════════════════════╬══════════════════════════════════════════════╣
║                              ║ New utf8mb4 character set for up to four     ║
║                              ║ bytes, BMP or supplemental                   ║
╠══════════════════════════════╬══════════════════════════════════════════════╣
║                              ║ New utf16 character set, BMP or supplemental ║
╠══════════════════════════════╬══════════════════════════════════════════════╣
║                              ║ New utf16le character set, BMP or            ║
║                              ║ supplemental (5.6.1 and up)                  ║
╠══════════════════════════════╬══════════════════════════════════════════════╣
║                              ║ New utf32 character set, BMP or supplemental ║
╚══════════════════════════════╩══════════════════════════════════════════════╝

These changes are upward compatible. If you want to use the new character sets, there are potential incompatibility issues for your applications; see Section 10.1.11, “Upgrading from Previous to Current Unicode Support”. That section also describes how to convert tables from utf8 to the (4-byte) utf8mb4 character set, and what constraints may apply in doing so.
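For the table in the question, the conversion to utf8mb4 might look roughly like the statements generated below. This is a hedged sketch rather than a tested migration: the helper `utf8mb4_statements` is hypothetical, the collation `utf8mb4_unicode_ci` is an assumed choice, and the code only builds SQL strings without connecting to a server. It assumes MySQL 5.5 or later. Review the constraints described in the section cited above before running anything against real data.

```python
def quote_identifier(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def utf8mb4_statements(database: str, table: str,
                       collation: str = "utf8mb4_unicode_ci") -> list:
    """Build (but do not run) statements that switch a table to utf8mb4."""
    db = quote_identifier(database)
    tbl = quote_identifier(table)
    return [
        # Change the default for tables created in this database later on.
        f"ALTER DATABASE {db} CHARACTER SET utf8mb4 COLLATE {collation};",
        # Convert the existing table and its character columns.
        f"ALTER TABLE {db}.{tbl} CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};",
        # Make the client connection send and receive utf8mb4 as well.
        "SET NAMES utf8mb4;",
    ]


# Database and table names taken from the question.
statements = utf8mb4_statements("mabase", "new_table")
for statement in statements:
    print(statement)
```

The connection character set matters as much as the column definition: if the client still talks utf8, the four-byte character can be rejected or mangled before it reaches the table.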

Why the big5 encoding does not work

As documented under What problems should I be aware of when working with the Big5 Chinese character set?:

MySQL supports the Big5 character set which is common in Hong Kong and Taiwan (Republic of China). MySQL's big5 is in reality Microsoft code page 950, which is very similar to the original big5 character set.

[ deletia ]

A feature request for adding HKSCS extensions has been filed. People who need this extension may find the suggested patch for Bug #13577 to be of interest.

Why the gbk encoding does not work

As documented under What CJK character sets are available in MySQL?:

Here, we try to clarify exactly what characters are legitimate in gb2312 or gbk, with reference to the official documents. Please check these references before reporting gb2312 or gbk bugs.

  • For a complete listing of the gb2312 characters, ordered according to the gb2312_chinese_ci collation: gb2312

  • MySQL's gbk is in reality “Microsoft code page 936”. This differs from the official gbk for characters A1A4 (middle dot), A1AA (em dash), A6E0-A6F5, and A8BB-A8C0.

  • For a listing of gbk/Unicode mappings, see http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT.

  • For MySQL's listing of gbk characters, see gbk.
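Since the quoted passages identify MySQL's big5 and gbk with Microsoft code pages 950 and 936, Python's `cp950` and `cp936` codecs can serve as a rough stand-in to check whether a character has any representation there. This is an approximation under that assumption: the codecs are Python's own tables, not MySQL's, and `can_encode` is a hypothetical helper.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text is representable in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


char = "\U00020F2D"  # the character from the question
results = {codec: can_encode(char, codec) for codec in ("cp950", "cp936", "utf-8")}
print(results)       # {'cp950': False, 'cp936': False, 'utf-8': True}
```

Neither code page has a slot for the character, so no choice of big5 or gbk collation can make the INSERT succeed.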

Regarding MySQL support for four-byte Chinese characters, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/17680237/
