
PHP json_encode() returns "Malformed UTF-8 characters, possibly incorrectly encoded" (error)

Reposted. Author: 行者123. Updated: 2023-12-03 21:23:31

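I cannot solve this problem and it is driving me crazy.
json_encode() fails with the error "Malformed UTF-8 characters, possibly incorrectly encoded" on a handful of records (2 or 3) out of a set of 10k.
It looks simple, yet it is proving very hard to fix.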

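  • MySQL is utf8mb4 everywhere (database, tables, columns, and collations)
  • PHP is 7.2, using UTF-8 of course
  • The Apache default charset is UTF-8 (but the error is thrown at the PHP level).

  • I can also print these records from PHP to a plain HTML debug page without any problem. As soon as I try to encode them as JSON, though, I get the error.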
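One way to narrow down which rows are affected is to encode each record separately and log the failures. The sketch below is only an illustration: the `$records` array and its field names are made up, standing in for rows fetched from the database. It also shows why the HTML page looks fine while JSON fails: a browser quietly renders an invalid byte as a replacement character, whereas json_encode() rejects it.

```php
<?php
// Hypothetical sample rows; the second one contains a lone 0xA0 byte,
// which is not valid UTF-8 on its own.
$records = [
    ['id' => 1, 'name' => 'RESIDENCE PRINCIPE'],
    ['id' => 2, 'name' => "RESIDENCE \xA0PRINCIPE"],
];

// Collect the ids of rows that json_encode() refuses to encode.
$badIds = [];
foreach ($records as $row) {
    if (json_encode($row) === false) {
        $badIds[] = $row['id'];
        // json_last_error_msg() explains why the last encode failed.
        echo "Row {$row['id']}: ", json_last_error_msg(), PHP_EOL;
    }
}
```

Encoding row by row is slower than encoding the whole set at once, so this is meant as a one-off diagnostic rather than production code.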

    I found that these records were imported from a CSV file and probably bypassed the sanitizer. The odd thing is that the whole CSV file is parsed with:
    $this->encoding = mb_detect_encoding($source,mb_detect_order(),true);
    if ($this->encoding!="" && $this->encoding!="UTF-8") {
        $source = iconv($this->encoding, "UTF-8", $source);
    }

    For privacy reasons (and GDPR) I cannot post any complete corrupted record.
    However, I managed to extract a fragment that appears to be damaged:
    RESIDENCE �PRINCIPE

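    Update

    I tried to get the character codes of these corrupted characters. Here is what I found.
    Treating the string as single bytes, with the plain native functions str_split and ord, the character is: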
    '�' 160

    I also wanted the code point in UTF-8, so I used this helper function from PHP.net: http://php.net/manual/en/function.ord.php#109812
    It tries to compute the code point of multibyte strings. It gave me:
    -2096

    Which is... negative?
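For context, 160 is 0xA0. In ISO-8859-1 and Windows-1252 that byte is a non-breaking space, while in UTF-8 it is a continuation byte that is only valid after a lead byte. A helper that assumes well-formed UTF-8 can therefore produce meaningless values when fed this byte. A minimal sketch for inspecting the raw bytes follows. It assumes the mbstring extension is loaded, and the `$fragment` string is a hypothetical reconstruction of the damaged sample, not the real data.

```php
<?php
// Hypothetical reconstruction of the damaged fragment.
$fragment = "RESIDENCE \xA0PRINCIPE";

// Record every byte outside 7-bit ASCII together with its offset.
$highBytes = [];
foreach (str_split($fragment) as $offset => $byte) {
    if (ord($byte) > 127) {
        $highBytes[$offset] = ord($byte);
        printf("offset %d: 0x%02X (%d)\n", $offset, ord($byte), ord($byte));
    }
}

// A lone 0xA0 makes the whole string invalid as UTF-8.
$isValidUtf8 = mb_check_encoding($fragment, 'UTF-8');
var_dump($isValidUtf8);
```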

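    Best answer

    Solved!

    The problem was in the function mb_detect_order(), which does not work the way I expected. I assumed it returned the full list of supported encodings, ordered mainly to speed up detection.

    But I just discovered that this function returns only 2 encodings: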

    //print_r(mb_detect_order());
    Array
    (
    [0] => ASCII
    [1] => UTF-8
    )

    In my case, that is almost completely useless.
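To see why the two-entry order lets bad data through, consider this sketch. It assumes the mbstring extension is loaded, and `$source` is a made-up stand-in for the CSV content. In strict mode neither ASCII nor UTF-8 matches a string containing a lone 0xA0 byte, so detection returns false. Because false loosely equals "", the condition from the question is not met and the conversion is silently skipped.

```php
<?php
// Hypothetical input containing a byte that is not valid UTF-8.
$source = "RESIDENCE \xA0PRINCIPE";

// Strict detection against the two-entry order shown above.
$encoding = mb_detect_encoding($source, ['ASCII', 'UTF-8'], true);
var_dump($encoding);     // bool(false): neither candidate matches

// Same condition as in the question: false != "" evaluates to false,
// so the iconv() branch would never run for this input.
$wouldConvert = ($encoding != "" && $encoding != "UTF-8");
var_dump($wouldConvert); // bool(false)
```

Checking the result with `=== false` would have surfaced the failed detection instead of hiding it.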
    The mb functions can detect many more character sets.
    You can see them by running mb_list_encodings(), which returns the full list:
    //print_r(mb_list_encodings());
    Array
    (
    [0] => pass
    [1] => auto
    [2] => wchar
    [3] => byte2be
    [4] => byte2le
    [5] => byte4be
    [6] => byte4le
    [7] => BASE64
    [8] => UUENCODE
    [9] => HTML-ENTITIES
    [10] => Quoted-Printable
    [11] => 7bit
    [12] => 8bit
    [13] => UCS-4
    [14] => UCS-4BE
    [15] => UCS-4LE
    [16] => UCS-2
    [17] => UCS-2BE
    [18] => UCS-2LE
    [19] => UTF-32
    [20] => UTF-32BE
    [21] => UTF-32LE
    [22] => UTF-16
    [23] => UTF-16BE
    [24] => UTF-16LE
    [25] => UTF-8
    [26] => UTF-7
    [27] => UTF7-IMAP
    [28] => ASCII
    [29] => EUC-JP
    [30] => SJIS
    [31] => eucJP-win
    [32] => EUC-JP-2004
    [33] => SJIS-win
    [34] => SJIS-Mobile#DOCOMO
    [35] => SJIS-Mobile#KDDI
    [36] => SJIS-Mobile#SOFTBANK
    [37] => SJIS-mac
    [38] => SJIS-2004
    [39] => UTF-8-Mobile#DOCOMO
    [40] => UTF-8-Mobile#KDDI-A
    [41] => UTF-8-Mobile#KDDI-B
    [42] => UTF-8-Mobile#SOFTBANK
    [43] => CP932
    [44] => CP51932
    [45] => JIS
    [46] => ISO-2022-JP
    [47] => ISO-2022-JP-MS
    [48] => GB18030
    [49] => Windows-1252
    [50] => Windows-1254
    [51] => ISO-8859-1
    [52] => ISO-8859-2
    [53] => ISO-8859-3
    [54] => ISO-8859-4
    [55] => ISO-8859-5
    [56] => ISO-8859-6
    [57] => ISO-8859-7
    [58] => ISO-8859-8
    [59] => ISO-8859-9
    [60] => ISO-8859-10
    [61] => ISO-8859-13
    [62] => ISO-8859-14
    [63] => ISO-8859-15
    [64] => ISO-8859-16
    [65] => EUC-CN
    [66] => CP936
    [67] => HZ
    [68] => EUC-TW
    [69] => BIG-5
    [70] => CP950
    [71] => EUC-KR
    [72] => UHC
    [73] => ISO-2022-KR
    [74] => Windows-1251
    [75] => CP866
    [76] => KOI8-R
    [77] => KOI8-U
    [78] => ArmSCII-8
    [79] => CP850
    [80] => JIS-ms
    [81] => ISO-2022-JP-2004
    [82] => ISO-2022-JP-MOBILE#KDDI
    [83] => CP50220
    [84] => CP50220raw
    [85] => CP50221
    [86] => CP50222
    )

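    I was wrong to think that mb_detect_order() was just an ordered version of this list. For this purpose, mb_detect_order() is simply... useless. To detect the encoding properly before converting to UTF-8, use the following code: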
    $my_encoding_list = [
        "UTF-8",
        "UTF-7",
        "UTF-16",
        "UTF-32",
        "ISO-8859-16",
        "ISO-8859-15",
        "ISO-8859-10",
        "ISO-8859-1",
        "Windows-1254",
        "Windows-1252",
        "Windows-1251",
        "ASCII",
        // add your preferred encodings here
    ];

    // remove encodings this mbstring build does not support
    $encoding_list = array_intersect($my_encoding_list, mb_list_encodings());

    // finally, detect the encoding
    $this->encoding = mb_detect_encoding($source,$encoding_list,true);
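Putting detection and conversion together could look like the sketch below. `toUtf8()` is a hypothetical helper, not part of the original answer. It assumes the mbstring extension is loaded and uses a deliberately short candidate list for the demo. It uses mb_convert_encoding() where the question used iconv(); either should work for these encodings.

```php
<?php
/**
 * Convert $source to UTF-8, trying the candidate encodings in order.
 * Hypothetical helper for illustration only.
 */
function toUtf8(string $source, array $candidates): string
{
    // Keep only encodings this mbstring build actually supports.
    $candidates = array_values(array_intersect($candidates, mb_list_encodings()));

    $encoding = mb_detect_encoding($source, $candidates, true);
    if ($encoding === false) {
        // Fail loudly instead of storing bytes in an unknown encoding.
        throw new RuntimeException('Could not detect the source encoding');
    }
    if ($encoding === 'UTF-8') {
        return $source; // already valid UTF-8, nothing to do
    }
    return mb_convert_encoding($source, 'UTF-8', $encoding);
}

// Hypothetical ISO-8859-1 input with a non-breaking space (0xA0).
$clean = toUtf8("RESIDENCE \xA0PRINCIPE", ['UTF-8', 'ISO-8859-1']);
echo json_encode(['name' => $clean]), PHP_EOL;
```

Single-byte encodings such as the ISO-8859 family accept almost any byte sequence, so detection among them is a guess and the order of the list matters. If the origin of the CSV files is known, passing that encoding explicitly is more reliable than detecting it.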

    This worked and fixed the problem with the bad data I had saved in the database.

    Source: the original question on Stack Overflow, https://stackoverflow.com/questions/50610990/
