python - SQLite, Python, Unicode, and non-UTF data

Reposted by: 行者123. Last updated: 2023-12-02 08:14:44

I started by trying to store strings in SQLite using Python, and got the message:

sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

OK, I switched to Unicode strings. Then I started getting the message:

sqlite3.OperationalError: Could not decode to UTF-8 column 'tag_artist' with text 'Sigur Rós'

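when trying to retrieve data from the database. After more research I started encoding it in utf8, but then 'Sigur Rós' started looking like 'Sigur RÃ³s'.

Note: as @John Machin pointed out, my console was set to display in 'latin_1'.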

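What gives? After reading this, describing exactly the same situation I'm in, it seems as if the advice is to ignore the other advice and use 8-bit bytestrings after all.

I didn't know much about unicode and utf before I started this process. I've learned quite a bit in the last couple of hours, but I still don't know whether there is a way to correctly convert 'ó' from latin-1 to utf-8 without mangling it. If there isn't, why would sqlite "highly recommend" that I switch my application to unicode strings?

I'm going to update this question with a summary and some example code covering everything I've learned in the last 24 hours, so that someone in my shoes can have an easy(er) guide. If the information I post is wrong or misleading in any way, please tell me and I'll update it, or one of you senior guys can update it.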

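Summary of answers

Let me first state the goal as I understand it. If you are trying to convert between encodings, the goal is to know what your source encoding is, use that source encoding to convert the data to unicode, and then convert it to your desired encoding. Unicode is a base, and encodings are mappings of subsets of that base. utf_8 has room for every character in unicode, but because those characters aren't in the same place as they are in, for instance, latin_1, a string encoded in utf_8 and sent to a latin_1 console will not look the way you expect. In python, the process of getting to unicode and into another encoding looks like: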

str.decode('source_encoding').encode('desired_encoding')

or, if str is already in unicode:

str.encode('desired_encoding')
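The same two-step conversion can be sketched in Python 3, where the Python 2 `str`/`unicode` pair becomes `bytes`/`str`. This is only a minimal sketch and not the code used in the question; `transcode` is a hypothetical helper name introduced for illustration.

```python
def transcode(raw: bytes, source_encoding: str, desired_encoding: str) -> bytes:
    """Decode bytes with the known source encoding, then re-encode the text."""
    text = raw.decode(source_encoding)    # bytes -> Unicode text
    return text.encode(desired_encoding)  # Unicode text -> bytes


latin1_bytes = b"Sigur R\xf3s"            # in latin_1, 'ó' is the single byte 0xF3
utf8_bytes = transcode(latin1_bytes, "latin_1", "utf_8")
print(utf8_bytes)                         # in utf_8, 'ó' is the two bytes 0xC3 0xB3
```

The conversion only gives the right answer when `source_encoding` really is the encoding the bytes were written in.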

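For SQLite, I didn't actually want to encode it again; I wanted to decode it and leave it in unicode format. Here are four things you might need to be aware of as you work with unicode and encodings in Python.

1. The encoding of the string you want to work with, and the encoding you want to get it to.
2. The system encoding.
3. The console encoding.
4. The encoding of the source file.

Elaboration: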

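(1) When you read a string from a source, it must have some encoding, such as latin_1 or utf_8. In my case, I'm getting strings from filenames, so unfortunately I could be getting any kind of encoding. Windows XP uses UCS-2 (a Unicode system) as its native string type, which seems like cheating to me. Fortunately for me, the characters in most filenames are not going to be made up of more than one source encoding, and I think all of mine were either completely latin_1, completely utf_8, or just plain ascii (which is a subset of both). So I just read them and decoded them as if they were still in latin_1 or utf_8. It's possible, though, to have latin_1, utf_8, and whatever other characters mixed together in a filename on Windows. Sometimes those characters show up as boxes, sometimes they just look mangled, and sometimes they look correct (accented characters and whatnot). Moving on.

(2) Python has a default system encoding that is set when python starts and can't be changed during runtime. See here for details. The dirty summary: here's the file I added: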

# sitecustomize.py
# this file can be anywhere in your Python path,
# but it usually goes in ${pythondir}/lib/site-packages/
import sys
sys.setdefaultencoding('utf_8')

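This system encoding is the one that gets used when you call the unicode("str") function without any other encoding parameters. To put it another way, python tries to decode "str" to unicode based on the default system encoding.

(3) If you're using IDLE or the command-line python, I think your console will display according to the default system encoding. I am using pydev with eclipse for some reason, so I had to go into my project settings, edit the launch configuration properties of my test script, go to the Common tab, and change the console from latin-1 to utf-8 so that I could visually confirm that what I was doing was working.

(4) If you want to have some test strings, e.g.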

test_str = "ó"

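in your source code, then you have to tell python what encoding you are using in that file. (FYI: when I mistyped an encoding I had to press ctrl-Z because my file became unreadable.) This is easily done by putting a line like this at the top of your source file: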

# -*- coding: utf_8 -*-

If you don't have this declaration, python tries to parse your code as ascii by default, and so:

SyntaxError: Non-ASCII character '\xf3' in file _redacted_ on line 81, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
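To see which of these encodings is actually in effect, the interpreter can be asked directly. The sketch below uses Python 3 rather than the Python 2 of this post; in Python 3 the default encoding is fixed to UTF-8 and source files are read as UTF-8 unless a coding line says otherwise, so `sitecustomize.py` tricks like the one above no longer apply.

```python
import locale
import sys

# Codec used for implicit bytes/str conversions.
# Python 3 always reports 'utf-8' here; Python 2 defaulted to 'ascii'.
print("default encoding:   ", sys.getdefaultencoding())

# Encoding the console (stdout) expects. It can be None when stdout has been
# replaced by an object that carries no encoding, hence the getattr().
print("console encoding:   ", getattr(sys.stdout, "encoding", None))

# Encoding used to convert file names to and from the operating system.
print("filesystem encoding:", sys.getfilesystemencoding())

# Encoding the current locale suggests for text data.
print("locale encoding:    ", locale.getpreferredencoding(False))
```

If the console encoding differs from the encoding of the bytes you print, you get the garbled display described in this post even though the data itself is fine.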

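Once your program is working correctly, or if you aren't using python's console or any other console to look at output, you will probably only care about #1 on the list. The system default and console encodings are not that important unless you need to look at output and/or you are using the builtin unicode() function (without any encoding parameters) instead of the string.decode() function. I wrote a demo function, pasted at the bottom of this gigantic mess, that I hope correctly demonstrates the items in my list. Here is some of the output when I run the character 'ó' through the demo function, showing how the various methods react to that character as input. My system encoding and console output were both set to utf_8 for this run: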

'�' = original char <type 'str'> repr(char)='\xf3'
'?' = unicode(char) ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data
'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3'
'?' = char.decode('utf_8')  ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data

Now I change the system and console encodings to latin_1, and I get this output for the same input:

'ó' = original char <type 'str'> repr(char)='\xf3'
'ó' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3'
'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3'
'?' = char.decode('utf_8')  ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data

Notice that the 'original' character now displays correctly and the builtin unicode() function now works.

Now I change my console output back to utf_8.

'�' = original char <type 'str'> repr(char)='\xf3'
'�' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3'
'�' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3'
'?' = char.decode('utf_8')  ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data

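Here everything still works the same as last time, but the console can't display the output correctly. And so on. The function below displays more information than this, and will hopefully help someone figure out where the gap in their understanding is. I know all this information is available in other places and dealt with more thoroughly there, but I hope this is a good starting point for someone trying to get coding with python and/or sqlite. Ideas are great, but sometimes source code can save you a day or two of trying to figure out which function does what.

Disclaimers: I'm no encoding expert; I put this together to help my own understanding. I kept building on it when I should probably have started passing functions as arguments to avoid so much redundant code, so if I can I'll make it more concise. Also, utf_8 and latin_1 are by no means the only encoding schemes; they are just the two I was playing with because I think they handle everything I need. Add your own encoding schemes to the demo function and test your own input.

One more thing: there are apparently crazy application developers making life difficult in Windows.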

#!/usr/bin/env python
# -*- coding: utf_8 -*-

import os
import sys

def encodingDemo(str):
    validStrings = ()
    try:        
        print "str =",str,"{0} repr(str) = {1}".format(type(str), repr(str))
        validStrings += ((str,""),)
    except UnicodeEncodeError as ude:
        print "Couldn't print the str itself because the console is set to an encoding that doesn't understand some character in the string.  See error:\n\t",
        print ude
    try:
        x = unicode(str)
        print "unicode(str) = ",x
        validStrings+= ((x, " decoded into unicode by the default system encoding"),)
    except UnicodeDecodeError as ude:
        print "ERROR.  unicode(str) couldn't decode the string because the system encoding is set to an encoding that doesn't understand some character in the string."
        print "\tThe system encoding is set to {0}.  See error:\n\t".format(sys.getdefaultencoding()),  
        print ude
    except UnicodeEncodeError as uee:
        print "ERROR.  Couldn't print the unicode(str) because the console is set to an encoding that doesn't understand some character in the string.  See error:\n\t",
        print uee
    try:
        x = str.decode('latin_1')
        print "str.decode('latin_1') =",x
        validStrings+= ((x, " decoded with latin_1 into unicode"),)
        try:        
            print "str.decode('latin_1').encode('utf_8') =",str.decode('latin_1').encode('utf_8')
            validStrings+= ((x, " decoded with latin_1 into unicode and encoded into utf_8"),)
        except UnicodeDecodeError as ude:
            print "The string was decoded into unicode using the latin_1 encoding, but couldn't be encoded into utf_8.  See error:\n\t",
            print ude
    except UnicodeDecodeError as ude:
        print "Something didn't work, probably because the string wasn't latin_1 encoded.  See error:\n\t",
        print ude
    except UnicodeEncodeError as uee:
        print "ERROR.  Couldn't print the str.decode('latin_1') because the console is set to an encoding that doesn't understand some character in the string.  See error:\n\t",
        print uee
    try:
        x = str.decode('utf_8')
        print "str.decode('utf_8') =",x
        validStrings+= ((x, " decoded with utf_8 into unicode"),)
        try:
            print "str.decode('utf_8').encode('latin_1') =",str.decode('utf_8').encode('latin_1')
            validStrings+= ((x, " decoded with utf_8 into unicode and encoded into latin_1"),)
        except UnicodeEncodeError as uee:
            # encode() raises UnicodeEncodeError when a character has no latin_1 equivalent
            print "str.decode('utf_8').encode('latin_1') didn't work.  The string was decoded into unicode using the utf_8 encoding, but couldn't be encoded into latin_1.  See error:\n\t",
            print uee
    except UnicodeDecodeError as ude:
        print "str.decode('utf_8') didn't work, probably because the string wasn't utf_8 encoded.  See error:\n\t",
        print ude
    except UnicodeEncodeError as uee:
        print "ERROR.  Couldn't print the str.decode('utf_8') because the console is set to an encoding that doesn't understand some character in the string.  See error:\n\t",uee

    print
    print "Printing information about each character in the original string."
    for char in str:
        try:
            print "\t'" + char + "' = original char {0} repr(char)={1}".format(type(char), repr(char))
        except UnicodeDecodeError as ude:
            print "\t'?' = original char  {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), ude)
        except UnicodeEncodeError as uee:
            print "\t'?' = original char  {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), uee)
            print uee    

        try:
            x = unicode(char)        
            print "\t'" + x + "' = unicode(char) {1} repr(unicode(char))={2}".format(x, type(x), repr(x))
        except UnicodeDecodeError as ude:
            print "\t'?' = unicode(char) ERROR: {0}".format(ude)
        except UnicodeEncodeError as uee:
            print "\t'?' = unicode(char)  {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee)

        try:
            x = char.decode('latin_1')
            print "\t'" + x + "' = char.decode('latin_1') {1} repr(char.decode('latin_1'))={2}".format(x, type(x), repr(x))
        except UnicodeDecodeError as ude:
            print "\t'?' = char.decode('latin_1')  ERROR: {0}".format(ude)
        except UnicodeEncodeError as uee:
            print "\t'?' = char.decode('latin_1')  {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee)

        try:
            x = char.decode('utf_8')
            print "\t'" + x + "' = char.decode('utf_8') {1} repr(char.decode('utf_8'))={2}".format(x, type(x), repr(x))
        except UnicodeDecodeError as ude:
            print "\t'?' = char.decode('utf_8')  ERROR: {0}".format(ude)
        except UnicodeEncodeError as uee:
            print "\t'?' = char.decode('utf_8')  {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee)

        print

x = 'ó'
encodingDemo(x)
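For readers on Python 3, where the `print` statement and the `unicode` type used above no longer exist, the core of the demo can be condensed as follows. This is a much smaller sketch and not a port of the function above; `try_decodings` is a hypothetical name, and the input is a `bytes` object because that is what Python 2's `str` corresponds to.

```python
def try_decodings(raw, encodings=("ascii", "latin_1", "utf_8")):
    """Try to decode raw bytes with each codec; map codec name to text or error."""
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError as err:
            results[name] = err
    return results


# The single byte 0xF3 is 'ó' in latin_1, but is not valid ascii or utf_8.
for name, outcome in try_decodings(b"\xf3").items():
    if isinstance(outcome, UnicodeDecodeError):
        print("{0}: ERROR: {1}".format(name, outcome))
    else:
        # !a prints an ASCII-only repr, so the result does not depend on the console
        print("{0}: {1!a}".format(name, outcome))
```

Add further codec names to `encodings` to test your own input, as suggested in the disclaimer above.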

Many thanks for the answers below, especially to @John Machin for answering so thoroughly.

Best answer

I'm still ignorant of whether there is a way to correctly convert 'ó' from latin-1 to utf-8 and not mangle it

repr() and unicodedata.name() are your friends when it comes to debugging such problems:

>>> oacute_latin1 = "\xF3"
>>> oacute_unicode = oacute_latin1.decode('latin1')
>>> oacute_utf8 = oacute_unicode.encode('utf8')
>>> print repr(oacute_latin1)
'\xf3'
>>> print repr(oacute_unicode)
u'\xf3'
>>> import unicodedata
>>> unicodedata.name(oacute_unicode)
'LATIN SMALL LETTER O WITH ACUTE'
>>> print repr(oacute_utf8)
'\xc3\xb3'
>>>

If you send oacute_utf8 to a terminal that is set up for latin1, you will get A-tilde followed by superscript-3.
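That A-tilde and superscript-3 pair can be reproduced without a terminal by decoding the UTF-8 bytes as latin1. The sketch below is in Python 3 (the session above is Python 2) and also shows that this kind of damage is reversible as long as no bytes were lost; the variable names are illustrative only.

```python
import unicodedata

oacute = "\xf3"                         # LATIN SMALL LETTER O WITH ACUTE
utf8_bytes = oacute.encode("utf8")      # the two bytes 0xC3 0xB3

# A latin1 terminal treats every byte as one character.
mojibake = utf8_bytes.decode("latin1")
for ch in mojibake:
    print(ascii(ch), unicodedata.name(ch))

# Undo the wrong decode: back to the original bytes, then decode correctly.
repaired = mojibake.encode("latin1").decode("utf8")
print(ascii(repaired), unicodedata.name(repaired))
```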

I switched to Unicode strings.

What do you call Unicode strings? UTF-16?

What gives? After reading this, describing exactly the same situation I'm in, it seems as if the advice is to ignore the other advice and use 8-bit bytestrings after all.

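I can't imagine how it seems that way to you. The story being conveyed was that unicode objects in Python and UTF-8 encoding in the database were the way to go. However, Martin answered the original question by giving the OP a method (the "text factory") to be able to use latin1; this did not constitute a recommendation!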
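A minimal Python 3 sketch of both paths, under stated assumptions: the table and column names are made up for illustration, the legacy row is simulated by casting latin-1 bytes to TEXT, and falling back to latin-1 is only valid if you know that is what the old data was written in. `text_factory` is the real `sqlite3` connection attribute mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (tag_artist TEXT)")

# Recommended path: pass real text; sqlite3 stores it as UTF-8.
conn.execute("INSERT INTO tracks VALUES (?)", ("Sigur R\xf3s",))

# Simulated legacy row: latin-1 bytes forced into a TEXT cell unchanged.
legacy = "Sigur R\xf3s".encode("latin-1")
conn.execute("INSERT INTO tracks VALUES (CAST(? AS TEXT))", (legacy,))

# Ask for raw bytes so the legacy row can be read at all, then decode by hand.
conn.text_factory = bytes
raw_rows = [row[0] for row in
            conn.execute("SELECT tag_artist FROM tracks ORDER BY rowid")]


def to_text(raw):
    """Decode as UTF-8; fall back to latin-1 (an assumption about old data)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


names = [to_text(raw) for raw in raw_rows]
print([ascii(name) for name in names])
conn.close()
```

The byte-level `text_factory` is a way to rescue existing non-UTF-8 data, after which it can be rewritten as proper text; for new data the first insert is all that is needed.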

Update in response to these further questions raised in a comment:

I didn't understand that the unicode characters still contained an implicit encoding. Am I saying that right?

No. An encoding is a mapping between Unicode and something else, and vice versa. A Unicode character doesn't have an encoding, implicit or otherwise.

It looks to me like unicode("\xF3") and "\xF3".decode('latin1') are the same when evaluated with repr().

Say what? It doesn't look like it to me:

>>> unicode("\xF3")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf3 in position 0: ordinal
not in range(128)
>>> "\xF3".decode('latin1')
u'\xf3'
>>>

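Perhaps you meant: u'\xf3' == '\xF3'.decode('latin1') ... this is certainly true.
It is also true that unicode(str_object, encoding) does the same as str_object.decode(encoding), including blowing up when an inappropriate encoding is supplied.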
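In Python 3 the same equivalence holds between `str(bytes_object, encoding)` and `bytes_object.decode(encoding)`. A short sketch, with illustrative variable names:

```python
raw = b"\xf3"

# Python 3 spelling of unicode(str_object, encoding) is str(bytes_object, encoding).
via_constructor = str(raw, "latin1")
via_method = raw.decode("latin1")
print(via_constructor == via_method)

# Both forms blow up in the same way when the encoding does not fit the data.
try:
    str(raw, "ascii")
except UnicodeDecodeError as err:
    constructor_error = err

try:
    raw.decode("ascii")
except UnicodeDecodeError as err:
    method_error = err

print(constructor_error)
print(method_error)
```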

Is that a happy circumstance

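It is a good idea that the first 256 characters in Unicode are the same, code for code, as the 256 characters in latin1. Because all 256 possible latin1 characters are mapped to Unicode, ANY 8-bit byte, and therefore ANY Python str object, can be decoded into unicode without an exception being raised. This is as it should be.

However, there are people who confuse two quite separate concepts: "my script runs to completion without any exceptions being raised" and "my script is error-free". To them, latin1 is "a snare and a delusion".

In other words, if you have a file that is actually encoded in cp1252 or gbk or koi8-u or whatever, and you decode it using latin1, the resulting Unicode will be utter rubbish, and Python (or any other language) will not flag an error; it has no way of knowing that you have done something silly.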
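Both halves of that point can be checked in a few lines of Python 3. The sample string is made up for illustration: it uses curly quotes and a euro sign, which cp1252 places in the byte range 0x80 to 0x9F.

```python
# Every possible byte value decodes under latin1, each to the code point
# with the same number, so no exception can ever be raised.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin1")
print(len(decoded))

# Text that was really written as cp1252.
original = "\u201cquoted\u201d \u20ac5"
cp1252_bytes = original.encode("cp1252")

wrong = cp1252_bytes.decode("latin1")   # succeeds silently, result is rubbish
right = cp1252_bytes.decode("cp1252")   # the encoding the data was written in

print(ascii(wrong))
print(ascii(right))
```

The wrong decode yields invisible control characters where the quotes and euro sign should be, and nothing in the program signals that anything went wrong.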

or is unicode("str") going to always return the correct decoding?

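As written, with the default encoding being ascii, it will return the correct unicode if the file is actually encoded in ASCII. Otherwise, it will blow up.

Similarly, if you specify the correct encoding, or one that is a superset of the correct encoding, you will get the correct result. Otherwise you will get gibberish or an exception.

In short: the answer is no.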

If not, when I receive a python str that has any possible character set in it, how do I know how to decode it?

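If the str object is a valid XML document, the encoding will be specified up front. The default is UTF-8.
If it is a properly constructed web page, the encoding should be specified up front (look for "charset"). Unfortunately, many writers of web pages lie through their teeth (ISO-8859-1 aka latin1 should be Windows-1252 aka cp1252; don't waste resources trying to decode gb2312, use gbk instead). You can get clues from the nationality/language of the website.

UTF-8 is always worth trying. If the data is ascii, it will work fine, because ascii is a subset of utf8. A string of text that was written using non-ascii characters and encoded in an encoding other than utf8 will almost certainly fail with an exception if you try to decode it as utf8.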
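A small Python 3 sketch of that "try UTF-8 first" check. `is_valid_utf8` is a hypothetical helper name, and the sample byte strings are made up for illustration.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf8")
    except UnicodeDecodeError:
        return False
    return True


samples = {
    "ascii": b"Sigur Ros",                      # ascii is a subset of utf8
    "utf8": "Sigur R\xf3s".encode("utf8"),
    "latin1": "Sigur R\xf3s".encode("latin1"),  # non-ascii, not utf8
    "gbk": "\u4e2d\u6587".encode("gbk"),        # non-ascii, not utf8
}

for label, raw in samples.items():
    print(label, is_valid_utf8(raw))
```

A clean UTF-8 decode of non-ascii data is strong evidence, not proof: a short byte string in another encoding can occasionally happen to form valid UTF-8.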

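All of the above heuristics, and more, along with a lot of statistics, are encapsulated in chardet, a module for guessing the encoding of arbitrary files. It usually works well. However, you can't make software idiot-proof. For example, if you concatenate data files written some with encoding A and some with encoding B, and feed the result to chardet, the answer is likely to be encoding C with a reduced level of confidence, e.g. 0.8. Always check the confidence part of the answer.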
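Checking the confidence might look like the Python 3 sketch below. chardet is a third-party package that has to be installed separately, so the import is guarded; `guess_encoding` and the 0.9 threshold are assumptions made for illustration, not values recommended by chardet or by this answer.

```python
def guess_encoding(raw, min_confidence=0.9):
    """Return chardet's guess, or None if it is missing, unsure, or unavailable."""
    try:
        import chardet  # third-party; not part of the standard library
    except ImportError:
        return None
    result = chardet.detect(raw)  # dict with 'encoding' and 'confidence' keys
    if result["encoding"] is None or result["confidence"] < min_confidence:
        return None
    return result["encoding"]


guess = guess_encoding(b"plain ascii text")
print(guess)
```

Treating a low-confidence guess as "unknown" forces the caller to fall back on other evidence, such as a declared charset or knowledge of where the data came from.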

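If all else fails:

(1) Try asking here, with a small sample from the front of your data ... print repr(your_data[:400]) ... and whatever background information you have about its provenance.

(2) Recent Russian research into techniques for recovering forgotten passwords appears to be quite applicable to deducing unknown encodings.

Update 2: By the way, isn't it about time you opened another question ?-)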

One more thing: there are apparently characters that Windows uses as Unicode for certain characters that aren't the correct Unicode for that character, so you may have to map those characters to the correct ones if you want to use them in other programs that are expecting those characters in the right spot.

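It's not Windows that's doing it; it's a bunch of crazy application developers. It would have been easier to understand if you had quoted, rather than paraphrased, the opening paragraph of the effbot article you referred to: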

Some applications add CP1252 (Windows, Western Europe) characters to documents marked up as ISO 8859-1 (Latin 1) or other encodings. These characters are not valid ISO-8859-1 characters, and may cause all sorts of problems in processing and display applications.

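Background:

The range U+0000 to U+001F inclusive is designated in Unicode as "C0 Control Characters". These also exist in ASCII and latin1, with the same meanings. They include such familiar things as carriage return, line feed, bell, backspace, and tab, plus others that are rarely used.

The range U+0080 to U+009F inclusive is designated in Unicode as "C1 Control Characters". These also exist in latin1, and comprise 32 characters that nobody outside unicode.org can imagine any possible use for.

Consequently, if you run a character frequency count on your unicode or latin1 data and find any characters in that range, your data is corrupt. There is no universal solution; it depends on how the data became corrupted. The characters may have the same meaning as the cp1252 characters at the same positions, in which case the effbot's solution will work. In another case that I've been looking at recently, the dodgy characters appear to have been caused by concatenating text files encoded in UTF-8 with files in another encoding, which had to be deduced from letter frequencies in the (human) language the files were written in.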
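The frequency count and the cp1252 reinterpretation can be sketched in Python 3 as follows. This is an illustration of the idea, not the effbot's code; both function names are hypothetical, and the repair is only appropriate when the stray characters really did originate as cp1252 bytes.

```python
from collections import Counter


def count_c1_controls(text):
    """Count characters in the C1 control range U+0080 to U+009F."""
    return Counter(ch for ch in text if "\x80" <= ch <= "\x9f")


def repair_as_cp1252(text):
    """Reinterpret C1 control characters as the cp1252 characters at the same positions."""
    def fix(ch):
        if "\x80" <= ch <= "\x9f":
            try:
                return ch.encode("latin1").decode("cp1252")
            except UnicodeDecodeError:
                return ch  # a few positions are undefined in cp1252; leave them
        return ch
    return "".join(fix(ch) for ch in text)


suspect = b"\x93hi\x94".decode("latin1")  # cp1252 curly quotes decoded as latin1
print(count_c1_controls(suspect))
print(ascii(repair_as_cp1252(suspect)))
```

A non-empty count is the warning sign described above; whether the repair is the right one depends on how the data was corrupted.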

Regarding python - SQLite, Python, Unicode, and non-UTF data, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/2392732/
