html - How can I determine the charset of HTML content from the HTTP headers?


Reposted by: 行者123. Updated: 2023-12-02 03:54:54


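I know that the charset= parameter of the HTTP Content-Type header can be used to determine the character set of HTML content. But if that parameter is missing from the Content-Type header, how can I tell which character set the HTML content uses?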

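I also know that a tag such as

<meta charset="utf-8">

can be used in the HTML to specify the character set. But we only get that tag after parsing the HTML, and parsing the HTML requires knowing the character set first.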

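Best answer

In the absence of an explicit charset attribute in the Content-Type header, different media types sent over different transports have different default charsets.

For example, to show just a few definitions:

RFC 2046, Section 4.1.2, of the MIME specification says: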

Unlike some other parameter values, the values of the charset parameter are NOT case sensitive. The default character set, which must be assumed in the absence of a charset parameter, is US-ASCII.
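As a minimal sketch of reading that parameter, the Python snippet below pulls charset out of a Content-Type value with the standard library's email.message.Message. The function name charset_from_content_type and its default argument are made up for illustration; which default to pass depends on the media type and transport, as the definitions quoted in this answer show.

```python
from email.message import Message


def charset_from_content_type(header_value, default=None):
    """Return the lowercased charset parameter of a Content-Type value.

    Returns `default` when the header carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value, which is safe because
    # charset values are not case sensitive.
    return msg.get_content_charset(failobj=default)


if __name__ == "__main__":
    print(charset_from_content_type("text/html; charset=UTF-8"))
    print(charset_from_content_type("text/plain", default="us-ascii"))
```

Lowercasing the result makes later comparisons such as `charset == "utf-8"` reliable regardless of how the sender spelled the label.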

RFC 2616, Section 3.7.1, of the HTTP protocol specification says:

The "charset" parameter is used with some media types to define the character set (section 3.4) of the data. When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP. Data in character sets other than "ISO-8859-1" or its subsets MUST be labeled with an appropriate charset value. See section 3.4.1 for compatibility problems.

That was later reversed by RFC 7231, Appendix B:

The default charset of ISO-8859-1 for text media types has been removed; the default is now whatever the media type definition says. Likewise, special treatment of ISO-8859-1 has been removed from the Accept-Charset header field. (Section 3.1.1.3 and Section 5.3.3).

RFC 3023, Sections 3.1, 3.3, 3.6, and 8.5, of the XML media types specification say:

Conformant with [RFC2046], if a text/xml entity is received with the charset parameter omitted, MIME processors and XML processors MUST use the default charset value of "us-ascii"[ASCII]. In cases where the XML MIME entity is transmitted via HTTP, the default charset value is still "us-ascii". (Note: There is an inconsistency between this specification and HTTP/1.1, which uses ISO-8859-1[ISO8859] as the default for a historical reason. Since XML is a new format, a new default should be chosen for better I18N. US-ASCII was chosen, since it is the intersection of UTF-8 and ISO-8859-1 and since it is already used by MIME.)

The charset parameter of text/xml-external-parsed-entity is handled the same as that of text/xml as described in Section 3.1.

The following list applies to text/xml, text/xml-external-parsed-entity, and XML-based media types under the top-level type "text" that define the charset parameter according to this specification:

...

If the charset parameter is not specified, the default is "us-ascii". The default of "iso-8859-1" in HTTP is explicitly overridden.

This example shows text/xml with the charset parameter omitted. In this case, MIME and XML processors MUST assume the charset is "us-ascii", the default charset value for text media types specified in [RFC2046]. The default of "us-ascii" holds even if the text/xml entity is transported using HTTP.

Omitting the charset parameter is NOT RECOMMENDED for text/xml. For example, even if the contents of the XML MIME entity are UTF-16 or UTF-8, or the XML MIME entity has an explicit encoding declaration, XML and MIME processors MUST assume the charset is "us-ascii".

RFC 7159, Sections 8.1 and 11, of the JSON specification say:

JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32. The default encoding is UTF-8, and JSON texts that are encoded in UTF-8 are interoperable in the sense that they will be read successfully by the maximum number of implementations; there are many implementations that cannot successfully read texts in other encodings (such as UTF-16 and UTF-32).

Implementations MUST NOT add a byte order mark to the beginning of a JSON text. In the interests of interoperability, implementations that parse JSON texts MAY ignore the presence of a byte order mark rather than treating it as an error.

Note: No "charset" parameter is defined for this registration. Adding one really has no effect on compliant recipients.
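For JSON, a lenient reader can follow the "MAY ignore the presence of a byte order mark" allowance. The sketch below assumes the payload is UTF-8 (the default named in the quote) and uses Python's "utf-8-sig" codec, which drops a leading BOM if one is present. The helper name parse_json_bytes is hypothetical.

```python
import json


def parse_json_bytes(data: bytes):
    """Parse UTF-8 encoded JSON bytes, tolerating a leading byte order mark."""
    # "utf-8-sig" decodes UTF-8 and silently removes a BOM at the start.
    text = data.decode("utf-8-sig")
    return json.loads(text)


if __name__ == "__main__":
    print(parse_json_bytes(b'\xef\xbb\xbf{"a": 1}'))
```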

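So, in general, if you want to know the charset used by a given resource, and that charset is not expressed through external means such as the charset attribute of a Content-Type header, you have to determine what type of data you are dealing with, and then work out its charset as outlined in that data type's specification.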
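The per-type defaults quoted above can be summarized as a small lookup. This is only an illustrative sketch: the function name default_charset and the via_http flag are assumptions, it covers just the media types mentioned in this answer, and the ISO-8859-1 branch reflects the older RFC 2616 rule that RFC 7231 later removed.

```python
def default_charset(media_type, via_http=True):
    """Return the default charset the quoted RFCs give for a media type.

    Returns None when none of the quoted rules apply.
    """
    media_type = media_type.strip().lower()
    if media_type == "application/json":
        return "utf-8"  # RFC 7159, Section 8.1
    if media_type in ("text/xml", "text/xml-external-parsed-entity"):
        return "us-ascii"  # RFC 3023 explicitly overrides the HTTP default
    if media_type.startswith("text/"):
        # RFC 2616, Section 3.7.1 over HTTP; RFC 2046, Section 4.1.2 otherwise.
        return "iso-8859-1" if via_http else "us-ascii"
    return None


if __name__ == "__main__":
    for mt in ("text/html", "text/xml", "application/json", "image/png"):
        print(mt, default_charset(mt))
```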

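In your case, you are dealing with HTML over HTTP, so the RFC 2616 rules apply to you. The HTML 5 spec, Section 8.2.2.2, defines a very detailed algorithm for determining the charset of HTML when no charset attribute is specified in the Content-Type header. The algorithm first checks for a UTF BOM; if none is present, it assumes the HTML is 8-bit and parses it looking for any <meta> tags that contain a charset or language declaration.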
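The two steps just described (BOM check, then a scan of the raw bytes for a meta declaration) can be approximated as follows. This is a deliberately simplified sketch and not the algorithm from the HTML spec: the regular expression, the 1024-byte window, and the name sniff_html_charset are assumptions made for illustration. It works on bytes, which is what breaks the chicken-and-egg problem: the charset label itself is plain ASCII, so it can be found before the document is decoded.

```python
import codecs
import re

# Byte order marks checked first, mapped to Python codec names.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
_META_RE = re.compile(
    rb"<meta\s[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)",
    re.IGNORECASE,
)


def sniff_html_charset(body: bytes, fallback=None):
    """Guess the charset of raw HTML bytes from a BOM or a <meta> tag."""
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    # Only scan the start of the document, treating it as 8-bit data.
    match = _META_RE.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return fallback


if __name__ == "__main__":
    print(sniff_html_charset(b'<html><head><meta charset="UTF-8"></head></html>'))
```

For real-world pages an existing HTML parser that implements the full spec algorithm is a safer choice than a regular expression; the sketch only shows why byte-level scanning works.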

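The XML 1.0 specification, Appendix F, also defines an algorithm that makes it easy to determine the charset used by the XML prolog, so you can then read its encoding attribute (if present) to determine the charset of the remaining XML.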
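A rough sketch of that idea: first guess an encoding family from the leading bytes, decode just enough to read the XML declaration, then trust its encoding attribute if one is present. The code below covers only UTF-8 and UTF-16 (Appendix F also discusses UTF-32 and EBCDIC), and the name sniff_xml_encoding and the 200-byte window are illustrative assumptions.

```python
import codecs
import re

# Reads the encoding attribute out of a decoded XML declaration.
_DECL_RE = re.compile(
    r"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of raw XML bytes from its first bytes and prolog."""
    # Step 1: pick an encoding family from a BOM or the byte pattern of "<?".
    if data.startswith(codecs.BOM_UTF8):
        family, data = "utf-8", data[3:]
    elif data.startswith(codecs.BOM_UTF16_BE):
        family, data = "utf-16-be", data[2:]
    elif data.startswith(codecs.BOM_UTF16_LE):
        family, data = "utf-16-le", data[2:]
    elif data.startswith(b"\x00<\x00?"):
        family = "utf-16-be"
    elif data.startswith(b"<\x00?\x00"):
        family = "utf-16-le"
    else:
        family = "utf-8"  # ASCII-compatible guess, enough to read the prolog
    # Step 2: decode only the start and look for an encoding declaration.
    head = data[:200].decode(family, errors="replace")
    match = _DECL_RE.match(head)
    return match.group(1).lower() if match else family


if __name__ == "__main__":
    print(sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
```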

Regarding "html - How can I determine the charset of HTML content from the HTTP headers?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44344533/
