php mb_detect_encoding()

Reposted by: 可可西里 · Updated: 2023-10-31 23:50:02

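First, I want to say that I have read the other post about PHP's mb_detect_encoding, "Strange behaviour of mb_detect_order() in PHP". It confirmed what I had already learned through trial and error. However, a few things still confuse me.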

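I am building an HTML scraper for mostly English websites that collects data and stores it as UTF-8 XML. I ran into a problem with a page that declares its charset as ISO-8859-1 but contains characters unique to Windows-1252, in particular the right single quotation mark (’), byte 0x92. As I understand it, Windows-1252 is a superset of ISO-8859-1, which made me wonder why I should bother with utf8_encode() at all. Why not just use iconv('Windows-1252', 'UTF-8', $str) instead of utf8_encode(), since anything representable in ISO-8859-1 would be converted, along with the characters specific to Windows-1252 (i.e. € ‚ ƒ ‘ ’ “ ”)?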
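To make the difference concrete, here is a minimal sketch comparing the two interpretations of the same bytes. It uses mb_convert_encoding() rather than the iconv() call from the question, and the sample string is made up for illustration. The point it shows: the printable characters agree in both charsets, but bytes 0x80 to 0x9F are C1 control codes in ISO-8859-1 and printable punctuation in Windows-1252.

```php
<?php
// Bytes as they might arrive from a page that declares ISO-8859-1:
// 0x92 is the Windows-1252 right single quotation mark,
// 0xE9 is "é" in both ISO-8859-1 and Windows-1252.
$raw = "It\x92s a caf\xE9";

// Read as ISO-8859-1, 0x92 becomes the invisible control character U+0092.
$asLatin1 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

// Read as Windows-1252, 0x92 becomes U+2019 (’).
$asCp1252 = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');

echo bin2hex($asLatin1), PHP_EOL; // hex dump, because U+0092 does not display
echo $asCp1252, PHP_EOL;
```

Since utf8_encode() always treats its input as ISO-8859-1, it behaves like the first conversion here and leaves the smart quote as a control character.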

Also:

$ansi = "€"; // euro sign; this source file is itself saved as ANSI (Windows-1252)

$detected = mb_detect_encoding($ansi, "WINDOWS-1252");             // $detected == "Windows-1252"
$detected = mb_detect_encoding('a' . $ansi, "WINDOWS-1252");       // $detected == FALSE
$detected = mb_detect_encoding($ansi . 'a', "WINDOWS-1252");       // $detected == "Windows-1252"
$detected = mb_detect_encoding($ansi . 'a', "WINDOWS-1252", TRUE); // $detected == FALSE

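Why does this happen? Does detection fail whenever the first character of the string is not specific to Windows-1252, even if the rest of it is? Doesn't that behavior make the function useless, at least for distinguishing ISO-8859-1 from Windows-1252?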

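Another thing that confuses me: suppose I want to detect the charset among ASCII, ISO-8859-1, Windows-1252 and UTF-8. Is it possible to detect a string in a way that gives me the lowest-ranked matching set? For example: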

$ascii = "123"; // desired detect result == 'ASCII'
$iso = "é".$ascii; // desired detect result == 'ISO-8859-1'
$ansi = "€".$iso; // desired detect result == 'Windows-1252'
$utf8 = file_get_contents('utf8.txt', true);//$utf8 == '你好123é€', desired detect result == 'UTF-8'

Shouldn't my $detect_order then be array('ASCII', 'ISO-8859-1', 'Windows-1252', 'UTF-8')? I know this is not correct, because it gives me the following results:

$ascii == 'ASCII'
$iso   == 'ISO-8859-1'
$ansi  == 'ISO-8859-1'
$utf8  == 'ISO-8859-1'

Why is the detection order ('ASCII', 'ISO-8859-1', 'Windows-1252', 'UTF-8') wrong for what I am trying to get?

The closest I have come to the desired return values is:

$ascii == 'ASCII'
$iso   == 'ISO-8859-1'
$ansi  == 'ISO-8859-1'
$utf8  == 'UTF-8'

Both of the following mb_detect_order arrays give me the values above:

$detect_order = array('ASCII', 'UTF-8', 'Windows-1252', 'ISO-8859-1');
$detect_order = array('ASCII', 'UTF-8', 'ISO-8859-1', 'Windows-1252');

This confuses me a lot!

Can someone please explain? Many thanks!

Best answer

This is a known bug.

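Detection of Windows-1251 and Windows-1252 only succeeds if the entire string consists of high-byte characters within a certain range. This means you will never get the right conversion, because the text will be reported as ISO-8859-1 even when it is actually Windows-1252.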
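Because detection cannot separate these charsets, one alternative is to inspect the bytes directly. The sketch below is a heuristic, not part of mbstring; the function name guessCharset() is hypothetical. It returns the "lowest" of the four charsets from the question that is consistent with the input. UTF-8 is tested before the single-byte charsets because every byte string is formally valid ISO-8859-1, so that check alone can never rule anything out.

```php
<?php
/**
 * Hypothetical helper: guess the narrowest charset among ASCII, UTF-8,
 * ISO-8859-1 and Windows-1252 that fits the given bytes.
 */
function guessCharset(string $bytes): string
{
    // No byte above 0x7F at all: plain ASCII.
    if (!preg_match('/[\x80-\xFF]/', $bytes)) {
        return 'ASCII';
    }
    // High bytes that form well-formed multi-byte sequences: treat as UTF-8.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return 'UTF-8';
    }
    // 0x80-0x9F are control codes in ISO-8859-1 but printable in Windows-1252,
    // so their presence suggests Windows-1252.
    if (preg_match('/[\x80-\x9F]/', $bytes)) {
        return 'Windows-1252';
    }
    return 'ISO-8859-1';
}

echo guessCharset("123"), PHP_EOL;          // ASCII
echo guessCharset("\xE9123"), PHP_EOL;      // ISO-8859-1
echo guessCharset("\x80\xE9123"), PHP_EOL;  // Windows-1252
```

The guess can still be wrong: a short ISO-8859-1 string whose high bytes happen to form a valid UTF-8 sequence is reported as UTF-8.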

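I ran into this problem when converting from LATIN1 to UTF-8. I had a lot of content pasted from Microsoft Word and stored in VARCHAR fields of a MySQL table that used the LATIN1 charset. As you probably know, Word converts apostrophes and quotation marks into smart apostrophes and curly quotes. None of them showed up on screen, because those characters were not converted correctly: the text was always identified as ISO-8859-1. To solve this, I forced the conversion from Windows-1252 to UTF-8, and the apostrophes and quotes (along with other characters) were converted correctly.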
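A forced conversion along these lines could be sketched as follows. The helper name toUtf8() is hypothetical, and the fallback to Windows-1252 is an assumption that suits mostly English or Western European content. It is not a general solution for arbitrary input.

```php
<?php
/**
 * Hypothetical helper: leave valid UTF-8 untouched, otherwise assume the
 * bytes are Windows-1252 and convert them to UTF-8.
 */
function toUtf8(string $bytes): string
{
    // Already well-formed UTF-8 (this includes pure ASCII): nothing to do.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    // Assumption: anything else is Windows-1252, which also covers the
    // printable characters of ISO-8859-1.
    return mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');
}

// 0x93 / 0x94 are curly double quotes and 0x92 is the smart apostrophe.
echo toUtf8("\x93smart quotes\x94 and it\x92s"), PHP_EOL;
```

Checking for valid UTF-8 first avoids double-encoding text that has already been converted.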

Regarding php mb_detect_encoding(), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/8168344/
