PHP: Convert curl_exec output to UTF-8

Reposted. Author: 可可西里. Updated: 2023-10-31 22:54:41

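I want to work with UTF-8 only. The problem is that I don't know the character set of each web page in advance. How can I detect it and convert the page to UTF-8?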

<?php
$url = "http://vkontakte.ru";
$ch = curl_init($url);
$options = array(
    CURLOPT_RETURNTRANSFER => true,
);
curl_setopt_array($ch, $options);
$data = curl_exec($ch);

// $data = magic($data);

print $data;

See: http://paulisageek.com/tmp/curl-utf8

What should magic() be?

Best answer

Following the suggestions from Gumbo and Pekka, I wrote curl_exec_utf8:

/** The same as curl_exec except tries its best to convert the output to UTF-8 **/
function curl_exec_utf8($ch) {
    $data = curl_exec($ch);
    if (!is_string($data)) return $data;

    $charset = null;
    /* Cast to string: curl_getinfo may not return a string when no header was sent */
    $content_type = (string) curl_getinfo($ch, CURLINFO_CONTENT_TYPE);

    /* 1: HTTP Content-Type: header */
    preg_match( '@([\w/+]+)(;\s*charset=(\S+))?@i', $content_type, $matches );
    if ( isset( $matches[3] ) )
        $charset = $matches[3];

    /* 2: <meta> element in the page */
    if (!isset($charset)) {
        preg_match( '@<meta\s+http-equiv="Content-Type"\s+content="([\w/]+)(;\s*charset=([^\s"]+))?@i', $data, $matches );
        if ( isset( $matches[3] ) ) {
            $charset = $matches[3];
            /* In case we want to do further processing downstream: */
            $data = preg_replace('@(<meta\s+http-equiv="Content-Type"\s+content="[\w/]+\s*;\s*charset=)([^\s"]+)@i', '$1utf-8', $data, 1);
        }
    }

    /* 3: <?xml ... encoding="..."?> declaration in the page */
    if (!isset($charset)) {
        preg_match( '@<\?xml.+encoding="([^\s"]+)@si', $data, $matches );
        if ( isset( $matches[1] ) ) {
            $charset = $matches[1];
            /* In case we want to do further processing downstream: */
            $data = preg_replace('@(<\?xml.+encoding=")([^\s"]+)@si', '$1utf-8', $data, 1);
        }
    }

    /* 4: PHP's heuristic detection */
    if (!isset($charset)) {
        $encoding = mb_detect_encoding($data);
        if ($encoding)
            $charset = $encoding;
    }

    /* 5: Default for HTML (only reached when step 4 detects nothing) */
    if (!isset($charset)) {
        /* stripos, not strstr: strstr never returns 0, so the old test could not pass */
        if (stripos($content_type, "text/html") === 0)
            $charset = "ISO-8859-1";
    }

    /* Convert it if it is anything but UTF-8 */
    /* You can change "UTF-8" to "UTF-8//IGNORE" to
       ignore conversion errors and still output something reasonable */
    if (isset($charset) && strtoupper($charset) != "UTF-8")
        $data = iconv($charset, 'UTF-8', $data);

    return $data;
}
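
To try the first and last steps of curl_exec_utf8 without making a network request, here is a minimal sketch. It assumes two hypothetical helpers that are not part of the original answer: charset_from_content_type(), which reads the charset parameter from a Content-Type header value, and to_utf8(), which wraps the iconv call and returns the input unchanged when conversion fails. Both names and the fall-back behavior are illustrative choices, not the author's method.

```php
<?php
/**
 * Hypothetical helper (not in the original answer): extract the charset
 * from a Content-Type value such as "text/html; charset=windows-1251".
 * Returns null when the header is missing or has no charset parameter.
 */
function charset_from_content_type(?string $contentType): ?string
{
    if ($contentType === null) {
        return null;
    }
    // Accept an optionally quoted value and stop at whitespace, ';' or a quote.
    if (preg_match('@;\s*charset=["\']?([^\s;"\']+)@i', $contentType, $m)) {
        return $m[1];
    }
    return null;
}

/**
 * Hypothetical helper: convert $data from $charset to UTF-8.
 * Assumption: on failure (unknown charset or invalid bytes) the original
 * bytes are returned instead of false.
 */
function to_utf8(string $data, string $charset): string
{
    if (strcasecmp($charset, 'UTF-8') === 0) {
        return $data; // already UTF-8, nothing to do
    }
    // @ hides the warning iconv emits when it cannot convert.
    $converted = @iconv($charset, 'UTF-8', $data);
    return $converted === false ? $data : $converted;
}

// "\xE9" is "é" in ISO-8859-1; after conversion it is the two bytes C3 A9.
$charset = charset_from_content_type('text/html; charset=ISO-8859-1');
echo bin2hex(to_utf8("\xE9", $charset)), "\n";
```

In the question's snippet, the full function is used by replacing curl_exec($ch) with curl_exec_utf8($ch); no separate magic() call is needed.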

The regular expressions are mostly taken from http://nadeausoftware.com/articles/2007/06/php_tip_how_get_web_page_content_type
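
The <meta> pattern in step 2 only matches the http-equiv="Content-Type" form. As a hedged sketch, a looser pattern can also pick up the short HTML5 form, <meta charset="...">. The helper name charset_from_meta() is hypothetical and the regular expression is an illustration, not a full HTML parser; it can be fooled by unusual markup.

```php
<?php
/**
 * Hypothetical helper (not in the original answer): find a charset declared
 * in a <meta> tag, in either the http-equiv form or the HTML5 short form.
 * Returns null when no such declaration is found.
 */
function charset_from_meta(string $html): ?string
{
    // Look for "charset=" anywhere inside a single <meta ...> tag.
    if (preg_match('@<meta[^>]+charset\s*=\s*["\']?([\w-]+)@i', $html, $m)) {
        return $m[1];
    }
    return null;
}

var_dump(charset_from_meta('<meta charset="koi8-r">'));
var_dump(charset_from_meta('<p>no declaration here</p>'));
```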

This question about converting curl_exec output to UTF-8 in PHP comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/2510868/
