gpt4 book ai didi

mysql用UTF等价物替换html特殊字符

转载 作者:行者123 更新时间:2023-11-29 03:50:08 25 4
gpt4 key购买 nike

我有一个数据库,其中一些元素由 HTML 特殊字符组成:

| Universidad Tecnológica Nacional - UTN                                                  |
| Instituto Tecnológico de Buenos Aires |
| Instituto Superior del Profesorado "Dr. Joaquín V. González" |
| Escuela Nacional de Náutica "Manuel Belgrano" |
| Conservatorio Nacional de Música "Carlos López Buchardo" |
| Instituto Argentino de Computacion - IAC |
| Conservatorio de Superior de Música "Manuel de Falla" |

我需要将其转换为正确的 UTF 格式。 除了遍历数据库并将每个代码映射到等效符号之外,我还能做得更好吗?

á -> 'á'
" -> '"'
...

最佳答案

my comment above 中所述,非常不清楚您在自己的情况下要做什么。

Can I do better than just iterating through the database, and having a mapping from each code to the equivalent symbol?

嗯,是的。您可以用替换字符替换字符代码实体(例如 {ƫ),而无需在“映射”中查找字符代码。但是命名实体(例如 ")总是需要被查找。

这是我解决一般情况的尝试:

  1. 创建一个表来存储在 HTML 中定义的命名字符实体:

    CREATE TABLE ents (
    ref VARCHAR(8) NOT NULL COLLATE utf8_bin,
    rep CHAR(1) NOT NULL,
    PRIMARY KEY (ref)
    );
  2. 填充此表 - 我建议使用脚本,例如来自 PHP 的脚本:

    $dbh = new PDO("mysql:dbname=$dbname", $username, $password);
    $dbh->setAttribute(PDO::ATTR_EMULATE_PREPARES, FALSE);
    $ins = $dbh->prepare('INSERT INTO ents (ref, rep) VALUES (?, ?)');
    $t = get_html_translation_table(HTML_ENTITIES);
    foreach ($t as $k => $v) $ins->execute([substr($v, 1, -1), $k]);
  3. 定义一个 SQL 函数来执行实体替换(在适用的情况下使用此表,否则使用字符代码):

    DELIMITER ;;

    CREATE FUNCTION dhe(s TEXT) RETURNS TEXT
    BEGIN
    DECLARE n, p, i, t INT DEFAULT 0;
    DECLARE r VARCHAR(12);
    entity_search: LOOP
    SET n := LOCATE('&', s, n+1);
    IF (!n) THEN
    LEAVE entity_search;
    END IF;

    IF (SUBSTRING(s, n+1, 1) = '#') THEN
    CASE
    WHEN SUBSTRING(s, n+2, 1) RLIKE '[[:digit:]]' THEN
    SET t := 2, p := n+2, r := '[[:digit:]]';
    WHEN SUBSTRING(s, n+2, 1) = 'x' THEN
    SET t := 3, p := n+3, r := '[[:xdigit:]]';
    ELSE ITERATE entity_search;
    END CASE;
    ELSE
    SET t := 1, p := n+1, r := '[[:alnum:]_]';
    END IF;

    SET i := 0;
    reference: LOOP
    IF SUBSTRING(s, p+i, 1) NOT RLIKE r THEN
    IF SUBSTRING(s, p+i, 1) RLIKE '[[:alnum:]_]' THEN
    ITERATE entity_search;
    END IF;
    LEAVE reference;
    END IF;
    IF i = 8 THEN ITERATE entity_search; END IF;
    SET i := i + 1;
    END LOOP reference;

    SET s := CONCAT(
    LEFT(s, n-1),
    CASE t
    WHEN 1 THEN COALESCE(
    (SELECT rep FROM ents WHERE ref = SUBSTRING(s, p, i))
    , SUBSTRING(s, n, i + IF(SUBSTRING(s, p+i, 1)=';',1,0))
    )
    WHEN 2 THEN CHAR(SUBSTRING(s, p, i))
    WHEN 3 THEN CHAR(CONV(SUBSTRING(s, p, i), 16, 10))
    END,
    SUBSTRING(s, p + i + IF(SUBSTRING(s, p+i, 1)=';',1,0))
    );
    END LOOP entity_search;
    RETURN s;
    END;;

    DELIMITER ;
  4. 两次应用此函数来解码您的(显然)双重编码的表格:

    UPDATE my_table SET my_column = dhe(dhe(my_column));

关于mysql用UTF等价物替换html特殊字符,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/13890363/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com