unicode - Strange (Unicode?) characters

Reposted. Author: 行者123. Updated: 2023-12-02 11:22:24

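A user posted some strange characters on my website. I want to stop people from doing this without blocking characters used in foreign languages, so a regular expression such as [a-z0-9!@#$%^&*()...] is not an option.

Can someone explain what is going on here and break down why it displays the way it does? How are these characters created, and how can I prevent users from posting them?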

♥̧̧̧̛̣̘̟̘̥͓̫̪̹̪̪̮̯̞̘̙̦̝̭̭͕̜̰̩̗̟̹͔̜̥̟̗̗̥̦̠̖̫͕̺̻̞̥̹͇̱̥̥̻͇̦̙̣͊͗̉̽̈́̉͑̀́̃͒̏͋̃̅̇̊̏̎̈́͊͐̉͑̄̌̉́̈́́́̅̇͌̽̽͗́̄̾̓̈́̇̅͛́̈́͐̽̔̌̋̌̾́̿͌̔͊͆̈́̉́̎̔̊͗̊̂̎̍̏̈̀̏͋͌̋̽̄̐̽͐̀͘̕̕͘̕̚̚̚͘͜͜͜͠͝͠͝͠͝

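Thanks

Edit: So they are used for accented characters? Is there a common practice or method for preventing users from exploiting them without blocking them entirely? I know very little about foreign languages or how these marks are actually used, so building something that limits the use of combining characters is beyond my abilities. :-/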

Best answer

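These are combining diacritical marks. The character é (e-acute) can be represented either by the single code point U+00E9 (LATIN_SMALL_LETTER_E_WITH_ACUTE) or by the sequence U+0065 U+0301 (LATIN_SMALL_LETTER_E COMBINING_ACUTE_ACCENT). In the second form, the text renderer places the accent over the preceding code point.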
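The two representations of é can be compared directly in a browser console or in Node.js. This is a minimal sketch, not part of the original answer; the variable names are illustrative only. It relies on the standard `String.prototype.normalize` method.

```javascript
// Two ways to encode "é": one precomposed code point,
// or a base letter followed by a combining mark.
const precomposed = "\u00e9"; // U+00E9 LATIN SMALL LETTER E WITH ACUTE
const decomposed = "e\u0301"; // U+0065 + U+0301 COMBINING ACUTE ACCENT

// Compared code unit by code unit, the strings are different...
const sameRaw = precomposed === decomposed;

// ...but they are canonically equivalent: NFC composes, NFD decomposes.
const sameNfc = precomposed === decomposed.normalize("NFC");
const sameNfd = precomposed.normalize("NFD") === decomposed;

console.log(sameRaw, sameNfc, sameNfd); // false true true
console.log(precomposed.length, decomposed.length); // 1 2
```

Both strings render as the same glyph, which is why a renderer that stacks every mark it is given can be abused with long runs of marks.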

The user is exploiting this with a long sequence of combining marks:

codepoint   glyph   escaped    UTF-8           info
=======================================================================
U+2665 ♥ \u2665 e2,99,a5, MISCELLANEOUS_SYMBOLS, OTHER_SYMBOL
U+034a ͊ \u034a cd,8a, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0360 ͠ \u0360 cd,a0, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0357 ͗ \u0357 cd,97, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0309 ̉ \u0309 cc,89, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+033d ̽ \u033d cc,bd, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0344 ̈́ \u0344 cd,84, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0309 ̉ \u0309 cc,89, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0351 ͑ \u0351 cd,91, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0340 ̀ \u0340 cd,80, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+035d ͝ \u035d cd,9d, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0301 ́ \u0301 cc,81, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0303 ̃ \u0303 cc,83, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0352 ͒ \u0352 cd,92, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030f ̏ \u030f cc,8f, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+034b ͋ \u034b cd,8b, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0303 ̃ \u0303 cc,83, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0305 ̅ \u0305 cc,85, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0307 ̇ \u0307 cc,87, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030a ̊ \u030a cc,8a, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030f ̏ \u030f cc,8f, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030e ̎ \u030e cc,8e, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0344 ̈́ \u0344 cd,84, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+034a ͊ \u034a cd,8a, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0350 ͐ \u0350 cd,90, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0309 ̉ \u0309 cc,89, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0351 ͑ \u0351 cd,91, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0304 ̄ \u0304 cc,84, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030c ̌ \u030c cc,8c, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0309 ̉ \u0309 cc,89, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0301 ́ \u0301 cc,81, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0344 ̈́ \u0344 cd,84, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0341 ́ \u0341 cd,81, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0301 ́ \u0301 cc,81, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0305 ̅ \u0305 cc,85, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0307 ̇ \u0307 cc,87, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+034c ͌ \u034c cd,8c, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+033d ̽ \u033d cc,bd, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+033d ̽ \u033d cc,bd, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0357 ͗ \u0357 cd,97, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0301 ́ \u0301 cc,81, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0360 ͠ \u0360 cd,a0, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0304 ̄ \u0304 cc,84, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+033e ̾ \u033e cc,be, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0343 ̓ \u0343 cd,83, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0344 ̈́ \u0344 cd,84, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0307 ̇ \u0307 cc,87, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0358 ͘ \u0358 cd,98, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0305 ̅ \u0305 cc,85, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+035d ͝ \u035d cd,9d, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+035b ͛ \u035b cd,9b, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0301 ́ \u0301 cc,81, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0344 ̈́ \u0344 cd,84, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0350 ͐ \u0350 cd,90, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+033d ̽ \u033d cc,bd, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0314 ̔ \u0314 cc,94, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030c ̌ \u030c cc,8c, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030b ̋ \u030b cc,8b, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030c ̌ \u030c cc,8c, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+033e ̾ \u033e cc,be, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0360 ͠ \u0360 cd,a0, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0301 ́ \u0301 cc,81, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+033f ̿ \u033f cc,bf, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+034c ͌ \u034c cd,8c, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0314 ̔ \u0314 cc,94, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0315 ̕ \u0315 cc,95, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+034a ͊ \u034a cd,8a, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0346 ͆ \u0346 cd,86, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0344 ̈́ \u0344 cd,84, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0309 ̉ \u0309 cc,89, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+035d ͝ \u035d cd,9d, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0341 ́ \u0341 cd,81, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0315 ̕ \u0315 cc,95, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030e ̎ \u030e cc,8e, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0314 ̔ \u0314 cc,94, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030a ̊ \u030a cc,8a, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0357 ͗ \u0357 cd,97, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0358 ͘ \u0358 cd,98, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030a ̊ \u030a cc,8a, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0315 ̕ \u0315 cc,95, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0302 ̂ \u0302 cc,82, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030e ̎ \u030e cc,8e, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030d ̍ \u030d cc,8d, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030f ̏ \u030f cc,8f, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0308 ̈ \u0308 cc,88, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0340 ̀ \u0340 cd,80, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030f ̏ \u030f cc,8f, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+031a ̚ \u031a cc,9a, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+034b ͋ \u034b cd,8b, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+031a ̚ \u031a cc,9a, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+031a ̚ \u031a cc,9a, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+034c ͌ \u034c cd,8c, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030b ̋ \u030b cc,8b, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+033d ̽ \u033d cc,bd, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0304 ̄ \u0304 cc,84, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0310 ̐ \u0310 cc,90, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+033d ̽ \u033d cc,bd, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0350 ͐ \u0350 cd,90, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+031b ̛ \u031b cc,9b, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0358 ͘ \u0358 cd,98, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0300 ̀ \u0300 cc,80, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0323 ̣ \u0323 cc,a3, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0318 ̘ \u0318 cc,98, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+031f ̟ \u031f cc,9f, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+035c ͜ \u035c cd,9c, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0318 ̘ \u0318 cc,98, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+035c ͜ \u035c cd,9c, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0325 ̥ \u0325 cc,a5, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0353 ͓ \u0353 cd,93, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+032b ̫ \u032b cc,ab, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+032a ̪ \u032a cc,aa, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0339 ̹ \u0339 cc,b9, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+032a ̪ \u032a cc,aa, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+032a ̪ \u032a cc,aa, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+035c ͜ \u035c cd,9c, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+032e ̮ \u032e cc,ae, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+032f ̯ \u032f cc,af, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0327 ̧ \u0327 cc,a7, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+031e ̞ \u031e cc,9e, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0318 ̘ \u0318 cc,98, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0319 ̙ \u0319 cc,99, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0326 ̦ \u0326 cc,a6, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+031d ̝ \u031d cc,9d, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+032d ̭ \u032d cc,ad, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+032d ̭ \u032d cc,ad, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0355 ͕ \u0355 cd,95, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+031c ̜ \u031c cc,9c, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0330 ̰ \u0330 cc,b0, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0329 ̩ \u0329 cc,a9, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0317 ̗ \u0317 cc,97, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+031f ̟ \u031f cc,9f, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0339 ̹ \u0339 cc,b9, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0354 ͔ \u0354 cd,94, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+031c ̜ \u031c cc,9c, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0325 ̥ \u0325 cc,a5, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+031f ̟ \u031f cc,9f, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0317 ̗ \u0317 cc,97, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0317 ̗ \u0317 cc,97, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0325 ̥ \u0325 cc,a5, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0326 ̦ \u0326 cc,a6, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0320 ̠ \u0320 cc,a0, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0316 ̖ \u0316 cc,96, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+032b ̫ \u032b cc,ab, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0355 ͕ \u0355 cd,95, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+033a ̺ \u033a cc,ba, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0327 ̧ \u0327 cc,a7, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+033b ̻ \u033b cc,bb, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+031e ̞ \u031e cc,9e, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0325 ̥ \u0325 cc,a5, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0327 ̧ \u0327 cc,a7, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0339 ̹ \u0339 cc,b9, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0347 ͇ \u0347 cd,87, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0331 ̱ \u0331 cc,b1, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0325 ̥ \u0325 cc,a5, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0325 ̥ \u0325 cc,a5, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+033b ̻ \u033b cc,bb, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0347 ͇ \u0347 cd,87, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0326 ̦ \u0326 cc,a6, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0319 ̙ \u0319 cc,99, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0323 ̣ \u0323 cc,a3, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
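A breakdown like the table above can be produced with a few lines of JavaScript. The following is only a sketch of one way to do it, not the tool the answer's author used. The function name `describeCodePoints` and the shape of the returned rows are assumptions made for illustration. It uses the standard `TextEncoder` API for the UTF-8 bytes and the Unicode property escape `\p{Mn}` to test for the non-spacing mark category.

```javascript
// Describe each code point of a string: code point, UTF-16 escape,
// UTF-8 bytes, and whether it is a non-spacing mark (category Mn).
function describeCodePoints(text) {
  const encoder = new TextEncoder(); // always encodes to UTF-8
  const rows = [];
  // Iterating a string with for...of walks code points, not UTF-16 code units.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    // One \uXXXX escape per UTF-16 code unit; code points outside the BMP
    // occupy two code units (a surrogate pair) and so yield two escapes.
    const escaped = Array.from({ length: ch.length }, (_, i) =>
      "\\u" + ch.charCodeAt(i).toString(16).padStart(4, "0")
    ).join("");
    const utf8 = Array.from(encoder.encode(ch), (b) =>
      b.toString(16).padStart(2, "0")
    ).join(",");
    rows.push({
      codepoint: "U+" + cp.toString(16).padStart(4, "0"),
      escaped,
      utf8,
      nonSpacingMark: /^\p{Mn}$/u.test(ch),
    });
  }
  return rows;
}

// Example: the first two code points from the table above.
console.table(describeCodePoints("\u2665\u034a"));
```

For the heart symbol this yields `U+2665`, `\u2665`, and `e2,99,a5`, matching the first row of the table.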

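Some of the points I made in the comments:
  • The Unicode standard considers all sequences of code points valid, even if they are not meaningful (see chapter 2 of Unicode 6)
  • Unicode does not describe how code points should be displayed; that depends on the text rendering technology
  • Normalizing to NFC and matching the code point category may be useful for detecting superfluous diacritics
  • You can create sequences like the one above in your browser console
  • Just type a JavaScript string literal with UTF-16 escapes, such as "\u2665\u034a\u0360\u0357"
  • For anything in the Basic Multilingual Plane (BMP), you can use the code point values from the charts directly
  • For anything outside the BMP, you have to translate the code points to UTF-16 (surrogate pairs)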
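The suggestion to normalize to NFC and then match on the code point category could be sketched as follows. This is a hypothetical approach rather than the answer author's code. The function names and the threshold `MAX_MARKS = 3` are assumptions chosen for illustration. Some scripts legitimately stack more than one mark on a base character, so any real limit would need to be tuned against the languages the site expects.

```javascript
// Hypothetical limit: the longest run of combining marks tolerated after NFC.
const MAX_MARKS = 3;

// Length (in code points) of the longest run of combining marks.
// NFC first folds base+mark pairs such as "e\u0301" into one code point,
// so ordinary accented text contributes few or no leftover marks.
function longestMarkRun(text) {
  const runs = text.normalize("NFC").match(/\p{M}+/gu) || [];
  return runs.reduce((max, run) => Math.max(max, Array.from(run).length), 0);
}

// True when the text contains a run of marks longer than the limit.
function hasExcessiveMarks(text, maxMarks = MAX_MARKS) {
  return longestMarkRun(text) > maxMarks;
}

// Alternative to rejecting: keep at most maxMarks marks per run.
function trimExcessiveMarks(text, maxMarks = MAX_MARKS) {
  return text
    .normalize("NFC")
    .replace(/\p{M}+/gu, (run) => Array.from(run).slice(0, maxMarks).join(""));
}

console.log(hasExcessiveMarks("e\u0301")); // false
console.log(hasExcessiveMarks("\u2665\u034a\u0360\u0357\u0309\u033d")); // true
```

Whether to reject such input outright or silently trim it is a design choice. Trimming keeps the post readable but changes what the user typed, so rejecting with an explanation may be less surprising.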

Source: a similar question, "unicode - Strange (Unicode?) characters", on Stack Overflow: https://stackoverflow.com/questions/22233001/
