bert-language-model - Transformer/BERT token prediction vocabulary (filtering special tokens out of the set of possible tokens)

For Transformer models, especially BERT, does it make sense (and is it statistically sound) to programmatically forbid the model from producing special tokens as predictions, e.g. by masking their logits (see the sketch after the list below)? How was this handled in the original implementation? During training the model has to learn not to predict these tokens anyway, but does such an intervention help (or hurt)?

  • I am mainly thinking of the [MASK] and [CLS] tokens
  • The [PAD] token also matters to some extent (though not in all cases)
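As a minimal sketch of the intervention the question describes (not from the original post), one can set the logits of all special tokens to negative infinity before taking the argmax or softmax, so they can never be predicted at a masked position. This assumes the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint:

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")
# Positions of the [MASK] token(s) in the input.
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

# Forbid [CLS], [SEP], [MASK], [PAD], [UNK] by pushing their logits to -inf.
special_ids = tokenizer.all_special_ids
logits[..., special_ids] = float("-inf")

predicted_ids = logits[mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))
```

Whether this filtering is statistically "correct" is the open part of the question; the sketch only shows how the restriction is typically applied at inference time.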

Best answer

If I understand your question, you are asking how BERT (or other Transformer-based models) handle special tokens. This has less to do with the model architecture than with the preprocessing step (i.e., this answer applies equally to autoregressive models and even to non-neural models).

In particular, the BERT tokenizer splits text into subwords using a WordPiece tokenizer (closely related to byte-pair encoding). If the tokenizer does not recognize a character sequence, it replaces it with the [UNK] meta-token, much like the [MASK] and [CLS] tokens. If you want more detail, Google will turn up many answers, for example from a blog article:

There is an important point to note when we use a pre-trained model. Since the model is pre-trained on a certain corpus, the vocabulary was also fixed. In other words, when we apply a pre-trained model to some other data, it is possible that some tokens in the new data might not appear in the fixed vocabulary of the pre-trained model. This is commonly known as the out-of-vocabulary (OOV) problem.

For tokens not appearing in the original vocabulary, it is designed that they should be replaced with a special token [UNK], which stands for unknown token.
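As a small illustration of the behaviour the quote describes (assuming the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint; the exact output depends on that checkpoint's vocabulary), a rare word is merely split into subword pieces, while a symbol the vocabulary cannot cover at all becomes [UNK]:

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# A word outside the vocabulary is split into known subword pieces.
print(tokenizer.tokenize("embeddings"))  # e.g. ['em', '##bed', '##ding', '##s']

# A symbol no subword covers is replaced by the unknown meta-token.
print(tokenizer.tokenize("🙂"))          # e.g. ['[UNK]']
```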

Regarding "bert-language-model - Transformer/BERT token prediction vocabulary (filtering special tokens out of the set of possible tokens)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/66190946/
