
Weird, undefined behavior with PreTrainedTokenizer __init__

Reposted. Author: bug小助手. Updated: 2023-10-28 11:32:23



This took me a while to catch, and I am still not 100% sure what is going on. It seems like the types of things are switching for no reason, as if by magic. The behavior goes away as soon as I remove the inheritance, but here is the code:



from transformers import PreTrainedTokenizer

# Tokompiler is my own tokenizer class (its import is not shown here).


class TokompilerHF(PreTrainedTokenizer):
    '''
    Hugging Face compatible version of Tokompiler
    '''
    def __init__(self, vocab_file, **kwargs):
        super().__init__(**kwargs)
        self.tokompiler = Tokompiler(vocab_file)
        pad_token_id = self.tokompiler.encoder['[UNK]']
        # black magic: for some reason removing the print changes things
        print(type(pad_token_id))
        print(pad_token_id)
        self.pad_token_id = None
        self.pad_token_id = pad_token_id
        print(type(self.pad_token_id))
        print(self.pad_token_id)

    def _tokenize(self, text, **kwargs):
        return self.tokompiler.tokenize(text)

    def _convert_token_to_id(self, token):
        return self.tokompiler.encode(token)

    def _convert_id_to_token(self, index):
        return self.tokompiler.decode([index])

    def convert_tokens_to_string(self, tokens):
        return ' '.join(tokens)

    def _convert_tokens_to_ids(self, tokens):
        return [self._convert_token_to_id(token) for token in tokens]

    def _convert_ids_to_tokens(self, ids):
        return [self._convert_id_to_token(id) for id in ids]

    def get_vocab(self):
        return self.tokompiler.encoder.copy()

    @property
    def vocab_size(self):
        return len(self.tokompiler.encoder)


tokenizer = TokompilerHF('tokenizer_vocab/vocab.txt')

Now this prints:



<class 'int'>
4
<class 'list'>
[4]



I expected that if I assign a value, that value does not change one line of code later.
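One way an int can come back as a list is sketched below. This does not use transformers. It assumes that the base class exposes pad_token_id as a property, storing the token string on write and recomputing the id from it on read, and that the tokenizer's encode returns a list even for a single token. The names Base and ToyTokenizer and the tiny vocabulary are hypothetical and used only for illustration. Whether PreTrainedTokenizer and Tokompiler behave this way is an assumption, not something the code above confirms.

```python
class Base:
    """Hypothetical stand-in for a base class that stores the pad token as a string."""

    def __init__(self):
        self._pad_token = None

    @property
    def pad_token_id(self):
        # The id is recomputed from the stored token on every read.
        if self._pad_token is None:
            return None
        return self._convert_token_to_id(self._pad_token)

    @pad_token_id.setter
    def pad_token_id(self, value):
        # The assigned id is converted to a token; the id itself is not kept.
        self._pad_token = None if value is None else self._convert_id_to_token(value)


class ToyTokenizer(Base):
    def __init__(self):
        super().__init__()
        # Hypothetical vocabulary, chosen so that '[UNK]' maps to 4.
        self.encoder = {"[PAD]": 0, "[UNK]": 4}
        self.decoder = {i: t for t, i in self.encoder.items()}

    def encode(self, text):
        # Assumption: encode returns a list of ids, even for one token.
        return [self.encoder[tok] for tok in text.split()]

    def _convert_id_to_token(self, index):
        return self.decoder[index]

    def _convert_token_to_id(self, token):
        # Returns a list, because encode does.
        return self.encode(token)


tok = ToyTokenizer()
tok.pad_token_id = 4
print(type(tok.pad_token_id), tok.pad_token_id)  # <class 'list'> [4]
```

In this toy, having _convert_token_to_id return a single int (for example self.encoder[token]) makes the read return 4 again, which is one thing worth checking in the real class.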


