deep-learning - Token indices sequence length is longer than the specified maximum sequence length for this model (651 > 512) with Hugging Face sentiment classifier

Reposted. Author: 行者123. Updated: 2023-12-04 13:09:35

I am trying to get the sentiment of reviews using a pretrained Hugging Face sentiment-analysis model. It returns an error like: Token indices sequence length is longer than the specified maximum sequence length for this model (651 > 512).
My code is attached below:

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import transformers
import pandas as pd

model = AutoModelForSequenceClassification.from_pretrained('/content/drive/MyDrive/Huggingface-Sentiment-Pipeline')
token = AutoTokenizer.from_pretrained('/content/drive/MyDrive/Huggingface-Sentiment-Pipeline')

classifier = pipeline(task='sentiment-analysis', model=model, tokenizer=token)

data = pd.read_csv('/content/drive/MyDrive/DisneylandReviews.csv', encoding='latin-1')

data.head()
The output is:
    Review
0 If you've ever been to Disneyland anywhere you...
1 Its been a while since d last time we visit HK...
2 Thanks God it wasn t too hot or too humid wh...
3 HK Disneyland is a great compact park. Unfortu...
4 the location is not in the city, took around 1...
Next:
classifier("My name is mark")
The output is:
[{'label': 'POSITIVE', 'score': 0.9953688383102417}]
Followed by this code:
value = classifier("My name is mark")  # assumed: `value` holds the classifier output shown above
basic_sentiment = [i['label'] for i in value if 'label' in i]
basic_sentiment
The output is:
['POSITIVE']
Appending all the rows to an empty list:
text = []

for index, row in data.iterrows():
    text.append(row['Review'])
I am trying to get the sentiment for all rows:
sent = []

for i in range(len(data)):
    sentiment = classifier(data.iloc[i,0])
    sent.append(sentiment)
The error is:
Token indices sequence length is longer than the specified maximum sequence length for this model (651 > 512). Running this sequence through the model will result in indexing errors
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-19-4bb136563e7c> in <module>()
2
3 for i in range(len(data)):
----> 4 sentiment = classifier(data.iloc[i,0])
5 sent.append(sentiment)

11 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
1914 # remove once script supports set_grad_enabled
1915 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1916 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
1917
1918

IndexError: index out of range in self

Best answer

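Some of the sentences in the Review column of your data frame are too long. When these sentences are converted to tokens and passed into the model, they exceed the model's sequence-length limit of 512: the embeddings of the model used for the sentiment-analysis task were trained on 512 token positions.
To fix this, you can filter out the long sentences and keep only the shorter ones (token length < 512),
or you can truncate the sentences with truncation=True: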

sentiment = classifier(data.iloc[i,0], truncation=True)
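
Both options can be sketched as below. This is a minimal illustration, not part of the original answer: the helper names split_by_token_length and classify_truncated are hypothetical, and the tokenizer is only assumed to expose encode(text) returning a list of token ids, as Hugging Face tokenizers do. The default limit of 512 is taken from the error message.

```python
def split_by_token_length(texts, tokenizer, max_length=512):
    """Option 1: separate texts that fit the model from those that do not.

    `tokenizer` is assumed to provide encode(text) -> list of token ids.
    """
    fits, too_long = [], []
    for text in texts:
        # Count the tokens the model would actually receive.
        n_tokens = len(tokenizer.encode(text))
        # A sequence of exactly max_length tokens still fits.
        if n_tokens <= max_length:
            fits.append(text)
        else:
            too_long.append(text)
    return fits, too_long


def classify_truncated(classifier, texts, max_length=512):
    """Option 2: classify every text, cutting off tokens past max_length."""
    # Passing a list lets the pipeline process all rows in one call;
    # truncation and max_length are forwarded to the tokenizer.
    return classifier(list(texts), truncation=True, max_length=max_length)
```

With the objects from the question, these would be called as fits, too_long = split_by_token_length(data['Review'], token) and sent = classify_truncated(classifier, data['Review']). Filtering keeps the full text but drops the long rows, while truncation keeps every row but ignores the end of long reviews.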

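Regarding "deep-learning - Token indices sequence length is longer than the specified maximum sequence length for this model (651 > 512) with Hugging Face sentiment classifier", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/66954682/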