I'm building a classifier based on the Huggingface Longformer. Below is my main code:
import torch
from torch import nn
from transformers import LongformerForSequenceClassification, LongformerTokenizerFast, Trainer, TrainingArguments

model = LongformerForSequenceClassification.from_pretrained('/mnt/longformer_official/',
                                                             gradient_checkpointing=False,
                                                             attention_window=512)
tokenizer = LongformerTokenizerFast.from_pretrained('/mnt/longformer_official/', max_length=4000)
train_df_tuning_dataset_tokenized = train_df_tuning_dataset.map(tokenization, batched = True, batch_size = len(train_df_tuning_dataset))
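The tokenization helper passed to .map() is not shown in the question; a minimal sketch of what it presumably looks like, assuming the raw text lives in a "text" column and that the sequences are padded/truncated to the tokenizer's max_length (both assumptions, not from the question):

def tokenization(batch):
    # hypothetical helper: tokenize the text column, padding/truncating to the
    # max_length being experimented with (1500 vs 4000 in the question)
    return tokenizer(batch["text"], padding="max_length", truncation=True, max_length=4000)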
training_args = TrainingArguments(
    output_dir="xyz",
    num_train_epochs=5,  # changed this from 5
    per_device_train_batch_size=4,  # 4, #8, adding on 18 march from huggingface example notebook
    gradient_accumulation_steps=16,  # 16, #8 adding it back 18 march even though missing in huggingface example notebook as otherwise memory issues
    per_device_eval_batch_size=16,  # 16
    evaluation_strategy="epoch",
    save_strategy="epoch",  # adding on 18 march from huggingface example notebook
    learning_rate=2e-5,  # adding on 18 march from huggingface example notebook
    load_best_model_at_end=True,
    greater_is_better=False,
    disable_tqdm=False,
    weight_decay=0.01,
    optim="adamw_torch",  # removing on 18 march from huggingface example notebook
    run_name='longformer-classification-16March2022'
)
# class weights
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class CustomTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.get("labels")
        # forward pass
        outputs = model(**inputs)
        logits = outputs.get("logits")
        # compute custom loss (two labels with different class weights)
        loss_fct = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 0.5243])).to(device)
        loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1)).to(device)
        return (loss, outputs) if return_outputs else loss
trainer = CustomTrainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_df_tuning_dataset_tokenized,
    eval_dataset=val_dataset_tokenized
)
The code runs fine when I try max_length=1500 in the tokenizer, but it fails with max_length=4000. I even tried setting per_device_train_batch_size = 1, gradient_accumulation_steps = 1, per_device_eval_batch_size = 1.
My questions:
Is it OK to set per_device_train_batch_size = 1, gradient_accumulation_steps = 1, per_device_eval_batch_size = 1?
The error I get is shown below. Is there any workaround other than getting more memory?
RuntimeError: CUDA out of memory. Tried to allocate 720.00 MiB (GPU 0; 14.76 GiB total capacity; 12.77 GiB already allocated; 111.75 MiB free; 13.69 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
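For reference, the hint at the end of the error message refers to PyTorch's caching-allocator configuration, which is set through an environment variable; a minimal sketch of how that setting is typically applied (the 128 MiB value is an arbitrary example, not taken from the question):

import os

# must be set before the first CUDA allocation, e.g. at the very top of the script
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # example value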
Best answer
Try setting
gradient_accumulation_steps = int(math.ceil(len(tr_inputs) / per_device_train_batch_size) / 1) * epochs
since gradient_accumulation_steps should be derived from the number of epochs and the batch size.
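As a concrete illustration of that suggestion, a minimal sketch; here tr_inputs and epochs are assumed to map to the tokenized training set and num_train_epochs from the question, and the numeric values simply mirror the question's settings:

import math

epochs = 5                                     # num_train_epochs used in the question
per_device_train_batch_size = 4                # per-device batch size used in the question
tr_inputs = train_df_tuning_dataset_tokenized  # assumed: the tokenized training set

# derive gradient_accumulation_steps from dataset size, batch size and epochs,
# as the answer suggests, and pass the result to TrainingArguments
gradient_accumulation_steps = int(math.ceil(len(tr_inputs) / per_device_train_batch_size) / 1) * epochs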
Regarding this nlp question on Huggingface Longformer memory issues, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/71668624/