
python - Accumulating gradients


I want to accumulate the gradients before I do a backward pass, and I'm wondering what the correct way to do this is. According to this article it is:

model.zero_grad()                                   # Reset gradients tensors
for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                     # Forward pass
    loss = loss_function(predictions, labels)       # Compute loss function
    loss = loss / accumulation_steps                # Normalize our loss (if averaged)
    loss.backward()                                 # Backward pass
    if (i + 1) % accumulation_steps == 0:           # Wait for several backward steps
        optimizer.step()                            # Now we can do an optimizer step
        model.zero_grad()                           # Reset gradients tensors

Whereas I expected it to be:
model.zero_grad()                                   # Reset gradients tensors
loss = 0
for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                     # Forward pass
    loss += loss_function(predictions, labels)      # Compute loss function
    if (i + 1) % accumulation_steps == 0:           # Wait for several backward steps
        loss = loss / accumulation_steps            # Normalize our loss (if averaged)
        loss.backward()                             # Backward pass
        optimizer.step()                            # Now we can do an optimizer step
        model.zero_grad()                           # Reset gradients tensors
        loss = 0

where I accumulate the loss and then divide by the number of accumulation steps to average it.

A second question: if I'm right, would you expect my method to be faster, given that I only do a backward pass once per accumulation step?

Best answer

So, according to the answer here, the first method is the memory-efficient one. The amount of work required is more or less the same for both methods.

The second method keeps accumulating the computation graph across iterations, so it needs roughly accumulation_steps times more memory. The first method computes the gradients straight away (and simply adds them up), so it requires less memory.
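As a rough illustration of why the first pattern stays memory-flat, here is a minimal self-contained PyTorch sketch; the model, data, and hyperparameters (nn.Linear, the random training_set, accumulation_steps = 4) are placeholders invented for this example, not taken from the original question.

import torch
import torch.nn as nn

# Toy setup -- purely illustrative model, loss, optimizer, and data.
model = nn.Linear(10, 1)
loss_function = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
accumulation_steps = 4
training_set = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(16)]

model.zero_grad()
for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                     # Forward pass
    loss = loss_function(predictions, labels)
    loss = loss / accumulation_steps                # Normalize so the sum of grads is an average
    loss.backward()                                 # Graph for this mini-batch can be freed here;
                                                    # gradients accumulate in each param's .grad
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()                            # Update with the accumulated (averaged) gradients
        model.zero_grad()                           # Clear .grad buffers for the next window

Because loss.backward() runs every iteration, each mini-batch's graph is released immediately, which is why this variant's memory use does not grow with accumulation_steps, unlike summing the losses and calling backward() once.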

Regarding python - Accumulating gradients, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53331540/
