
machine-learning - How to calculate the optimal batch size

Reposted. Author: 行者123. Updated: 2023-11-30 08:22:00

Sometimes I run into the following error:

OOM when allocating tensor with shape

For example:

OOM when allocating tensor with shape (1024, 100, 160)

where 1024 is my batch size; I don't know what the remaining dimensions are. If I reduce the batch size or the number of neurons in the model, it runs fine.
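
For reference, a single float32 tensor of that shape alone is about 62.5 MiB, assuming 4 bytes per element (float32 is the TensorFlow default); interpreting the trailing dimensions as a layer's output is only my guess:

    # Rough size of one float32 tensor of shape (1024, 100, 160).
    # The trailing dimensions (100, 160) presumably come from a layer's
    # output (e.g. sequence length x number of units) - an assumption here.
    elements = 1024 * 100 * 160        # 16,384,000 elements
    size_bytes = elements * 4          # 4 bytes per float32 element
    print(size_bytes / 2**20, "MiB")   # 62.5 MiB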

Is there a generic way to compute the optimal batch size from the model and the available GPU memory, so that the program does not crash?

In short: I want the largest batch size that still fits in my GPU memory and does not crash the program.
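
Conceptually, what I have in mind is something like dividing the free GPU memory by a per-sample memory estimate; a rough sketch (all names and numbers below are hypothetical):

    def max_batch_size(free_gpu_bytes, per_sample_bytes, fixed_bytes,
                       safety_factor=0.8):
        # free_gpu_bytes:   memory available on the GPU
        # per_sample_bytes: activations + gradients needed for ONE sample
        # fixed_bytes:      weights, optimizer state, framework overhead
        # safety_factor:    headroom for fragmentation and temporary buffers
        usable = free_gpu_bytes * safety_factor - fixed_bytes
        return max(1, int(usable // per_sample_bytes))

    # e.g. 8 GiB free, ~2 MiB of activations per sample, ~500 MiB fixed
    print(max_batch_size(8 * 2**30, 2 * 2**20, 500 * 2**20))

But I don't know how to obtain the per-sample number reliably, hence the question.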

Best Answer

From the recent Deep Learning book by Goodfellow et al., chapter 8:

Minibatch sizes are generally driven by the following factors:

  • Larger batches provide a more accurate estimate of the gradient, but with less than linear returns.
  • Multicore architectures are usually underutilized by extremely small batches. This motivates using some absolute minimum batch size, below which there is no reduction in the time to process a minibatch.
  • If all examples in the batch are to be processed in parallel (as is typically the case), then the amount of memory scales with the batch size. For many hardware setups this is the limiting factor in batch size.
  • Some kinds of hardware achieve better runtime with specific sizes of arrays. Especially when using GPUs, it is common for power of 2 batch sizes to offer better runtime. Typical power of 2 batch sizes range from 32 to 256, with 16 sometimes being attempted for large models.
  • Small batches can offer a regularizing effect (Wilson and Martinez, 2003), perhaps due to the noise they add to the learning process. Generalization error is often best for a batch size of 1. Training with such a small batch size might require a small learning rate to maintain stability because of the high variance in the estimate of the gradient. The total runtime can be very high as a result of the need to make more steps, both because of the reduced learning rate and because it takes more steps to observe the entire training set.

In practice, this generally means "powers of 2, and the larger the better, provided that the batch fits in your (GPU) memory".
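
If you want to automate that rule of thumb, here is a minimal sketch assuming a Keras/TensorFlow setup (model, x_train and y_train stand in for your own objects, and x_train is assumed to have at least `start` samples): try powers of 2 from large to small and keep the first one that survives a training step.

    import tensorflow as tf

    def largest_fitting_batch_size(model, x_train, y_train, start=1024):
        # Try batch sizes 1024, 512, 256, ... and return the first one that
        # completes a single gradient update without running out of memory.
        batch_size = start
        while batch_size >= 1:
            try:
                # One update is enough to trigger the large allocations
                # (activations and gradients) for this batch size.
                model.train_on_batch(x_train[:batch_size], y_train[:batch_size])
                return batch_size
            except tf.errors.ResourceExhaustedError:
                batch_size //= 2   # OOM: halve and retry
        raise RuntimeError("Even batch_size=1 does not fit in GPU memory")

In practice you may want to run each trial in a fresh process, since GPU memory can stay fragmented after an OOM.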

You may also want to check out some of the good relevant posts on Stack Exchange.

Keep in mind that the paper by Keskar et al., 'On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima', cited by several of the posts mentioned above, has received some objections from other respected researchers in the deep learning community.

Hope this helps...

UPDATE (December 2017):

There is a new paper by Yoshua Bengio and his team, Three Factors Influencing Minima in SGD (November 2017); it reports new theoretical and experimental results on the interplay between learning rate and batch size, and it is worth reading.

UPDATE (March 2021):

Another paper of interest here, from 2018, is Revisiting Small Batch Training for Deep Neural Networks (h/t to Nicolas Gervais), which runs counter to the bigger-is-better advice; quoting from the abstract:

The best performance has been consistently obtained for mini-batch sizes between m=2 and m=32, which contrasts with recent work advocating the use of mini-batch sizes in the thousands.

Regarding machine-learning - how to calculate the optimal batch size, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46654424/
