A Comparison of Softmax and LogSigmoid in PyTorch

Reposted by: qq735679552. Last updated: 2022-09-27 22:32:09

Comparing Softmax and LogSigmoid in PyTorch

torch.nn.Softmax

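Purpose:

1. Applies the softmax function to an n-dimensional input tensor, rescaling it so that the elements of the n-dimensional output tensor lie in the range [0, 1] and sum to 1 along the chosen dimension.

2. The returned tensor has the same shape as the input tensor, with values in [0, 1].

3. It is not recommended to combine it with NLLLoss, which expects log-probabilities; use LogSoftmax instead.

4. The softmax formula: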

$$\mathrm{Softmax}(x_i) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$
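The advice about NLLLoss in point 3 can be checked numerically. The following is a minimal sketch that is not part of the original post, and the target labels are made up for illustration. It suggests that LogSoftmax followed by NLLLoss matches CrossEntropyLoss on the raw scores, while feeding plain Softmax output into NLLLoss yields a different (here negative) value.

```python
import torch
import torch.nn as nn

# Scores for 2 samples and 3 classes; the target labels are made up.
logits = torch.tensor([[0.5283, 0.3922, -0.0484],
                       [-1.6257, -0.4775, 0.5645]])
targets = torch.tensor([0, 2])

nll = nn.NLLLoss()

# Recommended pairing: LogSoftmax followed by NLLLoss.
loss_log_softmax = nll(nn.LogSoftmax(dim=1)(logits), targets)

# CrossEntropyLoss performs the same two steps on the raw scores.
loss_cross_entropy = nn.CrossEntropyLoss()(logits, targets)

# Discouraged pairing: NLLLoss receives probabilities, not log-probabilities.
loss_softmax = nll(nn.Softmax(dim=1)(logits), targets)

print(loss_log_softmax.item(), loss_cross_entropy.item(), loss_softmax.item())
```
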

Parameters:

dim: the dimension along which softmax is computed.

Example:

 
    import torch
    import torch.nn as nn

    # Randomly initialize a 2x3 tensor
    a = torch.randn(2, 3)
    print(a)  # print the input tensor

    # Create a Softmax module that operates along the second dimension (dim=1)
    m = nn.Softmax(dim=1)

    # Apply softmax to a
    output = m(a)
    print(output)  # print the result

Sample output:

    tensor([[ 0.5283,  0.3922, -0.0484],
            [-1.6257, -0.4775,  0.5645]])
    tensor([[0.4108, 0.3585, 0.2307],
            [0.0764, 0.2408, 0.6828]])

可以看见的是，无论输入的tensor中的值为正或为负，输出的tensor中的值均为正值，且加和为1.

当m的参数dim=1时，输出的tensor将原tensor按照行进行softmax操作；当m的参数为dim=0时，输出的tensor将原tensor按照列进行softmax操作.
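The effect of dim can be confirmed by summing the output along each dimension. This is a small sketch added for illustration, reusing the tensor values from the example above; the names by_row and by_col are made up here.

```python
import torch
import torch.nn as nn

a = torch.tensor([[0.5283, 0.3922, -0.0484],
                  [-1.6257, -0.4775, 0.5645]])

by_row = nn.Softmax(dim=1)(a)  # normalizes across the 3 entries of each row
by_col = nn.Softmax(dim=0)(a)  # normalizes across the 2 entries of each column

print(by_row.sum(dim=1))  # one sum per row, each equal to 1
print(by_col.sum(dim=0))  # one sum per column, each equal to 1
```
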

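Deep learning context:

Softmax is typically used for classification. For example, in a deep network such as VGG, an image passes through a series of convolution and pooling operations to produce a feature vector. To decide which class the object in the image belongs to, that feature vector is mapped to a vector with one score per class, and a softmax layer then converts those scores into probabilities.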
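The pipeline described above might look roughly like the sketch below. It is not from the original post: the feature vectors are random stand-ins for what a convolutional backbone would produce, and num_classes, feature_dim, and the single nn.Linear layer are hypothetical choices made only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

num_classes = 5   # hypothetical number of classes
feature_dim = 16  # hypothetical feature-vector length

# Stand-in for the feature vectors a CNN backbone would produce (batch of 4).
features = torch.randn(4, feature_dim)

# A linear layer maps each feature vector to one score per class.
classifier = nn.Linear(feature_dim, num_classes)
scores = classifier(features)

# Softmax turns the scores into probabilities; argmax picks the class.
probs = nn.Softmax(dim=1)(scores)
predicted = probs.argmax(dim=1)

print(probs.shape, predicted)
```

Because softmax preserves the ordering of the scores, taking the argmax of the probabilities selects the same class as taking the argmax of the raw scores.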

torch.nn.LogSigmoid

Formula:

$$\mathrm{LogSigmoid}(x) = \log\left(\frac{1}{1 + \exp(-x)}\right)$$

Plot of the function:

[Figure: plot of the LogSigmoid function]

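As the plot shows, the function values lie in (-∞, 0), and the larger the input, the closer the value is to 0. Its derivative is 1 - sigmoid(x), which approaches 1 rather than 0 for strongly negative inputs, so it mitigates the vanishing gradient problem to some extent.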
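The range and gradient behavior of LogSigmoid can be inspected with autograd. The sketch below is an added illustration with arbitrarily chosen input values; it compares the autograd gradient against the closed form sigmoid(-x), which equals 1 - sigmoid(x).

```python
import torch
import torch.nn.functional as F

# Arbitrary sample inputs, from strongly negative to strongly positive.
x = torch.tensor([-10.0, -1.0, 0.0, 1.0, 10.0], requires_grad=True)

y = F.logsigmoid(x)
y.sum().backward()  # x.grad now holds d/dx logsigmoid(x) for each element

# The values are all negative and move toward 0 as x grows.
print(y.detach())

# Analytically, d/dx log(sigmoid(x)) = 1 - sigmoid(x) = sigmoid(-x).
print(x.grad)
print(torch.sigmoid(-x.detach()))
```
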

Example:

 
    import torch
    import torch.nn as nn

    a = [[0.5283, 0.3922, -0.0484],
         [-1.6257, -0.4775, 0.5645]]
    a = torch.tensor(a)

    # Create a LogSigmoid module and apply it element-wise to a
    lg = nn.LogSigmoid()
    lgoutput = lg(a)
    print(lgoutput)

Output:

    tensor([[-0.4635, -0.5162, -0.7176],
            [-1.8053, -0.9601, -0.4502]])

Comparing the two:

 
    import torch
    import torch.nn as nn

    # Define a as a 2x3 tensor
    a = [[0.5283, 0.3922, -0.0484],
         [-1.6257, -0.4775, 0.5645]]
    a = torch.tensor(a)
    print(a)
    print('a.mean:', a.mean(1, True))  # row means of a

    m = nn.Softmax(dim=1)  # softmax; dim=1 computes along each row
    lg = nn.LogSigmoid()   # logsigmoid

    output = m(a)
    print(output)
    # row means of the softmax result
    print('output.mean:', output.mean(1, True))

    lg_output = lg(a)
    print(lg_output)
    # row means of the logsigmoid result
    print('lg_output.mean:', lg_output.mean(1, True))

Result:

    tensor([[ 0.5283,  0.3922, -0.0484],
            [-1.6257, -0.4775,  0.5645]])
    a.mean: tensor([[ 0.2907],
            [-0.5129]])

    tensor([[0.4108, 0.3585, 0.2307],
            [0.0764, 0.2408, 0.6828]])
    output.mean: tensor([[0.3333],
            [0.3333]])

    tensor([[-0.4635, -0.5162, -0.7176],
            [-1.8053, -0.9601, -0.4502]])
    lg_output.mean: tensor([[-0.5658],
            [-1.0719]])

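Returning to the classification setting: if the same data is passed through softmax and through logsigmoid, and the class with the largest output value is taken as the prediction, then:

1. For the first row, softmax selects the first class, and logsigmoid also selects the first class.

2. For the second row, softmax selects the third class, and logsigmoid also selects the third class.

3. In general the two do not differ much in this respect. Because the sigmoid function suffers from vanishing gradients, it is used in fewer scenarios.

4. For multi-class problems, however, it can be worth trying sigmoid as the classification function, because softmax is more prone to producing scores that are very close to one another across classes. The decision threshold can be chosen to suit the actual situation.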
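The observations in the list above can be reproduced with a short sketch. It is an added illustration, not the original author's code, and the threshold value 0.5 is an arbitrary assumption.

```python
import torch
import torch.nn as nn

a = torch.tensor([[0.5283, 0.3922, -0.0484],
                  [-1.6257, -0.4775, 0.5645]])

softmax_out = nn.Softmax(dim=1)(a)
logsigmoid_out = nn.LogSigmoid()(a)

# Both functions preserve the ordering of the scores within a row,
# so the largest score wins whichever one is applied.
softmax_pred = softmax_out.argmax(dim=1)
logsigmoid_pred = logsigmoid_out.argmax(dim=1)
print(softmax_pred)     # indices 0 and 2, i.e. the first and third class
print(logsigmoid_pred)  # the same indices

# Sigmoid scores each class independently, so several classes can pass
# a threshold at once. The value 0.5 is an arbitrary choice.
threshold = 0.5
above_threshold = torch.sigmoid(a) > threshold
print(above_threshold)
```

In the first row two classes exceed the threshold, which is the kind of outcome softmax followed by argmax cannot express.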

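nn.Softmax() and nn.LogSoftmax()

The values computed by nn.Softmax() sum to 1, so its output is a probability distribution. The formula is:

$$\mathrm{Softmax}(x_i) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$

This guarantees that every output value is greater than 0 and lies in the range (0, 1).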

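The formula for nn.LogSoftmax() is:

$$\mathrm{LogSoftmax}(x_i) = \log\left(\frac{\exp(x_i)}{\sum_j \exp(x_j)}\right) = x_i - \log\sum_j \exp(x_j)$$

Because every softmax output lies between 0 and 1, every LogSoftmax output is less than 0.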
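The relationship between the two functions can be checked directly. The sketch below is an added illustration; the second input, with widely separated scores, is a made-up case chosen to show why computing log-softmax in one step tends to be numerically safer than taking the log of softmax.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([2.0, 3.0])

# For moderate inputs, all three ways of computing log-softmax agree.
via_function = F.log_softmax(x, dim=0)
via_log = torch.log(F.softmax(x, dim=0))
via_logsumexp = x - torch.logsumexp(x, dim=0)
print(via_function, via_log, via_logsumexp)

# With widely separated scores, softmax underflows to 0 for the small entry,
# so its log is -inf, while log_softmax stays finite.
big = torch.tensor([0.0, 1000.0])
naive = torch.log(F.softmax(big, dim=0))
stable = F.log_softmax(big, dim=0)
print(naive)
print(stable)
```
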

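Derivative of softmax:

$$\frac{\partial\,\mathrm{Softmax}(x_i)}{\partial x_j} = \mathrm{Softmax}(x_i)\,\bigl(\delta_{ij} - \mathrm{Softmax}(x_j)\bigr)$$

where $\delta_{ij}$ is 1 if $i = j$ and 0 otherwise.

Derivative of LogSoftmax:

$$\frac{\partial\,\mathrm{LogSoftmax}(x_i)}{\partial x_j} = \delta_{ij} - \mathrm{Softmax}(x_j)$$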
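The standard closed forms for these derivatives are s_i (δ_ij - s_j) for softmax and δ_ij - s_j for LogSoftmax, where s denotes the softmax output. The sketch below, added for illustration, compares them with Jacobians computed by torch.autograd.functional.jacobian; the input values are arbitrary.

```python
import torch
import torch.nn.functional as F
from torch.autograd.functional import jacobian

x = torch.tensor([2.0, 3.0])
s = F.softmax(x, dim=0)
eye = torch.eye(x.numel())  # plays the role of delta_ij

# Jacobians computed by autograd; entry [i, j] is d output_i / d x_j.
jac_softmax = jacobian(lambda t: F.softmax(t, dim=0), x)
jac_log_softmax = jacobian(lambda t: F.log_softmax(t, dim=0), x)

# Closed forms: s_i * (delta_ij - s_j) and delta_ij - s_j.
expected_softmax = s.unsqueeze(1) * (eye - s.unsqueeze(0))
expected_log_softmax = eye - s.unsqueeze(0)

print(torch.allclose(jac_softmax, expected_softmax, atol=1e-6))
print(torch.allclose(jac_log_softmax, expected_log_softmax, atol=1e-6))
```
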

Example:

    import numpy as np
    import torch
    import torch.nn as nn

    # The input is one-dimensional, so softmax is taken along dim=0
    layer1 = nn.Softmax(dim=0)
    layer2 = nn.LogSoftmax(dim=0)

    # Variable is deprecated; a float tensor can be passed to the layers directly
    x = torch.tensor(np.asarray([2, 3]), dtype=torch.float32)

    output1 = layer1(x)
    output2 = layer2(x)
    print('output1:', output1)
    print('output2:', output2)

Output:

    output1: tensor([0.2689, 0.7311])
    output2: tensor([-1.3133, -0.3133])

The above reflects personal experience; I hope it serves as a useful reference.

Original link: https://blog.csdn.net/qq_38883844/article/details/104248622
