python - How to resolve UserWarning: Using a target size (torch.Size([])) that is different to the input size (torch.Size([1]))?


Reposted by: 行者123  Updated: 2023-12-04 15:27:50

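I am trying to run code from a book I bought about reinforcement learning with PyTorch.
According to the book the code should work, but for me the model does not converge and the reward stays negative. It also produces the following UserWarning: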

/home/user/.local/lib/python3.6/site-packages/ipykernel_launcher.py:30: UserWarning: Using a target size (torch.Size([])) that is different to the input size (torch.Size([1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.

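I am a complete beginner with PyTorch, but I suspect that torch.Size([]) is not a valid tensor size? I think something is wrong in the code, but after stepping through it for a while I have not found anything. I also messaged the book's publisher some time ago, but unfortunately have not received a reply.

That is why I am asking here whether anyone has seen this warning before and knows how to resolve it.

The code implements A2C reinforcement learning in the Mountain Car Gym environment. It can also be found here: https://github.com/PacktPublishing/PyTorch-1.x-Reinforcement-Learning-Cookbook/blob/master/Chapter08/chapter8/actor_critic_mountaincar.py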

'''
Source codes for PyTorch 1.0 Reinforcement Learning (Packt Publishing)
Chapter 8: Implementing Policy Gradients and Policy Optimization
Author: Yuxi (Hayden) Liu
'''

import torch
import gym
import torch.nn as nn
import torch.nn.functional as F


env = gym.make('MountainCarContinuous-v0')


class ActorCriticModel(nn.Module):
    def __init__(self, n_input, n_output, n_hidden):
        super(ActorCriticModel, self).__init__()
        self.fc = nn.Linear(n_input, n_hidden)
        self.mu = nn.Linear(n_hidden, n_output)
        self.sigma = nn.Linear(n_hidden, n_output)
        self.value = nn.Linear(n_hidden, 1)
        self.distribution = torch.distributions.Normal

    def forward(self, x):
        x = F.relu(self.fc(x))
        mu = 2 * torch.tanh(self.mu(x))
        sigma = F.softplus(self.sigma(x)) + 1e-5
        dist = self.distribution(mu.view(1, ).data, sigma.view(1, ).data)
        value = self.value(x)
        return dist, value


class PolicyNetwork():
    def __init__(self, n_state, n_action, n_hidden, lr=0.001):
        self.model = ActorCriticModel(n_state, n_action, n_hidden)
        self.optimizer = torch.optim.Adam(self.model.parameters(), lr)


    def update(self, returns, log_probs, state_values):
        """
        Update the weights of the Actor Critic network given the training samples
        @param returns: return (cumulative rewards) for each step in an episode
        @param log_probs: log probability for each step
        @param state_values: state-value for each step
        """
        loss = 0
        for log_prob, value, Gt in zip(log_probs, state_values, returns):
            advantage = Gt - value.item()
            policy_loss = - log_prob * advantage

            value_loss = F.smooth_l1_loss(value, Gt)

            loss += policy_loss + value_loss

        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()


    def predict(self, s):
        """
        Compute the output using the continuous Actor Critic model
        @param s: input state
        @return: Gaussian distribution, state_value
        """
        self.model.training = False
        return self.model(torch.Tensor(s))

    def get_action(self, s):
        """
        Estimate the policy and sample an action, compute its log probability
        @param s: input state
        @return: the selected action, log probability, predicted state-value
        """
        dist, state_value = self.predict(s)
        action = dist.sample().numpy()
        log_prob = dist.log_prob(action[0])
        return action, log_prob, state_value




def actor_critic(env, estimator, n_episode, gamma=1.0):
    """
    continuous Actor Critic algorithm
    @param env: Gym environment
    @param estimator: policy network
    @param n_episode: number of episodes
    @param gamma: the discount factor
    """
    for episode in range(n_episode):
        log_probs = []
        rewards = []
        state_values = []
        state = env.reset()

        while True:
            state = scale_state(state)
            action, log_prob, state_value = estimator.get_action(state)
            action = action.clip(env.action_space.low[0],
                                 env.action_space.high[0])
            next_state, reward, is_done, _ = env.step(action)

            total_reward_episode[episode] += reward
            log_probs.append(log_prob)
            state_values.append(state_value)
            rewards.append(reward)

            if is_done:
                returns = []

                Gt = 0
                pw = 0

                for reward in rewards[::-1]:

                    Gt += gamma ** pw * reward
                    pw += 1
                    returns.append(Gt)

                returns = returns[::-1]
                returns = torch.tensor(returns)
                returns = (returns - returns.mean()) / (returns.std() + 1e-9)


                estimator.update(returns, log_probs, state_values)
                print('Episode: {}, total reward: {}'.format(episode, total_reward_episode[episode]))

                break

            state = next_state


import sklearn.preprocessing
import numpy as np

state_space_samples = np.array(
    [env.observation_space.sample() for x in range(10000)])
scaler = sklearn.preprocessing.StandardScaler()
scaler.fit(state_space_samples)


def scale_state(state):
    scaled = scaler.transform([state])
    return scaled[0]


n_state = env.observation_space.shape[0]
n_action = 1
n_hidden = 128
lr = 0.0003
policy_net = PolicyNetwork(n_state, n_action, n_hidden, lr)


n_episode = 200
gamma = 0.9
total_reward_episode = [0] * n_episode

actor_critic(env, policy_net, n_episode, gamma)

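Best answer

torch.Size([]) is valid, but it describes a single value (a zero-dimensional tensor) rather than an array, whereas torch.Size([1]) is a one-dimensional array containing exactly one item. It is like comparing 5 with [5]. One solution is: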

            returns = returns[::-1]
            returns_amount = len(returns)
            returns = torch.tensor(returns)
            returns = (returns - returns.mean()) / (returns.std() + 1e-9)
            returns.resize_(returns_amount, 1)

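This turns returns into a two-dimensional tensor, so each Gt you take from it is a one-dimensional tensor of size [1] rather than a bare scalar, and it matches the size of value.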
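The shape mismatch and the fix can be checked in isolation. The following is a minimal sketch, not the book's code: the tensor values are made up, and `unsqueeze(1)` is used here as an assumed equivalent of the `resize_` call above for this particular case.

```python
import warnings

import torch
import torch.nn.functional as F

# Stand-ins for the tensors in the question's update() loop (values are made up).
value = torch.tensor([0.3], requires_grad=True)  # critic output, torch.Size([1])
returns = torch.tensor([1.0, 0.5, -1.5])         # normalized returns, torch.Size([3])

# Indexing (or iterating over) a 1-D tensor yields 0-d tensors.
Gt_scalar = returns[0]                           # torch.Size([])

# A [1] input paired with a [] target triggers the UserWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    F.smooth_l1_loss(value, Gt_scalar)
warned = any("target size" in str(w.message) for w in caught)

# Adding a trailing dimension makes returns [3, 1], so each row has size [1].
returns_2d = returns.unsqueeze(1)
Gt_vector = returns_2d[0]                        # torch.Size([1])

with warnings.catch_warnings(record=True) as caught_fixed:
    warnings.simplefilter("always")
    loss = F.smooth_l1_loss(value, Gt_vector)
warned_fixed = any("target size" in str(w.message) for w in caught_fixed)

print(Gt_scalar.shape, Gt_vector.shape, warned, warned_fixed)
```

An alternative that should have the same effect is to leave returns one-dimensional and compare `value.squeeze()` with `Gt` instead, so that both sides are zero-dimensional.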

This question and answer originally appeared on Stack Overflow: https://stackoverflow.com/questions/61912681/
