
python - Is max pooling possible on an odd size?

Reposted. Author: 行者123. Updated: 2023-12-03 23:47:09

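I am taking the Udacity Deep Learning Nanodegree and working on the autoencoder mini-project. I don't understand the solution, and I don't know how to check it myself. So this is two questions.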

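We start with 28*28 images. They are fed through 3 convolutional layers, each with a padding of 1, and each followed by a max pooling that halves the spatial dimensions. What I don't understand is the last pooling step. Two rounds of max pooling, (28/2)/2, give 7, so a further round of max pooling should not be possible, because 7 is odd. Can someone explain to me why this works? The code to reproduce it is here: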

import torch
import numpy as np
from torchvision import datasets
import torchvision.transforms as transforms

# convert data to torch.FloatTensor
transform = transforms.ToTensor()

# load the training and test datasets
train_data = datasets.MNIST(root='data', train=True,
                            download=True, transform=transform)
test_data = datasets.MNIST(root='data', train=False,
                           download=True, transform=transform)

# Create training and test dataloaders
num_workers = 0
# how many samples per batch to load
batch_size = 20

# prepare data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, num_workers=num_workers)

import torch.nn as nn
import torch.nn.functional as F

# define the NN architecture
class ConvDenoiser(nn.Module):
    def __init__(self):
        super(ConvDenoiser, self).__init__()
        ## encoder layers ##
        # conv layer (depth from 1 --> 32), 3x3 kernels
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)
        # conv layer (depth from 32 --> 16), 3x3 kernels
        self.conv2 = nn.Conv2d(32, 16, 3, padding=1)
        # conv layer (depth from 16 --> 8), 3x3 kernels
        self.conv3 = nn.Conv2d(16, 8, 3, padding=1)
        # pooling layer to reduce x-y dims by two; kernel and stride of 2
        self.pool = nn.MaxPool2d(2, 2)

        ## decoder layers ##
        # transpose layer; kernel_size=3 with stride=2 takes the 3x3 encoding to a 7x7 output
        self.t_conv1 = nn.ConvTranspose2d(8, 8, 3, stride=2)
        # two more transpose layers; a kernel of 2 and a stride of 2 double the spatial dims
        self.t_conv2 = nn.ConvTranspose2d(8, 16, 2, stride=2)
        self.t_conv3 = nn.ConvTranspose2d(16, 32, 2, stride=2)
        # one, final, normal conv layer to decrease the depth
        self.conv_out = nn.Conv2d(32, 1, 3, padding=1)


    def forward(self, x):
        ## encode ##
        # add hidden layers with relu activation function
        # and maxpooling after
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        # add second hidden layer
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        # add third hidden layer
        x = F.relu(self.conv3(x))
        x = self.pool(x)  # compressed representation

        ## decode ##
        # add transpose conv layers, with relu activation function
        x = F.relu(self.t_conv1(x))
        x = F.relu(self.t_conv2(x))
        x = F.relu(self.t_conv3(x))
        # final conv layer; the output should have a sigmoid applied
        # (torch.sigmoid replaces the deprecated F.sigmoid)
        x = torch.sigmoid(self.conv_out(x))

        return x

# initialize the NN
model = ConvDenoiser()
print(model)

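I wanted to understand this by manually passing a single image through the layers and looking at the result, but that produced an error. Can someone explain how I can see the shapes as they pass through the layers? The code is a bit messy, but I left it as is so you can see what I tried.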
dataiter = iter(train_loader)
images, labels = next(dataiter)
# images = images.numpy()

# get one image from the batch
# img = np.squeeze(images[0])
img=images[0]

#create hidden layer
conv1 = nn.Conv2d(1, 32, 3, padding=1)

# z=torch.from_numpy(images[0])
z1=conv1(img)

Thanks for any insight you can give me.
Thanks,
J

Best answer

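Regarding your first question:
You can read how the output shape of max pooling is computed in the documentation. You can max-pool an odd-shaped tensor with an even stride, with or without padding. You need to be careful at the boundary, where some pixels may be lost. Here, a 7x7 input pooled with a kernel of 2 and a stride of 2 becomes 3x3, and the last row and column are dropped.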
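As a rough check of that arithmetic, the PyTorch documentation gives the pooled size along one dimension as floor((size + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1), and a matching formula for transposed convolutions. The sketch below applies both to the layer settings from the question. The helper names `pool_out` and `tconv_out` are made up for illustration and are not part of PyTorch.

```python
def pool_out(size, kernel_size=2, stride=2, padding=0, dilation=1):
    """Output length of max pooling along one dimension (ceil_mode=False)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def tconv_out(size, kernel_size, stride, padding=0, output_padding=0, dilation=1):
    """Output length of a transposed convolution along one dimension."""
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# Encoder: the 3x3 convs with padding=1 keep the size,
# and each 2x2 pool with stride 2 halves it, rounding down.
sizes = [28]
for _ in range(3):
    sizes.append(pool_out(sizes[-1]))
print(sizes)  # [28, 14, 7, 3]

# Decoder: kernel sizes 3, 2, 2 with stride 2, as in t_conv1..t_conv3.
decoded = [sizes[-1]]
for k in (3, 2, 2):
    decoded.append(tconv_out(decoded[-1], kernel_size=k, stride=2))
print(decoded)  # [3, 7, 14, 28]
```

The rounding down at 7 -> 3 is why the first transposed convolution in the question uses a kernel of 3: it maps 3 back to 7, which a kernel of 2 (3 -> 6) could not do.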

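Regarding your second question:
Your model expects a 4D input: batch-channel-height-width.
By selecting only one image from the batch ( img=images[0] ), you drop the batch dimension and end up with a 3D tensor.
To fix this: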

img=images[0:1, ...]  # select first image, but leave batch dimension as a singleton
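To see the shapes as they pass through the layers, one option is to apply the encoder layers one at a time and print the shape after each. This is only a sketch: it uses a random tensor as a stand-in for an MNIST batch (an assumption, to avoid the download) and freshly created layers with the same settings as the question's encoder. The ReLU calls are left out because they do not change shapes.

```python
import torch
import torch.nn as nn

# Stand-in for one MNIST batch (assumption: random data, batch_size=20 as in the question).
images = torch.rand(20, 1, 28, 28)

print(images[0].shape)   # torch.Size([1, 28, 28]): batch dimension dropped
img = images[0:1, ...]   # keep the batch dimension as a singleton
print(img.shape)         # torch.Size([1, 1, 28, 28])

# Same layer settings as the encoder in the question.
conv1 = nn.Conv2d(1, 32, 3, padding=1)
conv2 = nn.Conv2d(32, 16, 3, padding=1)
conv3 = nn.Conv2d(16, 8, 3, padding=1)
pool = nn.MaxPool2d(2, 2)

x = img
shapes = [tuple(x.shape)]
with torch.no_grad():  # no gradients needed just to inspect shapes
    for layer in (conv1, pool, conv2, pool, conv3, pool):
        x = layer(x)
        shapes.append(tuple(x.shape))
        print(type(layer).__name__, tuple(x.shape))
```

The last line printed should show the 7x7 feature map pooled down to 3x3. Newer PyTorch releases document `Conv2d` as also accepting unbatched (C, H, W) input, so the original error may not reproduce there; keeping the batch dimension works either way.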

Regarding "python - Is max pooling possible on an odd size?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61983630/
