
python - PyTorch: how to predict a 1D vector from a 2D vector/image?

Reprinted. Author: 太空宇宙. Updated: 2023-11-03 19:55:25

I am trying to use PyTorch to predict a 1D vector (a frame of clean speech data) by regression from a 2D vector (a sequence of noisy speech frames), an approach that has been used before. The sequence of frames provides temporal context for predicting the clean frame more accurately. These vectors can be thought of as analogous to a 2D grayscale image and a 1D grayscale image.

With a batch size of 64, a window length of 5, and a frame length of 257, the input tensor has shape [64, 1, 5, 257] and the target tensor has shape [64, 1, 1, 257].
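Those shapes can be sketched with random data (a minimal illustration using the sizes from the question; the variable names are only for this example):

```python
import torch

# Shapes from the question: batch 64, window of 5 frames, 257 bins per frame.
batch_size, window_length, frame_length = 64, 5, 257

# Noisy input: a window of 5 consecutive spectral frames per sample.
noisy = torch.randn(batch_size, 1, window_length, frame_length)   # [64, 1, 5, 257]
# Clean target: the single frame to be predicted.
clean = torch.randn(batch_size, 1, 1, frame_length)               # [64, 1, 1, 257]
```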

There are examples of doing this in TensorFlow, but I could not find one using PyTorch. This is my best attempt so far at reproducing this paper (https://www.isca-speech.org/archive/Interspeech_2017/pdfs/1465.PDF).

import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self, window_length, frame_length, batch_size):
        super(Net, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 12, kernel_size=(1, 13), stride=1, padding=(0, 6)),
            # nn.BatchNorm2d(12),
            nn.ReLU())
        self.layer2 = nn.Sequential(
            nn.Conv2d(12, 16, kernel_size=(1, 11), stride=1, padding=(0, 5)),
            # nn.BatchNorm2d(16),
            nn.ReLU())
        self.layer3 = nn.Sequential(
            nn.Conv2d(16, 20, kernel_size=(1, 9), stride=1, padding=(0, 4)),
            # nn.BatchNorm2d(20),
            nn.ReLU())
        self.layer4 = nn.Sequential(
            nn.Conv2d(20, 24, kernel_size=(1, 7), stride=1, padding=(0, 3)),
            # nn.BatchNorm2d(24),
            nn.ReLU())
        self.layer5 = nn.Sequential(
            nn.Conv2d(24, 32, kernel_size=(1, 7), stride=1, padding=(0, 3)),
            # nn.BatchNorm2d(32),
            nn.ReLU())
        self.layer6 = nn.Sequential(
            nn.Conv2d(32, 24, kernel_size=(1, 7), stride=1, padding=(0, 3)),
            # nn.BatchNorm2d(24),
            nn.ReLU())
        self.layer7 = nn.Sequential(
            nn.Conv2d(24, 20, kernel_size=(1, 9), stride=1, padding=(0, 4)),
            # nn.BatchNorm2d(20),
            nn.ReLU())
        self.layer8 = nn.Sequential(
            nn.Conv2d(20, 16, kernel_size=(1, 11), stride=1, padding=(0, 5)),
            # nn.BatchNorm2d(16),
            nn.ReLU())
        self.layer9 = nn.Sequential(
            nn.Conv2d(16, 12, kernel_size=(1, 13), stride=1, padding=(0, 6)),
            # nn.BatchNorm2d(12),
            nn.ReLU())
        self.conv_out = nn.Sequential(
            nn.Conv2d(12, 1, kernel_size=(1, 1), stride=1, padding=(0, 0)),
        )
        self.fc1 = nn.Linear(batch_size * window_length * frame_length, frame_length)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = self.layer5(out)
        out = self.layer6(out)
        out = self.layer7(out)
        out = self.layer8(out)
        out = self.layer9(out)
        out = self.conv_out(out)
        out = self.fc1(out)
        return out

Calling .forward() on this network produces the following error message:

RuntimeError: size mismatch, m1: [320 x 257], m2: [82240 x 257]

How can I reduce the output layer to 1x257 per sample, to match the target (a single frame of length 257)?
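The mismatch can be reproduced in isolation (a sketch using the shapes from the question, not the full model): `nn.Linear` multiplies along the last dimension only, so a layer built with `batch_size * window_length * frame_length` = 82240 input features never matches the conv output, whose last dimension is 257.

```python
import torch
import torch.nn as nn

# The conv stack preserves the spatial dims, so its output is [64, 1, 5, 257].
conv_out = torch.randn(64, 1, 5, 257)

# A linear layer defined with 64 * 5 * 257 = 82240 input features
# cannot consume an input whose last dimension is 257.
fc_bad = nn.Linear(64 * 5 * 257, 257)
raised = False
try:
    fc_bad(conv_out)
except RuntimeError:
    raised = True  # size mismatch, as in the error above
```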

Best Answer

Here is a working version of your code.

class Net(nn.Module):
    def __init__(self, window_length, frame_length, batch_size):
        super(Net, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 12, kernel_size=(1, 13), stride=1, padding=(0, 6)),
            # nn.BatchNorm2d(12),
            nn.ReLU())
        self.layer2 = nn.Sequential(
            nn.Conv2d(12, 16, kernel_size=(1, 11), stride=1, padding=(0, 5)),
            # nn.BatchNorm2d(16),
            nn.ReLU())
        self.layer3 = nn.Sequential(
            nn.Conv2d(16, 20, kernel_size=(1, 9), stride=1, padding=(0, 4)),
            # nn.BatchNorm2d(20),
            nn.ReLU())
        self.layer4 = nn.Sequential(
            nn.Conv2d(20, 24, kernel_size=(1, 7), stride=1, padding=(0, 3)),
            # nn.BatchNorm2d(24),
            nn.ReLU())
        self.layer5 = nn.Sequential(
            nn.Conv2d(24, 32, kernel_size=(1, 7), stride=1, padding=(0, 3)),
            # nn.BatchNorm2d(32),
            nn.ReLU())
        self.layer6 = nn.Sequential(
            nn.Conv2d(32, 24, kernel_size=(1, 7), stride=1, padding=(0, 3)),
            # nn.BatchNorm2d(24),
            nn.ReLU())
        self.layer7 = nn.Sequential(
            nn.Conv2d(24, 20, kernel_size=(1, 9), stride=1, padding=(0, 4)),
            # nn.BatchNorm2d(20),
            nn.ReLU())
        self.layer8 = nn.Sequential(
            nn.Conv2d(20, 16, kernel_size=(1, 11), stride=1, padding=(0, 5)),
            # nn.BatchNorm2d(16),
            nn.ReLU())
        self.layer9 = nn.Sequential(
            nn.Conv2d(16, 12, kernel_size=(1, 13), stride=1, padding=(0, 6)),
            # nn.BatchNorm2d(12),
            nn.ReLU())
        self.conv_out = nn.Sequential(
            nn.Conv2d(12, 1, kernel_size=(1, 1), stride=1, padding=(0, 0)),
        )
        self.fc1 = nn.Linear(window_length * frame_length, frame_length)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = self.layer5(out)
        out = self.layer6(out)
        out = self.layer7(out)
        out = self.layer8(out)
        out = self.layer9(out)
        out = self.conv_out(out)
        out = out.view(-1, 5 * 257)  # flatten window * frame dims before the linear layer
        out = self.fc1(out)
        return out

The changes I made:

  • Changed the definition of the linear layer. In PyTorch, you do not use the batch size explicitly in the model definition; the model should work for any batch size. Defining the linear layer based on batch_size is therefore wrong: the linear layer only needs the true feature shape of a single sample's input.

  • Used view to reshape the output after the convolutional layers.

I referred to the following TensorFlow code, RCED, to make these changes. I did notice that it differs somewhat from the paper, which describes a fully convolutional network, but I think they are largely similar.
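The reshape-then-project step can be sketched in isolation (shapes follow the question; the names are only for this example):

```python
import torch
import torch.nn as nn

window_length, frame_length = 5, 257

# Output of the conv stack: one channel, spatial dims preserved.
conv_out = torch.randn(64, 1, window_length, frame_length)

# Flatten everything but the batch dimension, then project down
# to a single frame of length 257 per sample.
fc1 = nn.Linear(window_length * frame_length, frame_length)
flat = conv_out.view(-1, window_length * frame_length)  # [64, 1285]
pred = fc1(flat)                                        # [64, 257]
```

Because the linear layer is defined per sample, the same module works unchanged for any batch size.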

Regarding "python - PyTorch: how to predict a 1D vector from a 2D vector/image?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59569971/
