
"RuntimeError: You must train on the training inputs! " When I'm trying to use mini batch in training Guassian Process Regression Model(“RuntimeError:你必须在训练输入上训练!which is the most important part of the Process Regression Model.“)




I have written a piece of code to train a Gaussian Process Regression model to predict age. The following code runs well:


import numpy as np
import pandas as pd
import h5py
import torch
import gpytorch
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
import time


disease_mapping = {
    'control': 0,
    "Alzheimer's disease": 1,
    "Graves' disease": 2,
    "Huntington's disease": 3,
    "Parkinson's disease": 4,
    'rheumatoid arthritis': 5,
    'schizophrenia': 6,
    "Sjogren's syndrome": 7,
    'stroke': 8,
    'type 2 diabetes': 9
}
sample_type_mapping = {'control': 0, 'disease tissue': 1}


def load_idmap(idmap_dir):
    idmap = pd.read_csv(idmap_dir, sep=',')
    age = idmap.age.to_numpy()
    age = age.astype(np.float32)
    sample_type = idmap.sample_type.replace(sample_type_mapping)
    return age, sample_type


def load_methylation_h5(prefix):
    '''
    Load methylation data from a .h5 file.

    Parameters:
    ------------
    prefix: 'train' or 'test'
    '''
    with h5py.File('encoded_' + prefix + 'data.h5', 'r') as f:
        # methylation = f['data'][:, :10000]  # subset of the features, just for a quick test
        methylation = f['data'][:, :]  # use the full data
    return methylation


def evaluate_ml(y_true, y_pred, sample_type):
    '''
    Evaluate the performance of the model.

    Parameters:
    ------------
    y_true: true age
    y_pred: predicted age
    sample_type: sample type, 0 for control, 1 for case

    Return:
    ------------
    mae: mean absolute error.
    mae_control: mean absolute error of control samples.
    mae_case: mean absolute error of case samples.

    We use MAE to evaluate the performance.
    Please refer to the evaluation section of the official website for more details.
    '''
    mae_control = np.mean(
        np.abs(y_true[sample_type == 0] - y_pred[sample_type == 0]))

    case_true = y_true[sample_type == 1]
    case_pred = y_pred[sample_type == 1]
    above = np.where(case_pred >= case_true)
    below = np.where(case_pred < case_true)

    ae_above = np.sum(np.abs(case_true[above] - case_pred[above])) / 2
    ae_below = np.sum(np.abs(case_true[below] - case_pred[below]))
    mae_case = (ae_above + ae_below) / len(case_true)

    mae = np.mean([mae_control, mae_case])
    return mae, mae_control, mae_case
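
# A small worked example of this metric (illustrative numbers only, not taken from the dataset):
#   y_true      = [40, 50, 60]
#   y_pred      = [45, 55, 55]
#   sample_type = [ 0,  1,  1]
# control sample: mae_control = |40 - 45| = 5
# case samples:   over-prediction  (55 >= 50) counts at half weight -> |50 - 55| / 2 = 2.5
#                 under-prediction (55 <  60) counts fully          -> |60 - 55|     = 5
#                 mae_case = (2.5 + 5) / 2 = 3.75
# overall:        mae = (5 + 3.75) / 2 = 4.375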

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
methylation = load_methylation_h5('train')
methylation_test = load_methylation_h5('test')

idmap_train_dir = 'trainmap.csv'
idmap_test_dir = 'testmap.csv'

age, sample_type = load_idmap(idmap_train_dir)

print('Load data done')

# Train/validation split and preprocessing
indices = np.arange(len(age))
[indices_train, indices_valid, age_train,
 age_valid] = train_test_split(indices, age, test_size=0.2, shuffle=True)
methylation_train, methylation_valid = methylation[
    indices_train], methylation[indices_valid]
sample_type_train, sample_type_valid = sample_type[
    indices_train], sample_type[indices_valid]
feature_size = methylation_train.shape[1]
del methylation

# Convert the data to torch tensors
train_x = torch.tensor(methylation_train, dtype=torch.float32).to(device)
train_y = torch.tensor(age_train, dtype=torch.float32).to(device)
test_x = torch.tensor(methylation_valid, dtype=torch.float32).to(device)
test_y = torch.tensor(age_valid, dtype=torch.float32).to(device)
# dataset = torch.utils.data.TensorDataset(train_x, train_y)
# data_loader = torch.utils.data.DataLoader(dataset, batch_size=128, shuffle=True)

# Define the Gaussian process model
class GPRegressionModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(GPRegressionModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

likelihood = gpytorch.likelihoods.GaussianLikelihood().to(device)
model = GPRegressionModel(train_x, train_y, likelihood).to(device)  # prior

# Prepare for training
model.train()
likelihood.train()

# Use the Adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)  # lr = learning rate
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model).to(device)

# Start training
## Set parameters
num_epochs = 2
target_loss = 0.5
print('Start training...')
for epoch in range(num_epochs):
    start = time.time()
    optimizer.zero_grad()
    output = model(train_x)
    loss = -mll(output, train_y)
    loss.backward()
    optimizer.step()
    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}', f'Training time: {time.time() - start}s')
    # Check whether the target loss has been reached
    if loss.item() <= target_loss:
        print(f"Terminating training at iteration {epoch} as target loss {target_loss} is achieved.")
        break

# Switch to evaluation mode
model.eval()
likelihood.eval()

# Predict on the validation set
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    observed_pred = likelihood(model(test_x))
    age_valid_pred = observed_pred.mean
    age_valid_pred = age_valid_pred.cpu().numpy()
mae, mae_control, mae_case = evaluate_ml(age_valid, age_valid_pred, sample_type_valid)
print(f'Validation MAE: {mae}')

# Predict on the test set
pred_x = torch.tensor(methylation_test, dtype=torch.float32).to(device)
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    observed_pred = likelihood(model(pred_x))
    age_pred = observed_pred.mean
    age_pred = age_pred.cpu().numpy()
age_pred[age_pred < 0] = 0  # naive post-processing to ensure age >= 0

age_pred = np.around(age_pred, decimals=2)
age_pred = ['%.2f' % i for i in age_pred]
sample_id = pd.read_csv(idmap_test_dir, sep=',').sample_id
# Note: sample_id in the submission should keep the same order as in testmap.csv.
# No matching procedure is provided for reordered sample_id in evaluation.

# submission = pd.DataFrame({'sample_id': sample_id, 'age': age_pred})
# submission_file = 'submit7.txt'
# submission.to_csv(submission_file, index=False)


But I have noticed that the same data is fed to the model in every epoch, which I think may cause overfitting, so I want to use mini-batches to train the model. I edited my code as follows:


import numpy as np
import pandas as pd
import h5py
import torch
import gpytorch
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
import time


disease_mapping = {
    'control': 0,
    "Alzheimer's disease": 1,
    "Graves' disease": 2,
    "Huntington's disease": 3,
    "Parkinson's disease": 4,
    'rheumatoid arthritis': 5,
    'schizophrenia': 6,
    "Sjogren's syndrome": 7,
    'stroke': 8,
    'type 2 diabetes': 9
}
sample_type_mapping = {'control': 0, 'disease tissue': 1}


def load_idmap(idmap_dir):
    idmap = pd.read_csv(idmap_dir, sep=',')
    age = idmap.age.to_numpy()
    age = age.astype(np.float32)
    sample_type = idmap.sample_type.replace(sample_type_mapping)
    return age, sample_type


def load_methylation_h5(prefix):
    '''
    Load methylation data from a .h5 file.

    Parameters:
    ------------
    prefix: 'train' or 'test'
    '''
    with h5py.File('encoded_' + prefix + 'data.h5', 'r') as f:
        # methylation = f['data'][:, :10000]  # subset of the features, just for a quick test
        methylation = f['data'][:, :]  # use the full data
    return methylation


def evaluate_ml(y_true, y_pred, sample_type):
    '''
    Evaluate the performance of the model.

    Parameters:
    ------------
    y_true: true age
    y_pred: predicted age
    sample_type: sample type, 0 for control, 1 for case

    Return:
    ------------
    mae: mean absolute error.
    mae_control: mean absolute error of control samples.
    mae_case: mean absolute error of case samples.

    We use MAE to evaluate the performance.
    Please refer to the evaluation section of the official website for more details.
    '''
    mae_control = np.mean(
        np.abs(y_true[sample_type == 0] - y_pred[sample_type == 0]))

    case_true = y_true[sample_type == 1]
    case_pred = y_pred[sample_type == 1]
    above = np.where(case_pred >= case_true)
    below = np.where(case_pred < case_true)

    ae_above = np.sum(np.abs(case_true[above] - case_pred[above])) / 2
    ae_below = np.sum(np.abs(case_true[below] - case_pred[below]))
    mae_case = (ae_above + ae_below) / len(case_true)

    mae = np.mean([mae_control, mae_case])
    return mae, mae_control, mae_case


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
methylation = load_methylation_h5('train')
methylation_test = load_methylation_h5('test')

idmap_train_dir = 'trainmap.csv'
idmap_test_dir = 'testmap.csv'

age, sample_type = load_idmap(idmap_train_dir)

print('Load data done')

# Train/validation split and preprocessing
indices = np.arange(len(age))
[indices_train, indices_valid, age_train,
 age_valid] = train_test_split(indices, age, test_size=0.2, shuffle=True)
methylation_train, methylation_valid = methylation[
    indices_train], methylation[indices_valid]
sample_type_train, sample_type_valid = sample_type[
    indices_train], sample_type[indices_valid]
feature_size = methylation_train.shape[1]
del methylation

# Convert the data to torch tensors
train_x = torch.tensor(methylation_train, dtype=torch.float32).to(device)
train_y = torch.tensor(age_train, dtype=torch.float32).to(device)
test_x = torch.tensor(methylation_valid, dtype=torch.float32).to(device)
test_y = torch.tensor(age_valid, dtype=torch.float32).to(device)
# dataset = torch.utils.data.TensorDataset(train_x, train_y)
# data_loader = torch.utils.data.DataLoader(dataset, batch_size=128, shuffle=True)

# Define the Gaussian process model
class GPRegressionModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(GPRegressionModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

likelihood = gpytorch.likelihoods.GaussianLikelihood().to(device)
model = GPRegressionModel(train_x, train_y, likelihood).to(device)  # prior

# Prepare for training
model.train()
likelihood.train()

# Use the Adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)  # lr = learning rate
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model).to(device)

# Start training
## Set parameters
num_epochs = 2
target_loss = 0.5
print('Start training...')
for epoch in range(num_epochs):
    start = time.time()
    optimizer.zero_grad()
    output = model(train_x)
    loss = -mll(output, train_y)
    loss.backward()
    optimizer.step()
    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}', f'Training time: {time.time() - start}s')
    # Check whether the target loss has been reached
    if loss.item() <= target_loss:
        print(f"Terminating training at iteration {epoch} as target loss {target_loss} is achieved.")
        break

# Switch to evaluation mode
model.eval()
likelihood.eval()

# Predict on the validation set
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    observed_pred = likelihood(model(test_x))
    age_valid_pred = observed_pred.mean
    age_valid_pred = age_valid_pred.cpu().numpy()
mae, mae_control, mae_case = evaluate_ml(age_valid, age_valid_pred, sample_type_valid)
print(f'Validation MAE: {mae}')

# Predict on the test set
pred_x = torch.tensor(methylation_test, dtype=torch.float32).to(device)
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    observed_pred = likelihood(model(pred_x))
    age_pred = observed_pred.mean
    age_pred = age_pred.cpu().numpy()
age_pred[age_pred < 0] = 0  # naive post-processing to ensure age >= 0

age_pred = np.around(age_pred, decimals=2)
age_pred = ['%.2f' % i for i in age_pred]
sample_id = pd.read_csv(idmap_test_dir, sep=',').sample_id
# Note: sample_id in the submission should keep the same order as in testmap.csv.
# No matching procedure is provided for reordered sample_id in evaluation.

# submission = pd.DataFrame({'sample_id': sample_id, 'age': age_pred})
# submission_file = 'submit7.txt'
# submission.to_csv(submission_file, index=False)


This time the program returns an error, "RuntimeError: You must train on the training inputs!", traced back to line 147, output = model(train_x).
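As far as I understand, gpytorch.models.ExactGP checks in training mode that the inputs passed to the model are exactly the tensors stored as its train_inputs, so calling it on a mini-batch triggers this error. One workaround I have seen suggested (but not verified, and it is unclear whether mini-batching the exact marginal log likelihood is statistically sound) is to re-register the current batch as the training data before every step. A minimal sketch, reusing model, likelihood, mll, optimizer, num_epochs, train_x and train_y from the code above:

from torch.utils.data import TensorDataset, DataLoader

data_loader = DataLoader(TensorDataset(train_x, train_y), batch_size=128, shuffle=True)

model.train()
likelihood.train()
for epoch in range(num_epochs):
    for batch_x, batch_y in data_loader:
        optimizer.zero_grad()
        # Make the current batch the model's training data so the ExactGP
        # input check passes (strict=False allows the shape to change).
        model.set_train_data(inputs=batch_x, targets=batch_y, strict=False)
        output = model(batch_x)
        loss = -mll(output, batch_y)
        loss.backward()
        optimizer.step()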


Is this method simply unable to use mini-batches, or is something wrong somewhere in my code?
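For reference, the GPyTorch tutorials handle mini-batch training with an approximate (variational) GP and the variational ELBO rather than ExactGP. A minimal sketch of that setup, reusing train_x, train_y and device from above (the number of inducing points and the batch size here are arbitrary choices, not values from my code):

import torch
import gpytorch
from torch.utils.data import TensorDataset, DataLoader


class SVGPRegressionModel(gpytorch.models.ApproximateGP):
    def __init__(self, inducing_points):
        variational_distribution = gpytorch.variational.CholeskyVariationalDistribution(
            inducing_points.size(0))
        variational_strategy = gpytorch.variational.VariationalStrategy(
            self, inducing_points, variational_distribution, learn_inducing_locations=True)
        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)


inducing_points = train_x[:500, :]  # arbitrary subset used to initialise the inducing points
model = SVGPRegressionModel(inducing_points).to(device)
likelihood = gpytorch.likelihoods.GaussianLikelihood().to(device)

# The ELBO needs the total number of training points to scale the likelihood term.
mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=train_y.size(0))
optimizer = torch.optim.Adam([
    {'params': model.parameters()},
    {'params': likelihood.parameters()},
], lr=0.01)

data_loader = DataLoader(TensorDataset(train_x, train_y), batch_size=128, shuffle=True)

model.train()
likelihood.train()
for epoch in range(10):
    for batch_x, batch_y in data_loader:
        optimizer.zero_grad()
        output = model(batch_x)  # mini-batch forward passes are allowed here
        loss = -mll(output, batch_y)
        loss.backward()
        optimizer.step()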

