machine-learning - TensorFlow training does not work: model is not learning data

Reposted by: 行者123  Updated: 2023-11-30 08:46:16

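I have a dataset with more than 17 million observations that I am trying to use to train a DNNRegressor model. However, training does not work at all. The loss is around 10^15, which is dreadful. I have been trying different things for weeks, and no matter what I do, I cannot bring the loss down.

For example, after training I ran a test prediction on one of the same observations that was used for training. The expected result is 140944.00, but the prediction is -169532.5, which is absurd. The training data does not even contain any negative values, so I do not understand how it can be this far off.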

Here is some sample training data:

Amount      Contribution    ServiceType     Percentile       Time   Result
214871.00   3501.00         SM23            high             50     17807828.00
214871.00   3501.00         SM23            high             51     19216520.00
214871.00   3501.00         SM23            high             52     19676064.00
214871.00   3501.00         SM23            high             53     21038840.00
214871.00   3501.00         SM23            high             54     22248295.00
214871.00   3501.00         SM23            high             55     22412713.00
28006.00    83.00           SM0             i_low            0      28006.00
28006.00    83.00           SM0             i_low            1      28804.00
28006.00    83.00           SM0             i_low            2      30140.00
28006.00    83.00           SM0             i_low            3      31598.00
28006.00    83.00           SM0             i_low            4      33130.00
28006.00    83.00           SM0             i_low            5      34663.00

Here is my code:

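# Imports assumed by the snippets below (TensorFlow 1.x); the original post does not show them.
import os

import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.contrib.learn import RunConfig
from tensorflow.python.framework import dtypes

feature_columns = [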
    tf.feature_column.numeric_column('Amount', dtype=dtypes.float32),
    tf.feature_column.numeric_column('Contribution', dtype=dtypes.float32),
    tf.feature_column.embedding_column(
        tf.feature_column.categorical_column_with_vocabulary_list(
            'ServiceType',
            [
                'SM0',  'SM1',  'SM2',  'SM3',
                'SM4',  'SM5',  'SM6',  'SM7',
                'SM8',  'SM9',  'SM10', 'SM11',
                'SM12', 'SM13', 'SM14', 'SM15',
                'SM16', 'SM17', 'SM18', 'SM19',
                'SM20', 'SM21', 'SM22', 'SM23'
            ],
            dtype=dtypes.string
        ),
        dimension=16
    ),
    tf.feature_column.embedding_column(
        tf.feature_column.categorical_column_with_vocabulary_list(
            'Percentile',
            ['i_low', 'low', 'mid', 'high'],
            dtype=dtypes.string
        ),
        dimension=16
    ),
    tf.feature_column.numeric_column('Time', dtype=dtypes.int8)
]

model = tf.estimator.DNNRegressor(
    hidden_units=[64, 32],
    feature_columns=feature_columns,
    model_dir=os.path.join(os.getcwd(), "job"),
    label_dimension=1,
    weight_column=None,
    optimizer='Adagrad',
    activation_fn=tf.nn.elu,
    dropout=None,
    input_layer_partitioner=None,
    config=RunConfig(
        master=None,
        num_cores=4,
        log_device_placement=False,
        gpu_memory_fraction=1,
        tf_random_seed=None,
        save_summary_steps=100,
        save_checkpoints_secs=0,
        save_checkpoints_steps=None,
        keep_checkpoint_max=5,
        keep_checkpoint_every_n_hours=10000,
        log_step_count_steps=100,
        evaluation_master='',
        model_dir=os.path.join(os.getcwd(), "job"),
        session_config=None
    )
)

print('Training...')
model.train(input_fn=get_input_fn('train'), steps=100000)

print('Evaluating...')
model.evaluate(input_fn=get_input_fn('test'), steps=4000)

print('Predicting...')
prediction = model.predict(input_fn=get_input_fn('predict'))

print(list(prediction))

The input_fn is built as follows:

def split_input():
    data = pd.read_csv('C:\\all_data.txt', sep='\t')

    x = data.drop('Result', axis=1)
    y = data.Result

    return train_test_split(x, y, test_size=0.2, random_state=123)


def get_input_fn(input_fn_type):
    train_x, test_x, train_y, test_y = split_input()

    if input_fn_type == 'train':
        return tf.estimator.inputs.pandas_input_fn(
            x=train_x,
            y=train_y,
            num_epochs=None,
            shuffle=True
        )
    elif input_fn_type == 'test':
        return tf.estimator.inputs.pandas_input_fn(
            x=test_x,
            y=test_y,
            num_epochs=1,
            shuffle=False
        )
    elif input_fn_type == 'predict':
        return tf.estimator.inputs.pandas_input_fn(
            x=pd.DataFrame(
                {
                    'Amount': 52050.00,
                    'Contribution': 1394.00,
                    'ServiceType': 'SM0',
                    'Percentile': 'i_low',
                    'Time': 5
                },
                index=[0]
            ),
            num_epochs=1,
            shuffle=False
        )

The output is as follows:

Training...
INFO:tensorflow:loss = 6.30944e+15, step = 1
INFO:tensorflow:global_step/sec: 457.091
INFO:tensorflow:loss = 3.28245e+15, step = 101 (0.219 sec)
INFO:tensorflow:global_step/sec: 533.271
INFO:tensorflow:loss = 2.65647e+15, step = 201 (0.188 sec)
INFO:tensorflow:global_step/sec: 533.274
...
INFO:tensorflow:loss = 1.06601e+15, step = 99701 (0.203 sec)
INFO:tensorflow:global_step/sec: 533.289
INFO:tensorflow:loss = 2.12652e+15, step = 99801 (0.188 sec)
INFO:tensorflow:global_step/sec: 533.273
INFO:tensorflow:loss = 1.31647e+15, step = 99901 (0.203 sec)
INFO:tensorflow:Saving checkpoints for 100000 into C:\projection_model\job\model.ckpt.
INFO:tensorflow:Loss for final step: 2.88956e+15.
Evaluating...
INFO:tensorflow:Evaluation [1/4000]
INFO:tensorflow:Evaluation [2/4000]
INFO:tensorflow:Evaluation [3/4000]
...
INFO:tensorflow:Evaluation [3998/4000]
INFO:tensorflow:Evaluation [3999/4000]
INFO:tensorflow:Evaluation [4000/4000]
INFO:tensorflow:Finished evaluation at 2017-08-30-19:04:03
INFO:tensorflow:Saving dict for global step 100000: average_loss = 1.37941e+13, global_step = 100000, loss = 1.76565e+15
Predicting...
[{'predictions': array([-169532.5], dtype=float32)}] # Should be somewhere around 140944.00
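
For scale, the numbers in the evaluation line can be related to the units of the Result column. The sketch below rests on an assumption the post does not state: that loss is the squared error summed over one batch and average_loss is the mean squared error per example. The variable names are illustrative only.

```python
import math

# Values copied from the "Saving dict for global step 100000" log line.
eval_loss = 1.76565e15          # assumed: squared error summed over one batch
eval_average_loss = 1.37941e13  # assumed: mean squared error per example

# If "loss" is a per-batch sum, this ratio recovers the batch size.
implied_batch_size = eval_loss / eval_average_loss

# The square root of the MSE is in the same units as the Result column.
rmse = math.sqrt(eval_average_loss)

print(f"implied batch size: {implied_batch_size:.1f}")
print(f"RMSE on the test split: {rmse:,.0f}")
```

Under those assumptions the ratio comes out at about 128 and the RMSE at roughly 3.7 million, while the Result values in the sample range from about 28 thousand to about 22 million. A loss near 10^15 is therefore largely a consequence of the label scale, although an error of that size is still large compared with most of the targets shown.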

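Why is the model not learning the data? I have tried different regressors and input normalization, but nothing has worked.

Best answer

tf.contrib.learn.DNNRegressor hides too many details. That is great when everything works out of the box, but very frustrating when some debugging is needed.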

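For example, the learning rate may well be too large. You do not see a learning rate in your code because DNNRegressor chooses it for you. By default it is 0.05, which is reasonable for many applications but may be too large in your particular case. I suggest instantiating the optimizer yourself, AdagradOptimizer(learning_rate), and passing it to DNNRegressor.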
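
To make concrete what the learning_rate argument controls, here is a dependency-free sketch of the usual Adagrad rule for a single scalar weight. It is not the TensorFlow implementation; the helper names and the starting accumulator value of 0.1 are assumptions made for illustration.

```python
import math

# Assumed starting value of the squared-gradient accumulator.
INITIAL_ACCUMULATOR = 0.1


def adagrad_step(weight, grad, accumulator, learning_rate):
    """Apply one Adagrad update; return the new weight and accumulator."""
    accumulator += grad * grad
    weight -= learning_rate * grad / math.sqrt(accumulator)
    return weight, accumulator


def first_step_size(grad, learning_rate):
    """Distance moved by the very first update, starting from weight 0."""
    new_weight, _ = adagrad_step(0.0, grad, INITIAL_ACCUMULATOR, learning_rate)
    return abs(new_weight)


# In TF 1.x the advice above would correspond to passing something like
# optimizer=tf.train.AdagradOptimizer(learning_rate=lr) to the regressor.
for lr in (0.001, 0.05, 1.0):
    sizes = [first_step_size(g, lr) for g in (1.0, 1e3, 1e7)]
    print(lr, [round(s, 6) for s in sizes])
```

Because the gradient is divided by the root of its own accumulated square, the first step never exceeds learning_rate in size, however large the gradient is. The learning rate is thus the main knob for how far the weights move, which is why it is worth setting explicitly instead of relying on the default.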

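The initial weights may also be too large. DNNRegressor uses tf.contrib.layers.fully_connected layers without overriding weights_initializer or biases_initializer. As before, the defaults are quite reasonable, but if you want something different you have no control over it.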
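
As a rough check on how large the initial weights are, the sketch below computes the range of a Glorot (Xavier) uniform initializer for the layer sizes implied by the question. It assumes that Glorot uniform with zero biases is the default in use, which the answer does not spell out; the function name is made up for this example.

```python
import math


def glorot_uniform_limit(fan_in, fan_out):
    """Upper bound of the Glorot/Xavier uniform range U(-limit, +limit)."""
    return math.sqrt(6.0 / (fan_in + fan_out))


# Widths implied by the question's feature columns and hidden_units:
# 3 numeric columns + two 16-dimensional embeddings = 35 inputs.
layer_sizes = [35, 64, 32, 1]

for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
    limit = glorot_uniform_limit(fan_in, fan_out)
    std = limit / math.sqrt(3.0)  # standard deviation of U(-limit, +limit)
    print(f"{fan_in:>2} -> {fan_out:>2}: limit={limit:.4f}, std={std:.4f}")

# Standard deviation of a single first-layer term when a raw Amount
# value from the sample data is fed in without scaling.
amount = 214871.0
first_std = glorot_uniform_limit(35, 64) / math.sqrt(3.0)
print(f"std of Amount * weight at init: {amount * first_std:,.0f}")
```

Under that assumption the weights themselves start out well below 1 in magnitude, but a single raw Amount term already contributes on the order of 30,000 to a first-layer pre-activation. Whether the initial weights are "too large" therefore depends heavily on the scale of the inputs they multiply.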

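To check that the network is working at all, what I usually do is reduce the training set to a few examples and try to overfit the network on them. This experiment is very fast, so I can try various learning rates and other hyperparameters to find a sweet spot, and then move on to the larger dataset.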
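
The experiment described above can be sketched without any framework. Note the stand-ins: a one-feature linear model replaces the network and plain full-batch gradient descent replaces Adagrad, purely to keep the example self-contained. The six rows are the SM0 / i_low rows from the sample data, and all function names are illustrative.

```python
import statistics

# Six rows from the sample data (ServiceType SM0, Percentile i_low).
times = [0, 1, 2, 3, 4, 5]
results = [28006.0, 28804.0, 30140.0, 31598.0, 33130.0, 34663.0]


def standardize(values):
    """Rescale values to zero mean and unit (population) variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def train(xs, ys, learning_rate, steps=500):
    """Fit y = w * x + b by gradient descent and return the final MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


xs, ys = standardize(times), standardize(results)

# Sweep a few candidate learning rates on the tiny subset.
final_mse = {lr: train(xs, ys, lr) for lr in (0.001, 0.01, 0.1)}
for lr, mse in final_mse.items():
    print(f"learning_rate={lr}: final MSE={mse:.4f}")

best_lr = min(final_mse, key=final_mse.get)
print("best learning rate in this sweep:", best_lr)
```

On this subset the two larger learning rates drive the standardized MSE close to the floor a straight line can reach, while the smallest one is still far from it after 500 steps. If even a handful of examples cannot be fitted, the problem lies in the setup (inputs, labels, learning rate) and not in the size of the dataset.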

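Further troubleshooting: visualize the distribution of activations in each layer, and the distribution of gradients or weights, in TensorBoard to narrow down the problem.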

Regarding "machine-learning - TensorFlow training does not work: model is not learning data", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45968117/
