
python - For an infinite dataset, is the same data used in every epoch?

Reposted. Author: 行者123. Updated: 2023-12-01 02:37:13

In TensorFlow, suppose I have a dataset created from a generator:

dataset = tf.data.Dataset.from_generator(gen...)


and this generator yields an infinite stream of non-repeating data (like the digits of an infinite non-repeating decimal).

model.fit(dataset, steps_per_epoch=10000, epochs=5)


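Now, is the same data used in all 5 training epochs, i.e. always the first 10000 items from the generator? Or does epoch 1 use items 0-9999, epoch 2 use items 10000-19999, and so on?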

What about the initial_epoch parameter? If I set it to 1, will the model start training from the 10000th item?

model.fit(dataset, steps_per_epoch=10000, epochs=5, initial_epoch=1)


Update:
This simple test shows that every call to model.fit() resets the dataset:

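import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

def gen():
    # Infinite, non-repeating stream: yields (x, y) = ([[1]], [[0]]), ([[2]], [[0]]), ...
    i = 1
    while True:
        yield np.array([[i]]), np.array([[0]])
        i += 1

ds = tf.data.Dataset.from_generator(gen, output_types=(tf.int32, tf.int32)).batch(3)

x = Input(shape=(1, 1))
model = Model(inputs=x, outputs=x)  # identity model: the prediction equals the input

# The loss is just the mean prediction, i.e. the mean of the integers in the batch
model.compile('adam', loss=lambda true, pred: tf.reduce_mean(pred))
for i in range(10):
    model.fit(ds, steps_per_epoch=5, epochs=1)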


Output:

1/5 [=====>........................] - ETA: 0s - loss: 2.0000
5/5 [==============================] - 0s 9ms/step - loss: 8.0000
1/5 [=====>........................] - ETA: 0s - loss: 2.0000
5/5 [==============================] - 0s 2ms/step - loss: 8.0000
1/5 [=====>........................] - ETA: 0s - loss: 2.0000
5/5 [==============================] - 0s 2ms/step - loss: 8.0000
1/5 [=====>........................] - ETA: 0s - loss: 2.0000
5/5 [==============================] - 0s 2ms/step - loss: 8.0000
1/5 [=====>........................] - ETA: 0s - loss: 2.0000
5/5 [==============================] - 0s 2ms/step - loss: 8.0000
1/5 [=====>........................] - ETA: 0s - loss: 2.0000
5/5 [==============================] - 0s 2ms/step - loss: 8.0000
1/5 [=====>........................] - ETA: 0s - loss: 2.0000
5/5 [==============================] - 0s 2ms/step - loss: 8.0000
1/5 [=====>........................] - ETA: 0s - loss: 2.0000
5/5 [==============================] - 0s 2ms/step - loss: 8.0000
1/5 [=====>........................] - ETA: 0s - loss: 2.0000
5/5 [==============================] - 0s 2ms/step - loss: 8.0000
1/5 [=====>........................] - ETA: 0s - loss: 2.0000
5/5 [==============================] - 0s 2ms/step - loss: 8.0000
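The numbers in this log can be checked by hand. The model is the identity, so each batch's loss is the mean of its three integers, and the progress bar appears to report the running mean of the batch losses within the epoch (an assumption, but it matches every value above). A pure-Python check, not Keras code, where batch_losses is a hypothetical helper:

```python
def batch_losses(start, steps, batch_size=3):
    """Loss of each batch when the stream yields start, start + 1, ... and loss = mean(pred)."""
    losses = []
    for s in range(steps):
        first = start + s * batch_size
        batch = range(first, first + batch_size)  # the identity model predicts exactly these values
        losses.append(sum(batch) / batch_size)
    return losses

# Every fit() call restarts the dataset at 1, so every call sees these same five batches.
losses = batch_losses(start=1, steps=5)
print(losses)                     # [2.0, 5.0, 8.0, 11.0, 14.0]
print(sum(losses) / len(losses))  # 8.0
```

The first step shows 2.0 (the batch [1, 2, 3]) and the end of the epoch shows 8.0, identical on all ten calls.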


Five epochs in a single call:

model.fit(ds, steps_per_epoch=5, epochs=5)


Output:

Epoch 1/5
1/5 [=====>........................] - ETA: 0s - loss: 2.0000
5/5 [==============================] - 0s 9ms/step - loss: 8.0000
Epoch 2/5
1/5 [=====>........................] - ETA: 0s - loss: 17.0000
5/5 [==============================] - 0s 2ms/step - loss: 23.0000
Epoch 3/5
1/5 [=====>........................] - ETA: 0s - loss: 32.0000
5/5 [==============================] - 0s 2ms/step - loss: 38.0000
Epoch 4/5
1/5 [=====>........................] - ETA: 0s - loss: 47.0000
5/5 [==============================] - 0s 2ms/step - loss: 53.0000
Epoch 5/5
1/5 [=====>........................] - ETA: 0s - loss: 62.0000
5/5 [==============================] - 0s 2ms/step - loss: 68.0000
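The same arithmetic reproduces this log if the iterator is not reset between epochs inside one call: epoch 2 starts at element 16, so its first batch [16, 17, 18] has loss 17 and its five batch losses average 23. A pure-Python check, assuming the same running-mean display as before (epoch_losses is a hypothetical helper):

```python
def epoch_losses(epochs, steps=5, batch_size=3):
    """Return (first-step loss, epoch-mean loss) per epoch of one fit() call, assuming
    the stream 1, 2, 3, ... is consumed continuously across epochs and loss = mean(pred)."""
    results = []
    next_item = 1  # not reset between epochs inside a single fit() call
    for _ in range(epochs):
        losses = []
        for _ in range(steps):
            batch = range(next_item, next_item + batch_size)
            losses.append(sum(batch) / batch_size)
            next_item += batch_size
        results.append((losses[0], sum(losses) / steps))
    return results

for epoch, (first, mean) in enumerate(epoch_losses(5), start=1):
    print(f"Epoch {epoch}/5: first step {first}, epoch mean {mean}")
```

Each epoch advances by 5 steps * 3 elements = 15, which is exactly the jump of 15 between consecutive epochs in the log.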

Best answer

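No, within a single model.fit() call the data differs from epoch to epoch. Keras uses steps_per_epoch to determine the length of each epoch (a generator has no length), so that it knows when each epoch ends (and, for example, when to call the checkpointer).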
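To make this concrete, here is a pure-Python model (not Keras code) of what a single fit() call does with an infinite stream: it keeps one iterator and cuts it into epochs of steps_per_epoch elements each. The helper name items_per_epoch is hypothetical, and steps_per_epoch is shrunk to 3 so the output stays readable.

```python
import itertools

def gen():
    """Infinite, non-repeating stream 1, 2, 3, ... standing in for the question's generator."""
    i = 1
    while True:
        yield i
        i += 1

def items_per_epoch(stream, steps_per_epoch, epochs):
    """Hypothetical helper: read steps_per_epoch items per epoch from ONE shared iterator,
    mirroring how one fit() call walks an infinite dataset."""
    it = iter(stream)
    return [list(itertools.islice(it, steps_per_epoch)) for _ in range(epochs)]

print(items_per_epoch(gen(), steps_per_epoch=3, epochs=2))  # [[1, 2, 3], [4, 5, 6]]
```

Each epoch gets fresh elements because the iterator is never rewound inside the call; the epoch boundary is only a bookkeeping point.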

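initial_epoch only sets the epoch number that training starts counting from, which is useful when you want to resume training from a checkpoint (see the fit method documentation); it has nothing to do with data iteration. Note that epochs is the index of the final epoch, so epochs=5 with initial_epoch=1 trains for four epochs, and the data still starts from the first item, not the 10000th.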
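A pure-Python sketch of how initial_epoch interacts with epochs, again a model rather than Keras code: the epochs run are those with index initial_epoch up to epochs - 1 (printed as 2/5 to 5/5 here), and, as the experiment in the question shows, a new fit() call reads the dataset from its first element. simulate_fit is a hypothetical name.

```python
import itertools

def gen():
    """Infinite stream 1, 2, 3, ..."""
    i = 1
    while True:
        yield i
        i += 1

def simulate_fit(make_stream, steps_per_epoch, epochs, initial_epoch=0):
    """Hypothetical model of one fit() call: a fresh iterator per call,
    epochs indexed initial_epoch .. epochs - 1, steps_per_epoch items each."""
    it = make_stream()  # fresh iterator: the data always starts from the beginning
    schedule = {}
    for e in range(initial_epoch, epochs):
        schedule[f"Epoch {e + 1}/{epochs}"] = list(itertools.islice(it, steps_per_epoch))
    return schedule

for label, items in simulate_fit(gen, steps_per_epoch=3, epochs=5, initial_epoch=1).items():
    print(label, items)  # Epoch 2/5 [1, 2, 3] ... Epoch 5/5 [10, 11, 12]
```

Setting initial_epoch changes only the labels and the number of epochs; the first epoch of the call still sees [1, 2, 3].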

If the same dataset is passed to model.fit again, iteration restarts from the beginning on every call (thanks to the OP for this information).
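If you do want a resumed run to continue where the previous one stopped, one option, not taken from the answer, is to skip the elements already consumed. In tf.data that is Dataset.skip(count), which for a generator-backed dataset still has to pull the skipped elements through the generator. The count is initial_epoch * steps_per_epoch steps, times the batch size if you skip on the unbatched stream. A pure-Python analogue with itertools.islice, where resume_stream is a hypothetical helper:

```python
import itertools

def gen():
    """Infinite stream 1, 2, 3, ..."""
    i = 1
    while True:
        yield i
        i += 1

def resume_stream(stream, initial_epoch, steps_per_epoch, batch_size=1):
    """Hypothetical helper: drop the elements earlier epochs consumed, like
    dataset.skip(initial_epoch * steps_per_epoch * batch_size) on the unbatched data."""
    n_consumed = initial_epoch * steps_per_epoch * batch_size
    return itertools.islice(stream, n_consumed, None)

# The question's case: 10000 unbatched elements per epoch, resuming at initial_epoch=1.
resumed = resume_stream(gen(), initial_epoch=1, steps_per_epoch=10000)
print(next(resumed))  # 10001, i.e. the element at index 10000
```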

A similar question, "python - For an infinite dataset, is the same data used in every epoch?", can be found on Stack Overflow: https://stackoverflow.com/questions/57535526/
