python - What is going on with tf.train.shuffle_batch and tf.train.batch? - 6ren

Reposted by: 太空狗 · Updated: 2023-10-29 17:43:51

I am training a DNN on binary data.

But tf.train.shuffle_batch and tf.train.batch confuse me.

Here is my code; I will run a few tests with it.

First, Using_Queues_Lib.py:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os

from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf

NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 100
REAL32_BYTES=4


def read_dataset(filename_queue,data_length,label_length):
  class Record(object):
    pass
  result = Record()

  result_data  = data_length*REAL32_BYTES
  result_label = label_length*REAL32_BYTES
  record_bytes = result_data + result_label

  reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
  result.key, value = reader.read(filename_queue)

  record_bytes = tf.decode_raw(value, tf.float32)
  result.data  = tf.strided_slice(record_bytes, [0],[data_length])#record_bytes: tf.float list
  result.label = tf.strided_slice(record_bytes, [data_length],[data_length+label_length])
  return result


def _generate_data_and_label_batch(data, label, min_queue_examples,batch_size, shuffle):
  num_preprocess_threads = 16   # only affects speed
  if shuffle:
    data_batch, label_batch = tf.train.shuffle_batch([data, label],batch_size=batch_size,num_threads=num_preprocess_threads,capacity=min_queue_examples + batch_size,min_after_dequeue=min_queue_examples)
  else:
    data_batch, label_batch = tf.train.batch([data, label],batch_size=batch_size,num_threads=num_preprocess_threads,capacity=min_queue_examples + batch_size)
  return data_batch, label_batch

def inputs(data_dir, batch_size,data_length,label_length):
  filenames = [os.path.join(data_dir, 'test_data_SE.dat')]
  for f in filenames:
    if not tf.gfile.Exists(f):
      raise ValueError('Failed to find file: ' + f)

  filename_queue = tf.train.string_input_producer(filenames)

  read_input = read_dataset(filename_queue,data_length,label_length)

  read_input.data.set_shape([data_length])   #important
  read_input.label.set_shape([label_length]) #important


  min_fraction_of_examples_in_queue = 0.4
  min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN *
                       min_fraction_of_examples_in_queue)
  print ('Filling queue with %d samples before starting to train. '
     'This will take a few minutes.' % min_queue_examples)

  return _generate_data_and_label_batch(read_input.data, read_input.label,
                                     min_queue_examples, batch_size,
                                     shuffle=True)

Second, Using_Queues.py:

import Using_Queues_Lib
import tensorflow as tf
import numpy as np
import time


max_steps=10
batch_size=100
data_dir=r'.'
data_length=2
label_length=1

#-----------Save paras-----------
import struct
def WriteArrayFloat(file,data):
  fout=open(file,'wb')        
  fout.write(struct.pack('<'+str(data.flatten().size)+'f',
                                *data.flatten().tolist()))
  fout.close()
#-----------------------------

def add_layer(inputs, in_size, out_size, activation_function=None):
  Weights = tf.Variable(tf.truncated_normal([in_size, out_size]))
  biases = tf.Variable(tf.zeros([1, out_size]) + 0.1)
  Wx_plus_b = tf.matmul(inputs, Weights) + biases
  if activation_function is None:
    outputs = Wx_plus_b
  else:
    outputs = activation_function(Wx_plus_b)
  return outputs

data_train,labels_train=Using_Queues_Lib.inputs(data_dir=data_dir,
                      batch_size=batch_size,data_length=data_length,
                                          label_length=label_length)

xs=tf.placeholder(tf.float32,[None,data_length])
ys=tf.placeholder(tf.float32,[None,label_length])

l1 = add_layer(xs, data_length, 5, activation_function=tf.nn.sigmoid)
l2 = add_layer(l1, 5, 5, activation_function=tf.nn.sigmoid)
prediction = add_layer(l2, 5, label_length, activation_function=None)

loss = tf.reduce_mean(tf.square(ys - prediction))
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)

sess=tf.InteractiveSession()
tf.global_variables_initializer().run()

tf.train.start_queue_runners()

for i in range(max_steps):
  start_time=time.time()
  data_batch,label_batch=sess.run([data_train,labels_train])
  sess.run(train_step, feed_dict={xs: data_batch, ys: label_batch})
  duration=time.time()-start_time
  if i % 1 == 0:
    example_per_sec=batch_size/duration
    sec_pec_batch=float(duration)
    WriteArrayFloat(r'./data/'+str(i)+'.bin',
        np.concatenate((data_batch,label_batch),axis=1))
    format_str=('step %d,loss=%.8f(%.1f example/sec;%.3f sec/batch)')
    loss_value=sess.run(loss, feed_dict={xs: data_batch, ys: label_batch})
    print(format_str%(i,loss_value,example_per_sec,sec_pec_batch))
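To plot each batch, the `./data/<step>.bin` files written by `WriteArrayFloat` can be read back with NumPy. This is a minimal sketch, assuming the layout `WriteArrayFloat` produces (little-endian float32, row-major, `data_length + label_length = 3` columns per row). The helper name `read_batch_dump` and the temporary path are hypothetical; `write_array_float` is just a copy of the question's writer so the snippet runs on its own.

```python
import os
import struct
import tempfile

import numpy as np


def write_array_float(path, data):
    # Same format as WriteArrayFloat in the question: little-endian float32.
    flat = data.flatten()
    with open(path, "wb") as fout:
        fout.write(struct.pack("<" + str(flat.size) + "f", *flat.tolist()))


def read_batch_dump(path, num_columns=3):
    # Hypothetical helper: read one dump back as a (batch_size, num_columns) array.
    return np.fromfile(path, dtype="<f4").reshape(-1, num_columns)


# Round-trip a tiny fake batch through a temporary file.
batch = np.array([[0.5, -0.5, -0.25], [1.0, 1.0, 1.0]], dtype=np.float32)
path = os.path.join(tempfile.mkdtemp(), "0.bin")
write_array_float(path, batch)
restored = read_batch_dump(path)
print(restored.shape)  # (2, 3)

# Columns 0 and 1 are the inputs (x, y); column 2 is the label x*y.
xs, ys = restored[:, 0], restored[:, 1]
```

Scattering `xs` against `ys` for each step's file gives one panel per batch.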

The data is here. It was generated with Mathematica:

file = OpenWrite["test_data_SE.dat", BinaryFormat -> True];
data = Flatten@Table[{x, y, x*y}, {x, -1, 1, .05}, {y, -1, 1, .05}];
BinaryWrite[file, data, "Real32", ByteOrdering -> -1];
Close[file];

Number of samples: 1681 (a 41 × 41 grid)
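For readers without Mathematica, an approximately equivalent file can be produced with NumPy. This is a minimal sketch, assuming the grid from the `Table` call (-1 to 1 in steps of 0.05, x as the outer loop) and the little-endian `Real32` layout; the output path is hypothetical. It also checks the record arithmetic that `read_dataset` relies on: each record is `(data_length + label_length) * REAL32_BYTES = 12` bytes, so 41 × 41 grid points give 1681 records.

```python
import os
import tempfile

import numpy as np

DATA_LENGTH, LABEL_LENGTH, REAL32_BYTES = 2, 1, 4
# Bytes per record, computed the same way as record_bytes in read_dataset.
RECORD_BYTES = (DATA_LENGTH + LABEL_LENGTH) * REAL32_BYTES  # 12

# Same ordering as Table[{x, y, x*y}, {x, -1, 1, .05}, {y, -1, 1, .05}]:
# x is the outer loop and y the inner loop, 41 values each.
grid = np.linspace(-1.0, 1.0, 41)
xs, ys = np.meshgrid(grid, grid, indexing="ij")
records = np.stack([xs.ravel(), ys.ravel(), (xs * ys).ravel()], axis=1).astype("<f4")

# Hypothetical output location; the question reads ./test_data_SE.dat.
path = os.path.join(tempfile.mkdtemp(), "test_data_SE.dat")
records.tofile(path)  # raw little-endian float32, like ByteOrdering -> -1

num_records = os.path.getsize(path) // RECORD_BYTES
print(num_records)  # 1681

# Split every record into data and label, as the two strided_slice calls do.
decoded = np.fromfile(path, dtype="<f4").reshape(-1, DATA_LENGTH + LABEL_LENGTH)
data, label = decoded[:, :DATA_LENGTH], decoded[:, DATA_LENGTH:]
```

Because the records are written in grid order, consecutive records are neighbours on the grid, which is why the red-to-green coloring below sweeps across the plane.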

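The data looks like this:

Plot of the data: the color, from red to green, indicates the order in which the points appear in the data file.

Running Using_Queues.py produces ten batches, and I plot each batch in this figure (batch_size=100 and min_queue_examples=40):

With batch_size=1024 and min_queue_examples=40:

With batch_size=100 and min_queue_examples=4000:

With batch_size=1024 and min_queue_examples=4000:

Even with batch_size=1681 and min_queue_examples=4000:

the region is not filled with points.

Why?

Also, why does changing min_queue_examples increase the randomness? How should I choose the value of min_queue_examples?

What is going on inside tf.train.shuffle_batch?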

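Best answer

The sampling function used by tf.train.shuffle_batch() (and hence by tf.RandomShuffleQueue) is a little subtle. The implementation uses tf.RandomShuffleQueue.dequeue_many(batch_size), whose (simplified) implementation is as follows:

While the number of elements dequeued is less than batch_size:
- Wait until the queue contains at least min_after_dequeue + 1 elements.
- Select an element from the queue uniformly at random, remove it from the queue, and add it to the output batch.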
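The dequeue loop above can be modeled in plain Python. This is a single-threaded sketch, not TensorFlow's code: "waiting" for more elements is simulated by pulling the next element from an ordered input stream, which stands in for the background enqueue threads, and capacity is not modeled. The function name mirrors `dequeue_many`, but its signature here is hypothetical.

```python
import itertools
import random


def dequeue_many(buffer, source, batch_size, min_after_dequeue, rng):
    """Toy model of RandomShuffleQueue.dequeue_many (capacity not modeled)."""
    batch = []
    while len(batch) < batch_size:
        # "Wait" until the queue holds at least min_after_dequeue + 1 elements;
        # here we simply pull more elements from the ordered input stream.
        while len(buffer) < min_after_dequeue + 1:
            buffer.append(next(source))
        # Pick one element uniformly at random and remove it from the queue
        # (swap it with the last element so the removal is O(1)).
        i = rng.randrange(len(buffer))
        buffer[i], buffer[-1] = buffer[-1], buffer[i]
        batch.append(buffer.pop())
    return batch


rng = random.Random(0)
source = itertools.cycle(range(1681))  # ordered data, repeated every epoch
buffer = []
first = dequeue_many(buffer, source, batch_size=100, min_after_dequeue=40, rng=rng)
# Only 40 + 1 + 99 = 140 elements have been enqueued when the last pick happens,
# so every index in the first batch is at most min_after_dequeue + batch_size - 1.
print(max(first) <= 40 + 100 - 1)  # True
```

With min_after_dequeue=40, the first batch can only contain the first 140 records of the file, which is the "sliding window" effect described below.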

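The other thing to note is how elements are added to the queue. This uses background threads that run tf.RandomShuffleQueue.enqueue() on the same queue:

- Wait until the current size of the queue is less than its capacity.
- Add the element to the queue.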
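The interaction between this enqueue loop and the dequeue rule can be sketched with Python threads. This is a toy model, not TensorFlow's implementation: a `threading.Condition` stands in for the queue's internal locking, one producer thread plays the enqueue runner, and the class name `ToyShuffleQueue` is hypothetical. It shows that the queue never grows beyond `capacity`, because the producer blocks until there is room.

```python
import random
import threading


class ToyShuffleQueue:
    """Hypothetical model of RandomShuffleQueue's blocking rules."""

    def __init__(self, capacity, min_after_dequeue, seed=0):
        assert capacity > min_after_dequeue, "otherwise dequeue could never proceed"
        self.capacity = capacity
        self.min_after_dequeue = min_after_dequeue
        self.items = []
        self.cond = threading.Condition()
        self.rng = random.Random(seed)
        self.max_size_seen = 0

    def enqueue(self, item):
        with self.cond:
            # Wait until the current size of the queue is less than its capacity.
            while len(self.items) >= self.capacity:
                self.cond.wait()
            self.items.append(item)
            self.max_size_seen = max(self.max_size_seen, len(self.items))
            self.cond.notify_all()

    def dequeue(self):
        with self.cond:
            # Wait until at least min_after_dequeue + 1 elements are present.
            while len(self.items) < self.min_after_dequeue + 1:
                self.cond.wait()
            i = self.rng.randrange(len(self.items))
            self.items[i], self.items[-1] = self.items[-1], self.items[i]
            item = self.items.pop()
            self.cond.notify_all()
            return item


n, min_after_dequeue = 1000, 40
capacity = min_after_dequeue + 100  # like min_queue_examples + batch_size
q = ToyShuffleQueue(capacity, min_after_dequeue)


def produce():
    for x in range(n):
        q.enqueue(x)


producer = threading.Thread(target=produce)
producer.start()
# Dequeue only n - min_after_dequeue elements: once the producer stops, the
# queue must keep min_after_dequeue elements, so further dequeues would block.
out = [q.dequeue() for _ in range(n - min_after_dequeue)]
producer.join()
print(q.max_size_seen <= capacity)  # True
```

The final `min_after_dequeue` elements stay in the queue; in TensorFlow the queue has to be closed for them to be released, which this sketch does not model.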

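As a result, the queue's capacity and min_after_dequeue properties (plus the distribution of the input data being enqueued) determine the population from which tf.train.shuffle_batch() will sample. The data in your input file appears to be ordered, so you are relying entirely on tf.train.shuffle_batch() for randomness.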

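Taking your visualizations in turn:

- If capacity and min_after_dequeue are small relative to the dataset, the "shuffling" selects random elements from a small population that resembles a "sliding window" across the dataset. There is only a small probability of seeing old elements in a dequeued batch.
- If batch_size is large and min_after_dequeue is small relative to the dataset, the "shuffling" again selects from a small "sliding window" across the dataset.
- If min_after_dequeue is large relative to batch_size and the size of the dataset, you will see (approximately) uniform samples from the data in each batch.
- If min_after_dequeue and batch_size are both large relative to the size of the dataset, you will again see (approximately) uniform samples from the data in each batch.
- With min_after_dequeue = 4000 and batch_size = 1681, note that the expected number of copies of each element in the queue at sampling time is 4000 / 1681 ≈ 2.38, so it is more likely that some elements are sampled more than once (and less likely that each unique element is sampled exactly once). This is why the region is not completely filled even when batch_size equals the dataset size.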
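The five cases above can be reproduced with a small simulation. This is a sketch under assumptions: it uses a toy single-threaded model of the queue (not TensorFlow), treats the 1681 ordered records as indices 0..1680 repeated every epoch, ignores capacity, and only looks at the first batch; the helper name `simulate` is hypothetical. For each configuration it reports the largest record index in the batch (how far the "window" has slid) and how many distinct records the batch contains.

```python
import itertools
import random

NUM_RECORDS = 1681  # 41 x 41 grid points, stored in order


def simulate(batch_size, min_after_dequeue, seed=0):
    """Toy model: draw one batch from an ordered, repeating stream (capacity ignored)."""
    rng = random.Random(seed)
    source = itertools.cycle(range(NUM_RECORDS))
    buffer, batch = [], []
    while len(batch) < batch_size:
        # Refill until min_after_dequeue + 1 elements are queued.
        while len(buffer) < min_after_dequeue + 1:
            buffer.append(next(source))
        # Remove one queued element uniformly at random.
        i = rng.randrange(len(buffer))
        buffer[i], buffer[-1] = buffer[-1], buffer[i]
        batch.append(buffer.pop())
    return batch


configs = [(100, 40), (1024, 40), (100, 4000), (1024, 4000), (1681, 4000)]
results = {}
for batch_size, min_after_dequeue in configs:
    batch = simulate(batch_size, min_after_dequeue)
    results[(batch_size, min_after_dequeue)] = batch
    print(f"batch_size={batch_size:5d} min_after_dequeue={min_after_dequeue:5d} "
          f"max index={max(batch):5d} distinct={len(set(batch)):5d}")

# Expected copies of each record in a queue holding 4000 elements.
print(round(4000 / NUM_RECORDS, 2))  # 2.38
```

With min_after_dequeue=40 the batches stay inside the early part of the file, while with min_after_dequeue=4000 the batch draws from the whole grid but repeats some records, so even a batch of 1681 leaves gaps.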

For "python - What is going on with tf.train.shuffle_batch and tf.train.batch?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43028683/
