
python - Reading data from a TFRecord file used by the Object Detection API


I want to read the data stored in a TFRecord file that I use as a training record for the TF Object Detection API.

However, I get an InvalidArgumentError: Input to reshape is a tensor with 91090 values, but the requested shape has 921600. I don't understand the source of the error, even though the two sizes differ by roughly a factor of 10.

Question:
How can I read the file without getting this error?

  • I can't rule out that the error comes from creating the record, or that it lies in how I read it. I have therefore included my code for both.
  • I can run object_detection/train.py with the data, and generate a frozen graph from the trained model.
  • From this answer (and the GitHub issue it mentions), I found that I had to convert the PNG images to JPG, hence the as_jpg part (see my code below).
  • I used the code from this answer as a starting point for reading the file.
  • I am using Tensorflow 1.7.0 with Python 3.5.
  • There is only one class: "Human".
    The record holds 1000 images; each image can have one or more bounding boxes (one for each person in the image). A quick way to sanity-check the record count is sketched right after this list.
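
Since the record should contain exactly 1000 images, one can count the serialized examples directly with the TF 1.x record iterator; a minimal sketch, assuming the file is named train.record as below:

    import tensorflow as tf

    # Count the serialized examples in the record file (expected: 1000).
    num_examples = sum(1 for _ in tf.python_io.tf_record_iterator('train.record'))
    print(num_examples)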

How I read the TFRecord:
As mentioned above, I used the code from this answer as a starting point for reading the file:
    import tensorflow as tf
    from PIL import Image

    train_record = 'train.record'

    def read_and_decode(filename_queue):
        reader = tf.TFRecordReader()
        _, serialized_example = reader.read(filename_queue)
        features = tf.parse_single_example(
            serialized_example,
            # Defaults are not specified since both keys are required.
            features={
                'image/height': tf.FixedLenFeature([], tf.int64),
                'image/width': tf.FixedLenFeature([], tf.int64),
                'image/source_id': tf.FixedLenFeature([], tf.string),
                'image/encoded': tf.FixedLenFeature([], tf.string),
                'image/format': tf.FixedLenFeature([], tf.string),
                'image/object/bbox/xmin': tf.VarLenFeature(tf.float32),
                'image/object/bbox/xmax': tf.VarLenFeature(tf.float32),
                'image/object/bbox/ymin': tf.VarLenFeature(tf.float32),
                'image/object/bbox/ymax': tf.VarLenFeature(tf.float32),
                'image/object/class/text': tf.VarLenFeature(tf.string),
                'image/object/class/label': tf.VarLenFeature(tf.int64)
            })
        image = tf.decode_raw(features['image/encoded'], tf.uint8)
        # label = tf.cast(features['image/object/class/label'], tf.int32)
        height = tf.cast(features['image/height'], tf.int32)
        width = tf.cast(features['image/width'], tf.int32)
        return image, height, width

    def get_all_records(FILE):
        with tf.Session() as sess:
            filename_queue = tf.train.string_input_producer([FILE])
            image, height, width = read_and_decode(filename_queue)
            image = tf.reshape(image, tf.stack([height, width, 3]))
            image.set_shape([640, 480, 3])
            init_op = tf.global_variables_initializer()
            sess.run(init_op)
            coord = tf.train.Coordinator()
            threads = tf.train.start_queue_runners(coord=coord)
            for i in range(1):
                example = sess.run(image)
                img = Image.fromarray(example, 'RGB')
                img.save("output/" + str(i) + '-train.png')
                print(example)
            coord.request_stop()
            coord.join(threads)

    get_all_records(train_record)
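
For reference, the same parsing can also be written without queue runners via the tf.data API that ships with TF 1.7; a minimal sketch under the same feature spec, with the decoding step kept identical to the code above:

    def parse_fn(serialized_example):
        features = tf.parse_single_example(
            serialized_example,
            features={
                'image/height': tf.FixedLenFeature([], tf.int64),
                'image/width': tf.FixedLenFeature([], tf.int64),
                'image/encoded': tf.FixedLenFeature([], tf.string),
            })
        # Same decoding step as in read_and_decode above.
        image = tf.decode_raw(features['image/encoded'], tf.uint8)
        return image, features['image/height'], features['image/width']

    dataset = tf.data.TFRecordDataset(['train.record']).map(parse_fn)
    iterator = dataset.make_one_shot_iterator()
    image, height, width = iterator.get_next()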

Creation:

I wrote a class Image to logically model an image, and a class Rect to represent a bounding box and its label. They are not very relevant, but the code below uses them wherever the variables img or rect appear.

The relevant part is probably the get_bytes() method, which is little more than a wrapper around PIL's Image.open(file_path):
    from io import BytesIO

    import numpy as np
    # PIL's Image is imported under another name, since this class shadows it:
    from PIL import Image as PILImage

    class Image:

        # ... rest of class

        def open_img(self):
            if self.file_path is not None:
                return PILImage.open(self.file_path)

        def get_bytes(self, as_jpg=False):
            if self.file_path is None:
                return None
            if as_jpg:
                # Convert to jpg:
                with BytesIO() as f:
                    self.open_img().convert('RGB').save(f, format='JPEG', quality=95)
                    return f.getvalue()
            else:  # Assume png
                return np.array(self.open_img().convert('RGB')).tobytes()
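
Note that the two branches return different kinds of bytes: the as_jpg branch returns an encoded JPEG stream, while the PNG branch returns a raw RGB pixel buffer. A hypothetical variant (not part of the original class) that always returns an encoded stream would keep the two cases symmetric:

    def get_encoded_bytes(self, fmt='PNG'):
        # Hypothetical helper: always return encoded image bytes,
        # whether the target format is PNG or JPEG.
        with BytesIO() as f:
            self.open_img().convert('RGB').save(f, format=fmt)
            return f.getvalue()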

How I create the Examples:
    use_jpg = True

    def create_tf_example(img):
        image_format = b'jpg' if use_jpg else b'png'
        encoded_image_data = img.get_bytes(as_jpg=use_jpg)  # Encoded image bytes

        relative_path = img.get_file_path()
        if relative_path is None or not img.has_person():
            return None  # Ignore images without humans or image data
        else:
            filename = str(Path(relative_path).resolve())  # Absolute filename of the image. Empty if image is not from file

        xmins = []  # List of normalized left x coordinates in bounding box (1 per box)
        xmaxs = []  # List of normalized right x coordinates in bounding box (1 per box)
        ymins = []  # List of normalized top y coordinates in bounding box (1 per box)
        ymaxs = []  # List of normalized bottom y coordinates in bounding box (1 per box)
        classes_text = []  # List of string class name of bounding box (1 per box)
        classes = []  # List of integer class id of bounding box (1 per box)

        for rect in img.rects:
            if not rect.is_person:
                continue  # For now, ignore negative samples as TF does this by default
            else:
                xmin, xmax, ymin, ymax = rect.get_normalized_xy_min_max()
                xmins.append(xmin)
                xmaxs.append(xmax)
                ymins.append(ymin)
                ymaxs.append(ymax)
                # Human class:
                classes.append(1)
                classes_text.append('Human'.encode())

        # height and width (the image's pixel dimensions) are assumed to be
        # defined elsewhere in the original code.
        return tf.train.Example(features=tf.train.Features(feature={
            'image/height': dataset_util.int64_feature(height),
            'image/width': dataset_util.int64_feature(width),
            # 'image/filename': dataset_util.bytes_feature(filename.encode()),
            'image/source_id': dataset_util.bytes_feature(filename.encode()),
            'image/encoded': dataset_util.bytes_feature(encoded_image_data),
            'image/format': dataset_util.bytes_feature(image_format),
            'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
            'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
            'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
            'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
            'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
            'image/object/class/label': dataset_util.int64_list_feature(classes),
        }))
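
rect.get_normalized_xy_min_max() is not shown in the question; as the comments above state, the Object Detection API expects coordinates normalized to [0, 1]. A hypothetical implementation that divides pixel coordinates by the image dimensions could look like this:

    class Rect:

        # ... rest of class; x_min, x_max, y_min, y_max are assumed to be
        # pixel coordinates, img_width/img_height the owning image's size.

        def get_normalized_xy_min_max(self):
            # Scale pixel coordinates to the [0, 1] range expected by the API.
            return (self.x_min / self.img_width,
                    self.x_max / self.img_width,
                    self.y_min / self.img_height,
                    self.y_max / self.img_height)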

How I create the TFRecord:

    def convert_to_tfrecord(imgs, output_file_path):
        with tf.python_io.TFRecordWriter(output_file_path) as writer:
            for img in imgs:
                tf_example = create_tf_example(img)
                if tf_example is not None:
                    writer.write(tf_example.SerializeToString())

    convert_to_tfrecord(train_imgs, 'train.record')
    convert_to_tfrecord(validation_imgs, 'validate.record')
    convert_to_tfrecord(test_imgs, 'test.record')
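
A freshly written file can be sanity-checked without building a graph, by parsing a serialized record back into a tf.train.Example proto; a minimal sketch:

    import tensorflow as tf

    # Parse the first record back into an Example and inspect a few fields.
    record = next(tf.python_io.tf_record_iterator('train.record'))
    example = tf.train.Example.FromString(record)
    print(example.features.feature['image/height'].int64_list.value)
    print(example.features.feature['image/format'].bytes_list.value)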

From the dataset_util module:

    def int64_feature(value):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

    def int64_list_feature(value):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=value))

    def bytes_feature(value):
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

    def bytes_list_feature(value):
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=value))

    def float_list_feature(value):
        return tf.train.Feature(float_list=tf.train.FloatList(value=value))
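
These wrappers simply build tf.train.Feature protos; for illustration, printing one shows the proto's text format:

    feat = int64_feature(480)
    print(feat)
    # int64_list {
    #   value: 480
    # }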

Best Answer

I solved the problem by decoding the data as JPEG with tf.image.decode_jpeg.

Instead of:

    def read_and_decode(filename_queue):
        # ...

        image = tf.decode_raw(features['image/encoded'], tf.uint8)

        # ...

I did:
    def read_and_decode(filename_queue):
        # ...

        image = tf.image.decode_jpeg(features['image/encoded'])

        # ...

This also explains why the difference between the expected and actual sizes is so large: the requested 921,600 values are exactly 640 × 480 × 3, i.e. a full-size RGB bitmap, while the 91,090 bytes that were actually read are "only" the compressed JPEG data, not a full-size bitmap image.
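
Since the record also stores image/format, a format-agnostic reader could instead use tf.image.decode_image, which inspects the encoded bytes and handles both JPEG and PNG; a brief sketch (its output has no static shape, so the set_shape call from get_all_records remains useful):

    image = tf.image.decode_image(features['image/encoded'], channels=3)
    image.set_shape([640, 480, 3])  # restore the static shape for downstream ops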

Regarding python - reading data from a TFRecord file used by the Object Detection API, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49986522/
