tensorflow - TFX StatisticsGen for image data


Reposted by: 行者123 Updated: 2023-12-04 00:58:37


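Hello, I am trying to get a TFX pipeline running as an exercise. I am using ImportExampleGen to load TFRecords from disk. Each Example in the TFRecord contains a JPEG as a byte string, plus height, width, depth, steering, and throttle labels.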

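I am trying to use StatisticsGen, but I get this warning: WARNING:root:Feature "image_raw" has bytes value "None" which cannot be decoded as a UTF-8 string. It then crashes my Colab notebook. As far as I can tell, none of the byte-string images in the TFRecord are corrupt.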

I can't find any concrete examples of StatisticsGen handling image data. According to the docs, TensorFlow Data Validation can handle image data:

In addition to computing a default set of data statistics, TFDV can also compute statistics for semantic domains (e.g., images, text). To enable computation of semantic domain statistics, pass a tfdv.StatsOptions object with enable_semantic_domain_stats set to True to tfdv.generate_statistics_from_tfrecord.

But I'm not sure how this fits in with StatisticsGen.

Here is the code that instantiates ImportExampleGen and then StatisticsGen:

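# `context` is assumed to be an InteractiveContext created earlier in the notebook,
# and `_tf_record_dir` the directory holding the TFRecord files.
from tfx.components import StatisticsGen
from tfx.components.example_gen.import_example_gen.component import ImportExampleGen
from tfx.proto import example_gen_pb2
from tfx.utils.dsl_utils import tfrecord_input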

examples = tfrecord_input(_tf_record_dir)
# https://www.tensorflow.org/tfx/guide/examplegen#custom_inputoutput_split
# has a good explanation of splitting the data the 'output_config' param

# Input: all files matching _tf_record_dir/*
# Output 2 splits: train:eval=8:2.
train_ratio = 8
eval_ratio = 10 - train_ratio
output = example_gen_pb2.Output(
             split_config=example_gen_pb2.SplitConfig(splits=[
                 example_gen_pb2.SplitConfig.Split(name='train',
                                                   hash_buckets=train_ratio),
                 example_gen_pb2.SplitConfig.Split(name='eval',
                                                   hash_buckets=eval_ratio)
             ]))
example_gen = ImportExampleGen(input=examples,
                               output_config=output)
context.run(example_gen)

statistics_gen = StatisticsGen(
    examples=example_gen.outputs['examples'])
context.run(statistics_gen)
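
The 8:2 `hash_buckets` configuration above can be pictured with a small, library-free sketch. This is only an illustration of weighted hash bucketing under stated assumptions: the function name `assign_split` is hypothetical, and SHA-256 stands in for whatever fingerprint function TFX uses internally.

```python
import hashlib


def assign_split(record: bytes, buckets: dict) -> str:
    """Map a serialized record to a split by hashing it into weighted buckets."""
    total = sum(buckets.values())
    # Stable fingerprint of the record. SHA-256 is an assumption made for
    # this sketch; it is not claimed to be the hash TFX uses.
    fingerprint = int.from_bytes(hashlib.sha256(record).digest()[:8], "big")
    bucket = fingerprint % total
    upper = 0
    for name, width in buckets.items():
        upper += width
        if bucket < upper:
            return name
    raise AssertionError("unreachable: bucket is always smaller than total")


# Same ratio as in the question: 8 buckets for train, 2 for eval.
splits = {"train": 8, "eval": 2}
counts = {"train": 0, "eval": 0}
for i in range(10_000):
    counts[assign_split(f"example-{i}".encode(), splits)] += 1

print(counts)  # roughly 80% train, 20% eval
```

Because the assignment depends only on the record's hash, the same record lands in the same split on every run, which is the property a hash-based split is meant to provide.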

Thanks in advance.

Best answer

From the response on the Git issue, with thanks to Evan Rosen:

Hi all,

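The warning you are seeing indicates that StatisticsGen is trying to treat your raw image feature as a categorical string feature. The image bytes are being decoded just fine. The issue is that when the statistics (including the top-K examples) are written, the output proto expects a valid UTF-8 string but gets the raw image bytes instead. As far as I can tell, nothing is wrong with your setup; this is an unintended side effect of a well-intentioned warning meant for the case where you have a categorical string feature that can't be serialized. We will look into a better default that handles image data more gracefully.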
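
To make the UTF-8 point concrete, here is a minimal sketch in plain Python (not TFX code). The helper name `is_valid_utf8` and the sample byte strings are illustrative assumptions; the header bytes are the usual opening of a JPEG file.

```python
# Opening bytes of a typical JPEG file: the SOI marker (FF D8) followed by
# an APP0 marker (FF E0) and the "JFIF" identifier.
jpeg_header = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"


def is_valid_utf8(value: bytes) -> bool:
    """Return True if the bytes can be decoded as a UTF-8 string."""
    try:
        value.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(b"left_turn"))  # True: an ordinary categorical string value
print(is_valid_utf8(jpeg_header))   # False: 0xFF never appears in valid UTF-8
```

Any field that must hold a UTF-8 string will therefore reject raw image bytes, even though the image itself is intact.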

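In the meantime, to tell StatisticsGen that this feature is really an opaque blob, you can pass in a user-modified schema as described in the StatsGen docs. To generate this schema, run StatisticsGen and SchemaGen once (on a sample of the data), then modify the inferred schema to annotate the image feature. Here is a modified version of the Colab from @tall-josh: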

Open In Colab

The extra steps are a bit verbose, but having a curated schema is often good practice for other reasons anyway. Here is the cell I added to the notebook:

# API names follow the TFX release used in the original answer; newer
# releases may differ.
import os

import tfx.components
import tfx.types.artifact_utils
import tfx.types.standard_artifacts
import tfx.utils.io_utils

# Load autogenerated schema (using stats from small batch)

schema = tfx.utils.io_utils.SchemaReader().read(
    tfx.utils.io_utils.get_only_uri_in_dir(
        tfx.types.artifact_utils.get_single_uri(schema_gen.outputs['schema'].get())))

# Modify schema to indicate which string features are images.
# Ideally you would persist a golden version of this schema somewhere rather
# than regenerating it on every run.
# Note: the name must match the feature in your own TFRecords; the warning in
# the question refers to 'image_raw'.
for feature in schema.feature:
  if feature.name == 'image/raw':
    feature.image_domain.SetInParent()

# Write modified schema to local file
user_schema_dir ='/tmp/user-schema/'
tfx.utils.io_utils.write_pbtxt_file(
    os.path.join(user_schema_dir, 'schema.pbtxt'), schema)

# Create ImporterNode to make modified schema available to other components
user_schema_importer = tfx.components.ImporterNode(
    instance_name='import_user_schema',
    source_uri=user_schema_dir,
    artifact_type=tfx.types.standard_artifacts.Schema)

# Run the user schema ImporterNode
context.run(user_schema_importer)

I hope you find this workaround useful. In the meantime, we will look into a better default experience for image-valued features.

Regarding tensorflow - TFX StatisticsGen for image data, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60577308/
