image-processing - How does Spark read my image using the image format?

Reposted by: 行者123  Updated: 2023-12-04 14:17:42

This may be a silly question, but I can't figure out how Spark reads my image when I use spark.read.format("image").load(....).

After importing my image, it gives me the following:

>>> image_df.select("image.height","image.width","image.nChannels", "image.mode", "image.data").show()
+------+-----+---------+----+--------------------+
|height|width|nChannels|mode|                data|
+------+-----+---------+----+--------------------+
|   430|  470|        3|  16|[4D 55 4E 4C 54 4...|
+------+-----+---------+----+--------------------+

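I came to the following conclusions:

my image is 430x470 pixels (height x width),

my image is in color (nChannels = 3, hence RGB), which is an OpenCV-compatible type,

my image mode is 16, which corresponds to a particular OpenCV byte order.

Does anyone know which website or documentation I could browse to learn more?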
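As a pointer for the mode value: a minimal sketch, assuming the mode column follows OpenCV's type encoding (depth code plus channel count packed into one integer). The names decode_ocv_mode and CV_DEPTHS are hypothetical helpers written for this illustration, not part of Spark or OpenCV.

```python
# OpenCV packs an image type as: depth + ((channels - 1) << 3).
# The depth codes below are OpenCV's CV_8U ... CV_64F constants.
CV_DEPTHS = {0: "CV_8U", 1: "CV_8S", 2: "CV_16U", 3: "CV_16S",
             4: "CV_32S", 5: "CV_32F", 6: "CV_64F"}

def decode_ocv_mode(mode):
    """Return (type name, channel count) for an OpenCV type code."""
    depth = mode & 0b111          # low 3 bits: element depth
    channels = (mode >> 3) + 1    # remaining bits: channels - 1
    return f"{CV_DEPTHS[depth]}C{channels}", channels

print(decode_ocv_mode(16))        # the mode reported in the question
```

Under that assumption, mode 16 reads as 8-bit unsigned values with 3 channels, which agrees with the nChannels = 3 shown in the output above.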

The data type of the data column is Binary, but:

when I run image_df.select("image.data").take(1), the output seems to be a single array (see below).

>>> image_df.select("image.data").take(1)

# 1/ Here are the last elements of the result
....<<One Eternity Later>>....x92\x89\x8a\x8d\x84\x86\x89\x80\x84\x87~'))]

# 2/ I also got several parts of the result that look like:
.....\x89\x80\x80\x83z|\x7fvz}tpsjqtkrulsvmsvmsvmrulrulrulqtkpsjnqhnqhmpgmpgmpgnqhnqhn
qhnqhnqhnqhnqhnqhmpgmpgmpgmpgmpgmpgmpgmpgnqhnqhnqhnqhnqhnqhnqhnqhknejmdilcilchkbh
kbilcilckneloflofmpgnqhorioripsjsvmsvmtwnvypx{ry|sz}t{~ux{ry|sy|sy|sy|sz}tz}tz}tz}
ty|sy|sy|sy|sz}t{~u|\x7fv|\x7fv}.....

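What follows relates to the results shown above. These questions may stem from my lack of knowledge of OpenCV (or something else). Nevertheless:

1/ I don't understand: if I have an RGB image, I should get 3 matrices, but the output ends with .......\x84\x87~'))]. I would rather have expected something like [(...),(...),(...\x87~')].

2/ Does this part have any special meaning? Are those separators between the matrices, or something else?

To be clearer about what I'm trying to achieve: I want to process images in order to do pixel comparisons between them. So I want to know the pixel values at a given position in my image (I assume that with an RGB image I will have 3 pixel values for a given position).

Example: suppose I have a webcam pointed at the sky, only during the day, and I want to know the pixel values at a position corresponding to the top-left part of the sky. I found that the concatenation of those values gives the color Light Blue, which means the photo was taken on a sunny day. Assume the only possibility is that a sunny day gives the color Light Blue.
Next, I want to compare that concatenation with the concatenation of the pixel values at exactly the same position in a photo taken the next day. If they are not equal, I conclude that the photo was taken on a cloudy/rainy day. If they are equal, it was a sunny day.

Any help with this would be appreciated. I have simplified my example for better understanding, but my goal is pretty much the same. I know ML models exist to do this kind of thing, but I would be happy to try this first. My first goal is to split this column into 3 columns, one per color: a red matrix, a green matrix, and a blue matrix.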

Best answer

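I think I've got the logic. I used the keras.preprocessing.image.img_to_array() function to understand how the values are arranged (since I have an RGB image, I must have 3 matrices: one for each color, R, G, B). I'm posting this in case anyone wonders how it works. I may be wrong, but I think I have something: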

from keras.preprocessing import image
import numpy as np
from PIL import Image

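# Using Spark's built-in image data source.
# Assumes an active SparkSession `spark`; imageSchema is assumed to be defined
# beforehand (e.g. ImageSchema.imageSchema from pyspark.ml.image).
first_img = spark.read.format("image").schema(imageSchema).load(".....")
raw = first_img.select("image.data").take(1)[0][0]
>>> np.shape(raw)
(606300,) # which is 470*430*3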



# Using the Keras functions
img = image.load_img(".../path/to/img")
yy = image.img_to_array(img)
>>> np.shape(yy)
(430, 470, 3) # the shape is right, but the channel order differs:

>>> raw[0], raw[1], raw[2]
(77, 85, 78)
>>> yy[0][0]
array([78., 85., 77.], dtype=float32)

# Therefore I used the numpy reshape function directly on raw
# to get 430 rows of 470 pixels, each with 3 channel values:

array = np.reshape(raw, (430,470,3))
xx = image.img_to_array(array)     # OPTIONAL and not used here

>>> array[0][0] == (raw[0],raw[1],raw[2])
array([ True,  True,  True])

>>> array[0][1] == (raw[3],raw[4],raw[5])
array([ True,  True,  True])

>>> array[0][2] == (raw[6],raw[7],raw[8])
array([ True,  True,  True])

>>> array[0][3] == (raw[9],raw[10],raw[11])
array([ True,  True,  True])

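So, if I understand correctly, Spark reads the image as one big array, (606300,) here, in which the elements are ordered and each one is a color channel value of a pixel. Judging from the comparison with Keras above, where raw[0], raw[1], raw[2] is (77, 85, 78) but yy[0][0] is [78, 85, 77], the channels are stored in B G R order rather than R G B.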
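The layout described above can be sketched without any reshaping: a minimal example, assuming the bytes are stored row by row with the channel values of each pixel next to each other, in B, G, R order. The helper pixel_at and the tiny 2x2 image are made up for illustration; only the first pixel reuses the values seen in the output above.

```python
def pixel_at(data, width, n_channels, row, col):
    """Return the channel values stored for the pixel at (row, col)."""
    start = (row * width + col) * n_channels
    return tuple(data[start:start + n_channels])

# Tiny synthetic 2x2 image with 3 channels (values are made up).
width, height, n_channels = 2, 2, 3
data = bytes([
    77, 85, 78,   10, 20, 30,     # row 0: pixels (0,0) and (0,1)
    40, 50, 60,   70, 80, 90,     # row 1: pixels (1,0) and (1,1)
])

# Assumption: stored order is B, G, R, so unpack in that order.
b, g, r = pixel_at(data, width, n_channels, row=0, col=0)
print((r, g, b))                  # same pixel, reordered to R, G, B
```

The same offset formula would apply to the real image with width = 470 and n_channels = 3, so a pixel at a fixed position can be read from each image and compared.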
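After my small transformation, I get 430 matrices of 470 rows x 3 columns. Since my image is 470x430 (width x height), each matrix corresponds to one pixel row (one height position), and inside each matrix there are 3 columns, one per color channel, and 470 rows, one per width position.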
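For the goal stated in the question (one matrix per color), a minimal sketch building on the reshape above. It assumes 3 channels of 8-bit values stored in B, G, R order; split_channels is a hypothetical helper, and the 2x2 data stands in for the real image.data bytes.

```python
import numpy as np

def split_channels(data, height, width):
    """Reshape flat image bytes and return one 2-D matrix per channel."""
    arr = np.frombuffer(bytes(data), dtype=np.uint8)
    arr = arr.reshape(height, width, 3)
    # Assumption: channels are stored as B, G, R (OpenCV order).
    blue, green, red = arr[..., 0], arr[..., 1], arr[..., 2]
    return red, green, blue

# Synthetic 2x2 image standing in for image.data (values are made up).
data = bytearray([77, 85, 78, 10, 20, 30, 40, 50, 60, 70, 80, 90])
red, green, blue = split_channels(data, height=2, width=2)
print(red.shape)                  # each channel is a height x width matrix
print(red[0, 0], green[0, 0], blue[0, 0])
```

With the real image this would be called with height=430 and width=470, and red[row, col] would then give the red value at a given position.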

Hope it helps someone :)!

Regarding image-processing - How does Spark read my image using the image format?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58611888/
