
python - How to extract only 3 eigenvectors from an nxn image in opencv?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:26:00

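I am trying to convert an RGB image to grayscale using the following paper.

The main algorithm used in the paper is this: (Image: Novel PCA based algorithm to convert images to grayscale)

However, when I try to extract the eigenvectors from the image, I get 500 eigenvalues instead of the 3 that are required. As far as I know, an NxN matrix usually gives N eigenvectors, but I am not sure what I should be doing here to get only 3 eigenvectors.

Any help on what I should do? This is my code so far: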

import numpy as np
import cv2

def pca_rgb2gray(img):
    """
    NOVEL PCA-BASED COLOR-TO-GRAY IMAGE CONVERSION
    Authors:
    -Ja-Won Seo
    -Seong Dae Kim
    2013 IEEE International Conference on Image Processing
    """
    I_re = cv2.resize(img, (500,500))
    Iycc = cv2.cvtColor(I_re, cv2.COLOR_BGR2YCrCb)
    Izycc = Iycc - Iycc.mean()
    eigvals = []
    eigvecs = []
    final_im = []
    for i in range(3):
        res = np.linalg.eig(Izycc[:,:,i])
        eigvals.append(res[0])
        eigvecs.append(res[1])
    eignorm = np.linalg.norm(eigvals)
    for i in range(3):
        eigvals[i]/=eignorm
        eigvecs[i]/=np.linalg.norm(eigvecs[i])
        temp = eigvals[i] * np.dot(eigvecs[i], Izycc[:,:,i])
        final_im.append(temp)
    final_im = final_im[0] + final_im[1] + final_im[2]
    return final_im

if __name__ == '__main__':
    img = cv2.imread('image.png')
    gray = pca_rgb2gray(img)

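Best answer

Unfortunately, the accepted answer from Ahmed has the PCA maths wrong, which leads to a result quite different from the one in the manuscript. Here is a screenshot of the images taken from the manuscript. (Image: original images in manuscript)

The mean-centring and the SVD should be done along the other dimension, with the channels treated as the different samples. The mean-centring is meant to give an average pixel response of zero, not an average channel response of zero.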
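As a rough illustration of the shapes involved (a sketch only, not the manuscript's code), the snippet below uses a small random array as a hypothetical stand-in for a YCrCb image. It contrasts eigen-decomposing a single square channel, which yields one eigenvalue per row, with taking the SVD of the flattened 3 by N array after centring each pixel across its channels, which yields only three components. The array `img` and its 6 by 6 size are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for a small square YCrCb image (6 x 6 pixels, 3 channels).
img = rng.uniform(0.0, 255.0, size=(6, 6, 3))

# Eigen-decomposing one channel as a 6 x 6 matrix gives 6 eigenvalues,
# in the same way that a 500 x 500 channel gives 500.
per_channel = np.linalg.eigvals(img[:, :, 0])

# Flatten to a 3 x N array instead: one row per channel, one column per pixel.
X = img.reshape(-1, 3).T
# Centre every pixel (column) on the mean of its own three channel values.
Xc = X - X.mean(axis=0)

# The SVD of a 3 x N array has only three components.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
print(per_channel.shape, U.shape, S.shape, Vt.shape)
```

One side effect worth knowing: with this centring every column of `Xc` sums to zero, so the third singular value comes out numerically zero.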

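The linked algorithm also states explicitly that the projection of the PCA model involves first multiplying the image by the scores and then multiplying this product by the eigenvalues, not the other way around as in the other answer.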
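The order matters because matrix products do not commute. The sketch below, on hypothetical random 3 by N data rather than a real image, compares the two orderings as they appear in the code of this answer: vectors applied to the data and then scaled by the eigenvalues, versus data scaled first and then multiplied by the transposed vectors. It only shows that the two give different outputs; it does not settle which one matches the manuscript.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 3 x N data (rows = channels, columns = pixels).
X = rng.normal(size=(3, 12))
X = X - X.mean(axis=0)

U, S, _ = np.linalg.svd(X, full_matrices=False)
# Eigenvalues normalised to unit norm, placed on a diagonal.
eigvals = np.diag(S**2 / np.linalg.norm(S**2))

# Vectors applied to the data first, then scaled by the eigenvalues.
gray_a = (eigvals @ (U @ X)).sum(axis=0)
# Data scaled by the eigenvalues first, then multiplied by the transposed vectors.
gray_b = (U.T @ (eigvals @ X)).sum(axis=0)

print(np.allclose(gray_a, gray_b))
```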

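For more on the maths, see my PCA math answer here

The differences in the code can be seen in the outputs. As the manuscript does not provide an example output (that I could find), there may be subtle differences between the results, because the manuscript images are captured screenshots.

For comparison, the downloaded colour file has slightly higher contrast than the screenshot, so the same would be expected of the output grayscale. (Image: downloaded colour file)

First, the result of Ahmed's code: (Image: Ahmeds result)

Then the result of the updated code: (Image: new result)

The corrected code (based on Ahmed's, for ease of comparison) is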

import numpy as np
import cv2
from numpy.linalg import svd, norm

# Read input image
Ibgr = cv2.imread('path/peppers.jpg')
#Convert to YCrCb
Iycc = cv2.cvtColor(Ibgr, cv2.COLOR_BGR2YCR_CB)

# Reshape the H by W by 3 array to a 3 by N array (N = W * H)
Izycc = Iycc.reshape([-1, 3]).T

# Remove the mean of each pixel across its three channel values (axis 0 of the
# 3 by N array). The earlier answer instead removed a separate mean for each of
# Y, Cr, and Cb, i.e. Izycc.mean(1)[:, np.newaxis].
Izycc = Izycc - Izycc.mean(0)
# Mean across channels is required (separate means for each channel is not a
# mathematically sensible idea) - each pixel's variation should centre around 0

# Make sure we're dealing with zero-mean data here. Recall: Izycc is 3 by N array.
# The original assertion was based on a false premise. The mean value for each
# pixel (each column) should be 0.
assert(np.allclose(np.mean(Izycc, 0), 0.0))

# Compute data array's SVD. Ignore the 3rd return value: unimportant in this context.
(U, S, L) = svd(Izycc, full_matrices=False)

# Square the data's singular values to get the eigenvalues. Then, normalize
# the three eigenvalues to unit norm and finally, make a diagonal matrix out of
# them.
eigvals = np.diag(S**2 / norm(S**2))

# For a 3 by N data array, the eigenvectors are the left-singular vectors (U).
eigvecs = U

# Project the YCrCb data onto the principal components and reshape to W by H
# array.
# Igray reproduces the projection from the earlier answer, which was performed
# incorrectly: the published algorithm shows that the eigenvectors are
# multiplied by the flattened image and then scaled by the eigenvalues, as is
# done for Igray2.
Igray = np.dot(eigvecs.T, np.dot(eigvals, Izycc)).sum(0).reshape(Iycc.shape[:2])
Igray2 = np.dot(eigvals, np.dot(eigvecs, Izycc)).sum(0).reshape(Iycc.shape[:2])
# Igray3: as Igray2, but with the sign of PC2 switched.
eigvals3 = eigvals*[1,-1,1]
Igray3 = np.dot(eigvals3, np.dot(eigvecs, Izycc)).sum(0).reshape(Iycc.shape[:2])
# Igray4: as Igray2, but with the signs of PC2 and PC3 switched.
eigvals4 = eigvals*[1,-1,-1]
Igray4 = np.dot(eigvals4, np.dot(eigvecs, Izycc)).sum(0).reshape(Iycc.shape[:2])

# Rescale Igray to [0, 255]. This is a fancy way to do this.
from scipy.interpolate import interp1d
Igray = np.floor((interp1d([Igray.min(), Igray.max()],
                           [0.0, 256.0 - 1e-4]))(Igray))
Igray2 = np.floor((interp1d([Igray2.min(), Igray2.max()],
                            [0.0, 256.0 - 1e-4]))(Igray2))
Igray3 = np.floor((interp1d([Igray3.min(), Igray3.max()],
                            [0.0, 256.0 - 1e-4]))(Igray3))
Igray4 = np.floor((interp1d([Igray4.min(), Igray4.max()],
                            [0.0, 256.0 - 1e-4]))(Igray4))

# Make sure we don't accidentally produce a photographic negative (flip image
# intensities). N.B.: `norm` is often expensive; in real life, try to see if
# there's a more efficient way to do this.
if norm(Iycc[:,:,0] - Igray) > norm(Iycc[:,:,0] - (255.0 - Igray)):
    Igray = 255 - Igray
if norm(Iycc[:,:,0] - Igray2) > norm(Iycc[:,:,0] - (255.0 - Igray2)):
    Igray2 = 255 - Igray2
if norm(Iycc[:,:,0] - Igray3) > norm(Iycc[:,:,0] - (255.0 - Igray3)):
    Igray3 = 255 - Igray3
if norm(Iycc[:,:,0] - Igray4) > norm(Iycc[:,:,0] - (255.0 - Igray4)):
    Igray4 = 255 - Igray4

# Display result
if True:
    import pylab
    pylab.ion()
    fGray = pylab.imshow(Igray, cmap='gray')
    # Save result
    cv2.imwrite('peppers-gray.png', Igray.astype(np.uint8))

    fGray2 = pylab.imshow(Igray2, cmap='gray')
    # Save result
    cv2.imwrite('peppers-gray2.png', Igray2.astype(np.uint8))

    fGray3 = pylab.imshow(Igray3, cmap='gray')
    # Save result
    cv2.imwrite('peppers-gray3.png', Igray3.astype(np.uint8))

    fGray4 = pylab.imshow(Igray4, cmap='gray')
    # Save result
    cv2.imwrite('peppers-gray4.png', Igray4.astype(np.uint8))
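The rescaling and the negative check above are repeated four times, so they could be folded into one helper. The function below is a hypothetical sketch: the name `rescale_to_uint8` and its `reference` parameter are assumptions, and it uses plain min-max scaling with rounding rather than `interp1d` with `floor`, so values may land in slightly different bins than in the code above.

```python
import numpy as np

def rescale_to_uint8(gray, reference=None):
    """Min-max rescale `gray` to [0, 255]; optionally undo a photographic negative.

    `reference` is an optional array of the same shape (for example the Y
    channel). If the inverted image is closer to it, the inverted image is kept.
    """
    gray = np.asarray(gray, dtype=np.float64)
    lo, hi = gray.min(), gray.max()
    if hi == lo:
        # A constant image has no contrast to stretch.
        return np.zeros(gray.shape, dtype=np.uint8)
    out = (gray - lo) / (hi - lo) * 255.0
    if reference is not None:
        ref = np.asarray(reference, dtype=np.float64)
        if np.linalg.norm(ref - out) > np.linalg.norm(ref - (255.0 - out)):
            out = 255.0 - out
    return np.round(out).astype(np.uint8)

# Small usage example on made-up values.
demo = np.array([[0.0, 1.0], [2.0, 4.0]])
plain = rescale_to_uint8(demo)
inverted = rescale_to_uint8(demo, reference=np.array([[255, 191], [127, 0]]))
```

With a helper like this, each of the four images would need a single call, passing `Iycc[:,:,0]` as the reference.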

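**** Edit ****

Following Nazlok's query about the instability of the eigenvector direction (the direction in which any one eigenvector is oriented is arbitrary, so there is no guarantee that different algorithms, or a single algorithm without a reproducible direction-standardisation step, will give the same result), I have now added two extra examples in which I simply switched the sign of the eigenvectors (number 2 alone, and numbers 2 and 3 together). The results are different again: switching only PC2 gives a much lighter tone, while switching both 2 and 3 gives a similar result (not surprising, since the exponential scaling reduces the influence of PC3 to very little). I will leave that last one for readers to check by running the code.

(Image: result if the sign of PC2 is switched)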
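The arbitrariness of the sign can be checked numerically. In the sketch below (hypothetical random data, with the textbook projection `U.T @ X` used for the scores, which is an assumption and not necessarily the exact product used in the code above), flipping the sign of the second component leaves the SVD reconstruction unchanged but changes the weighted sum of the scores.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical 3 x N data (rows = channels, columns = pixels).
X = rng.normal(size=(3, 10))
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Flip the sign of the second component in both factors.
flip = np.array([1.0, -1.0, 1.0])
U_flipped = U * flip               # scales the columns of U
Vt_flipped = Vt * flip[:, None]    # scales the rows of Vt

# Both factorisations reproduce X equally well...
same_fit = np.allclose(U @ np.diag(S) @ Vt,
                       U_flipped @ np.diag(S) @ Vt_flipped)

# ...but an eigenvalue-weighted sum of the scores is not the same.
weights = S**2 / np.linalg.norm(S**2)
gray_original = weights @ (U.T @ X)
gray_flipped = weights @ (U_flipped.T @ X)

print(same_fit, np.allclose(gray_original, gray_flipped))
```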

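Conclusion

Unless explicit extra steps are taken to provide repeatable and reproducible PC orientations, the algorithm is unstable, and I would personally not be comfortable using it as is. Nazlok's suggestion of using the balance of positive and negative intensities could provide a rule, but it would need validating and so is beyond the scope of this answer. Such a rule, however, would not guarantee a "best" solution, only a stable one. Eigenvectors are unit vectors, so they are balanced in terms of variance (squared intensity). Which side of zero has the largest sum of magnitudes only tells us which side has individual pixels contributing larger variance, which I suspect is generally not very useful.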
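To make "a stable rule" concrete, here is one possible sign convention, sketched under stated assumptions: each eigenvector is flipped so that its largest-magnitude entry is positive. This is not Nazlok's intensity-balance rule and not something taken from the manuscript, and the helper name `fix_signs` is hypothetical. It only makes the orientation repeatable; it says nothing about which orientation gives the better grayscale image.

```python
import numpy as np

def fix_signs(U):
    """Flip each column of U so that its largest-magnitude entry is positive."""
    cols = np.arange(U.shape[1])
    rows = np.argmax(np.abs(U), axis=0)
    signs = np.sign(U[rows, cols])
    signs[signs == 0] = 1.0  # guard against an all-zero column
    return U * signs

rng = np.random.default_rng(3)
# Hypothetical 3 x N data (rows = channels, columns = pixels).
X = rng.normal(size=(3, 10))
U, _, _ = np.linalg.svd(X, full_matrices=False)

# The same vectors with arbitrary sign flips applied to PC2 and PC3.
U_flipped = U * np.array([1.0, -1.0, -1.0])

# After the convention is applied, both versions agree.
consistent = np.allclose(fix_signs(U), fix_signs(U_flipped))
print(consistent)
```

A rule of this kind can still be fragile when two entries of a vector have nearly equal magnitude, since small numerical differences may then change which entry is picked.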

Regarding "python - How to extract only 3 eigenvectors from an nxn image in opencv?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/37676665/
