
python - Resizing a bounding box according to the image

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:13:16

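I am implementing object localization in Python. One problem I have is that when I resize the observable region while taking actions, I don't know how to alter the ground truth box to match. As a result, this happens:

[Image: Ex1]

The ground truth box is not resized to fit the plane accurately, so I cannot localize correctly. My current function for formatting the next state is as follows: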

def next_state(init_input, b, b_prime, g, a):
    """
    Returns the observable region of the next state.

    Formats the next state's observable region, defined
    by b_prime, to be of dimension (224, 224, 3). Adding 16
    additional pixels of context around the original bounding box.
    The ground truth box must be reformatted according to the
    new observable region.

    :param init_input:
        The initial input volume of the current episode.

    :param b:
        The current state's bounding box.

    :param b_prime:
        The subsequent state's bounding box.

    :param g:
        The ground truth box of the target object.

    :param a:
        The action taken by the agent at the current step.
    """

    # Determine the pixel coordinates of the observable region for the following state
    context_pixels = 16
    x1 = max(b_prime[0] - context_pixels, 0)
    y1 = max(b_prime[1] - context_pixels, 0)
    x2 = min(b_prime[2] + context_pixels, IMG_SIZE)
    y2 = min(b_prime[3] + context_pixels, IMG_SIZE)

    # Determine observable region
    observable_region = cv2.resize(init_input[y1:y2, x1:x2], (224, 224))

    # Difference between crop region and image dimensions
    x1_diff = x1
    y1_diff = y1
    x2_diff = IMG_SIZE - x2
    y2_diff = IMG_SIZE - y2

    # Resize ground truth box
    g[0] = int(g[0] - 0.5 * x1_diff)  # x1
    g[1] = int(g[1] - 0.5 * y1_diff)  # y1
    g[2] = int(g[2] + 0.5 * x2_diff)  # x2
    g[3] = int(g[3] + 0.5 * y2_diff)  # y2

    return observable_region, g

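I can't seem to get the resizing right. I followed this post to resize the bounding box initially. However, that solution doesn't seem to work in this case.

The bounding box / ground truth box format is: b = [x1, y1, x2, y2]

init_input has dimensions (224, 224, 3)
IMG_SIZE = 224
context_pixels = 16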

Here is an additional example:

[Image: Ex3]

The ground truth box looks like the right size, but its position is wrong.

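Update

I have updated the code section above. A scale factor seems to be the wrong way to approach the problem. By only adding or subtracting the number of pixels to scale up by, I have gotten much closer. I believe something related to interpolation is now involved, so if anyone can help make it perfect, that would be a huge help.

New example:

[Image: ExNew]

Update 2

A solution has been provided.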

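Best answer

My problem was solved in this post by a user named @lenik.

Before applying a scale factor to the pixel coordinates of the ground truth box g, the zero offset must first be subtracted so that x1, y1 becomes 0, 0. This allows the scaling to work correctly.

Thus, the coordinates of any arbitrary point (x, y) after the transformation can be calculated as: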

x_new = (x - x1) * IMG_SIZE / (x2 - x1)
y_new = (y - y1) * IMG_SIZE / (y2 - y1)
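
As a quick numeric check of the formula above, here is a minimal sketch. The helper name `to_crop_coords` and the sample crop values are illustrative assumptions and do not come from the original answer. The crop's own corners should land on (0, 0) and (IMG_SIZE, IMG_SIZE), and its centre on the centre of the resized region.

```python
IMG_SIZE = 224  # side length of the resized observable region, as in the question


def to_crop_coords(x, y, x1, y1, x2, y2, size=IMG_SIZE):
    """Map a point from original-image coordinates into the resized crop."""
    # Subtract the crop's top-left corner first, then scale.
    x_new = (x - x1) * size / (x2 - x1)
    y_new = (y - y1) * size / (y2 - y1)
    return x_new, y_new


# Hypothetical crop region [x1, y1, x2, y2] = [50, 40, 162, 152], i.e. 112 x 112 pixels.
crop = (50, 40, 162, 152)

top_left = to_crop_coords(50, 40, *crop)        # (0.0, 0.0)
bottom_right = to_crop_coords(162, 152, *crop)  # (224.0, 224.0)
centre = to_crop_coords(106, 96, *crop)         # (112.0, 112.0)
```

If the offset were not subtracted before scaling, the top-left corner of the crop would not map to the origin, which is consistent with the shifted boxes shown in the question.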

In code, with respect to my problem, the solution is as follows:

import cv2

IMG_SIZE = 224  # side length of the input image and of the resized observable region


def next_state(init_input, b_prime, g):
    """
    Returns the observable region of the next state.

    Formats the next state's observable region, defined
    by b_prime, to be of dimension (224, 224, 3). Adding 16
    additional pixels of context around the original bounding box.
    The ground truth box must be reformatted according to the
    new observable region.

    :param init_input:
        The initial input volume of the current episode.

    :param b_prime:
        The subsequent state's bounding box.

    :param g:
        The ground truth box of the target object.
    """

    # Determine the pixel coordinates of the observable region for the following state
    context_pixels = 16
    x1 = max(b_prime[0] - context_pixels, 0)
    y1 = max(b_prime[1] - context_pixels, 0)
    x2 = min(b_prime[2] + context_pixels, IMG_SIZE)
    y2 = min(b_prime[3] + context_pixels, IMG_SIZE)

    # Determine observable region
    observable_region = cv2.resize(init_input[y1:y2, x1:x2], (224, 224), interpolation=cv2.INTER_AREA)

    # Resize ground truth box: subtract the crop offset, then scale (modifies g in place)
    g[0] = int((g[0] - x1) * IMG_SIZE / (x2 - x1))  # x1
    g[1] = int((g[1] - y1) * IMG_SIZE / (y2 - y1))  # y1
    g[2] = int((g[2] - x1) * IMG_SIZE / (x2 - x1))  # x2
    g[3] = int((g[3] - y1) * IMG_SIZE / (y2 - y1))  # y2

    return observable_region, g
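
One thing that may be worth watching: the function above overwrites g in place, so a caller that reuses the same list across steps would apply the transform to coordinates that were already transformed. Whether that matters depends on the episode loop, which the post does not show. The following is a sketch of a non-mutating variant of the box transform only. The name `rescale_box` and the clamping to the visible region are assumptions added here, not part of the accepted answer.

```python
IMG_SIZE = 224  # as in the question


def rescale_box(box, crop, size=IMG_SIZE):
    """Return `box` in the coordinates of `crop` after resizing to size x size.

    Both arguments use the [x1, y1, x2, y2] format. The input list is not modified.
    """
    cx1, cy1, cx2, cy2 = crop
    sx = size / (cx2 - cx1)  # horizontal scale factor
    sy = size / (cy2 - cy1)  # vertical scale factor
    scaled = [
        (box[0] - cx1) * sx,
        (box[1] - cy1) * sy,
        (box[2] - cx1) * sx,
        (box[3] - cy1) * sy,
    ]
    # Clamp to the visible region (an added assumption, not in the answer).
    return [int(min(max(v, 0), size)) for v in scaled]


g = [60, 60, 120, 120]
crop = [44, 44, 156, 156]  # e.g. b_prime = [60, 60, 140, 140] plus 16 pixels of context
g_new = rescale_box(g, crop)  # [32, 32, 152, 152]; g itself is unchanged
```

Here the crop is 112 pixels wide, so the scale factor is 2: the corner at 60 maps to (60 - 44) * 2 = 32 and the corner at 120 maps to (120 - 44) * 2 = 152.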

Regarding "python - Resizing a bounding box according to the image", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51220865/
