
python - How to convert a 2D numpy array to one-hot encoding?

Reposted. Author: 行者123. Updated: 2023-12-01 07:06:55

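I am trying to apply one-hot encoding to the data below, but the output confuses me. Before encoding, the data has shape (5, 10); after encoding, it has shape (5, 20). Each letter should be encoded as 4 elements, so the shape after encoding should be (5, 40), not (5, 20). How can I fix this?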

import numpy as np
from sklearn.preprocessing import OneHotEncoder

X = [['A', 'G', 'T', 'G', 'T', 'C', 'T', 'A', 'A', 'C'],
     ['A', 'G', 'T', 'G', 'T', 'C', 'T', 'A', 'A', 'C'],
     ['G', 'C', 'C', 'A', 'C', 'T', 'C', 'G', 'G', 'T'],
     ['G', 'C', 'C', 'A', 'C', 'T', 'C', 'G', 'G', 'T'],
     ['G', 'C', 'C', 'A', 'C', 'T', 'C', 'G', 'G', 'T']]
Y = np.array(X)
print('Shape of numpy array', Y.shape)

# one hot encoding

# Note: scikit-learn 1.2+ renames this parameter to sparse_output.
onehot_encoder = OneHotEncoder(sparse=False)
onehot_encoded = onehot_encoder.fit_transform(Y)
print(onehot_encoded)
print('Shape of one hot encoding', onehot_encoded.shape)


Output:

Shape of numpy array (5, 10)
[[1. 0. 0. 1. 0. 1. 0. 1. 0. 1. 1. 0. 0. 1. 1. 0. 1. 0. 1. 0.]
[1. 0. 0. 1. 0. 1. 0. 1. 0. 1. 1. 0. 0. 1. 1. 0. 1. 0. 1. 0.]
[0. 1. 1. 0. 1. 0. 1. 0. 1. 0. 0. 1. 1. 0. 0. 1. 0. 1. 0. 1.]
[0. 1. 1. 0. 1. 0. 1. 0. 1. 0. 0. 1. 1. 0. 0. 1. 0. 1. 0. 1.]
[0. 1. 1. 0. 1. 0. 1. 0. 1. 0. 0. 1. 1. 0. 0. 1. 0. 1. 0. 1.]]
Shape of one hot encoding (5, 20)

Best answer

You need to one-hot encode each column separately, so that every column of your ndarray gets 4 new columns:

import numpy as np

# X is the nested list of letters from the question.
X = np.array(X)

# Get unique classes.
classes = np.unique(X)

# Replace classes with integers.
X = np.searchsorted(classes, X)

# Get an identity matrix.
eye = np.eye(classes.shape[0])

# Iterate over all columns
# and get one-hot encoding for each column.
X = np.concatenate([eye[i] for i in X.T], axis=1)

X.shape
# (5, 40)
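
For readers who would rather keep OneHotEncoder, here is a minimal sketch that is not part of the original answer. It assumes scikit-learn is installed and that the alphabet is exactly A, C, G, T. As far as I can tell, the encoder infers its categories separately for each column, and every column in the question's data holds only two distinct letters, which would explain the 10 x 2 = 20 output columns. Passing the full alphabet through the `categories` parameter gives each column four slots. The names `alphabet` and `distinct_per_column` are introduced here for illustration. Calling `.toarray()` on the default sparse result avoids the `sparse` / `sparse_output` keyword, whose name depends on the scikit-learn version.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

X = [['A', 'G', 'T', 'G', 'T', 'C', 'T', 'A', 'A', 'C'],
     ['A', 'G', 'T', 'G', 'T', 'C', 'T', 'A', 'A', 'C'],
     ['G', 'C', 'C', 'A', 'C', 'T', 'C', 'G', 'G', 'T'],
     ['G', 'C', 'C', 'A', 'C', 'T', 'C', 'G', 'G', 'T'],
     ['G', 'C', 'C', 'A', 'C', 'T', 'C', 'G', 'G', 'T']]
Y = np.array(X)

# Count the distinct letters in each column of the question's data.
distinct_per_column = [len(np.unique(column)) for column in Y.T]
print(distinct_per_column)

# Assumed alphabet; every column is given the same four categories.
alphabet = ['A', 'C', 'G', 'T']
encoder = OneHotEncoder(categories=[alphabet] * Y.shape[1])

# The default result is a sparse matrix, so convert it to a dense array.
onehot_encoded = encoder.fit_transform(Y).toarray()
print(onehot_encoded.shape)
```

With the question's data this should print a list of ten 2s followed by the shape (5, 40).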

Consider the following example:

[['A', 'G'],
['C', 'C'],
['T', 'A']]

You get 8 (2 x 4) columns in the one-hot encoded ndarray:

Column 0   Column 1
A C G T    A C G T

1 0 0 0    0 0 1 0
0 1 0 0    0 1 0 0
0 0 0 1    1 0 0 0
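
The small example can be reproduced with the same steps. The sketch below is an illustration, not the answer's own code: it swaps the list comprehension and `np.concatenate` for a single fancy-indexing step followed by `reshape`, which should yield the same column order. The names `codes` and `onehot` are introduced here. Because the classes are taken from the data, a letter that never appears anywhere in the array would get no columns at all.

```python
import numpy as np

# The 3 x 2 example from the answer.
X = np.array([['A', 'G'],
              ['C', 'C'],
              ['T', 'A']])

# Sorted unique letters found in the data: A, C, G, T.
classes = np.unique(X)

# Integer code of each letter, same shape as X.
codes = np.searchsorted(classes, X)

# Indexing the identity matrix with the codes gives shape (3, 2, 4);
# flattening the last two axes puts the 4 slots of each column side by side.
eye = np.eye(classes.shape[0], dtype=int)
onehot = eye[codes].reshape(X.shape[0], -1)

print(onehot)
```

The printed rows should match the 3 x 8 table above.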

Regarding "python - How to convert a 2D numpy array to one-hot encoding?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58406795/
