
Reordering a data frame to make it consistent

Reposted. Author: 行者123. Updated: 2023-12-04 02:27:44

I have a data frame with 9 columns and 100,000 rows. The information in each row is mixed up, as in the following example:

C1 <- c("Gender F", "Age 74", "Gender M")
C2 <- c("Age 54", "Gender M", "Col eyes Blue")
C3 <- c("Col eyes Brown","Col eyes Blue", "Age 56")
C4 <- c("Col hair Brown", "Col hair Black", "Col hair Blonde")

df <- cbind(C1, C2, C3, C4)

> df
     C1         C2              C3               C4
[1,] "Gender F" "Age 54"        "Col eyes Brown" "Col hair Brown"
[2,] "Age 74"   "Gender M"      "Col eyes Blue"  "Col hair Black"
[3,] "Gender M" "Col eyes Blue" "Age 56"         "Col hair Blonde"

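I want to make this data frame consistent; in other words, I want all the "Gender" information in the same column, and so on. I am new to R and struggling to find a solution. Can anyone help?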

Best answer

If every row contains all of the keys, calling sort inside apply will work:

t(apply(df, 1, sort))
#     [,1]     [,2]             [,3]              [,4]
#[1,] "Age 54" "Col eyes Brown" "Col hair Brown"  "Gender F"
#[2,] "Age 74" "Col eyes Blue"  "Col hair Black"  "Gender M"
#[3,] "Age 56" "Col eyes Blue"  "Col hair Blonde" "Gender M"
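
Before relying on sort, it may be worth checking that every row really does contain each key exactly once. The following is a minimal sketch of such a check, not part of the original answer; the helper name rows_complete is hypothetical, and it assumes, as the answer does further on, that the key is everything before the last word of a cell.

```r
# Example data from the question
C1 <- c("Gender F", "Age 74", "Gender M")
C2 <- c("Age 54", "Gender M", "Col eyes Blue")
C3 <- c("Col eyes Brown", "Col eyes Blue", "Age 56")
C4 <- c("Col hair Brown", "Col hair Black", "Col hair Blonde")
df <- cbind(C1, C2, C3, C4)

# Hypothetical helper: TRUE for each row that holds every key exactly once.
# Assumption: the key is the cell text with its last word removed.
rows_complete <- function(m) {
  keys <- sub(" [^ ]*$", "", m)          # strip the last word of each cell
  all_keys <- unique(as.vector(keys))    # every key seen anywhere in m
  apply(keys, 1, function(k) !anyDuplicated(k) && setequal(k, all_keys))
}

ok <- rows_complete(df)

# A deliberately broken copy: row 2 now has two "Age" cells and no "Gender"
bad <- df
bad[2, 2] <- "Age 30"
ok_bad <- rows_complete(bad)
```

If all(rows_complete(df)) is TRUE, the sort approach should line the columns up; otherwise the rows flagged FALSE are the ones that need the key-based approach.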

If that is not the case, you can try subsetting by the unique keys. I assume the key is everything except the last word, and use sub to extract it.

t1 <- sub(" [^ ]*$", "", df)
t2 <- unique(as.vector(t1))
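# For each row, pick the cell that holds each key in t2 (NA if the key is missing).
# match(t2, t1[i,]) gives, for every key, the column where it sits in row i.
do.call(rbind, lapply(seq_len(nrow(df)), function(i) df[i, match(t2, t1[i,])]))
#     C1         C2       C3               C4
#[1,] "Gender F" "Age 54" "Col eyes Brown" "Col hair Brown"
#[2,] "Gender M" "Age 74" "Col eyes Blue"  "Col hair Black"
#[3,] "Gender M" "Age 56" "Col eyes Blue"  "Col hair Blonde"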
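
As a possible follow-up, the same key matching can produce a data frame whose columns are named after the keys and hold only the values, with NA where a row lacks a key. This is a sketch under stated assumptions rather than part of the original answer: the input matrix m is invented for illustration (row 2 has no "Age" cell), the value is assumed to be the last word of each cell, and the names keys, vals, all_keys, and tidy are hypothetical.

```r
# Hypothetical input: same layout as the question, but row 2 has no "Age" cell
m <- rbind(
  c("Gender F",      "Age 54",         "Col eyes Brown"),
  c("Col eyes Blue", "Gender M",       NA),
  c("Age 56",        "Col eyes Green", "Gender M")
)

keys <- sub(" [^ ]*$", "", m)            # everything before the last word
vals <- sub("^.* ", "", m)               # the last word only
all_keys <- unique(keys[!is.na(keys)])   # distinct keys, ignoring empty cells

# For each row, look up the column that holds each key; missing keys give NA
rows <- lapply(seq_len(nrow(m)), function(i) vals[i, match(all_keys, keys[i, ])])
out <- do.call(rbind, rows)
colnames(out) <- gsub(" ", "_", all_keys)  # syntactic column names

tidy <- as.data.frame(out, stringsAsFactors = FALSE)
tidy$Age <- as.integer(tidy$Age)           # numeric column where it makes sense
```

Here tidy has the columns Gender, Col_eyes, and Age, in order of first appearance of each key, and the second row's Age is NA.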

For more on reordering a data frame to make it consistent, see the similar question on Stack Overflow: https://stackoverflow.com/questions/66225285/
