
r - How to reorganize data in columns into new columns by category

Reposted. Author: 搜寻专家. Updated: 2023-10-30 20:07:54

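Over the summer we collected more than 130,000 plant phenology observations and entered the data into Excel. Each observation includes 1 to 6 categorical variables that describe different aspects of a plant's phenology. For example, I might record a single observation for a white birch (leaves growing), or two observations for the same tree (leaves growing and flowering).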

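Unfortunately, I did not collect the category codes in the logical order of the data sheet, so they were entered into Excel in a way that does not reflect the category of each pheno code (i.e. Other, Leaf-out, Flowering, Fruit, Leaf senescence, Leaf abscission). This has created a data nightmare.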

This is what my data look like (sample data for R can be found at the bottom of the question):

(image: screenshot of the data in its current layout)

This is what my data should look like:

(image: screenshot of the desired layout)

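I created a spreadsheet that lists all of my pheno codes and their associated pheno categories (again: Other, Leaf-out, Flowering, Fruit, Leaf senescence, Leaf abscission).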

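I would like to use my pheno code spreadsheet, which I have imported into R (see the code at the bottom), to reorganize my dataset into the logical format shown above. I could do this by creating each new field and then writing a large number of conditional statements (no pheno code spreadsheet needed!), but I don't know how to combine my data with the pheno codes to restructure the data quickly and efficiently.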

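Finally, in my pheno code spreadsheet I created a Rank field to handle the fact that my technicians sometimes recorded two observations in the same category. In those cases the code with the highest number, i.e. the highest rank, should always take precedence.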

Sample.Data <- structure(list(Species = c("A", "B", "C", "D", "E","F", "G", "H", "I", 
"J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T"),
Code.1 = c("C", "C", "C", "C", "C", "C", "C", "C", "C", "C",
"C", "C", "C", "C", "C", "C", "C", "C", "C", "C"),
Code.2 = c("V", "0", "rf", "0", "0", "0", "uf", "uf", "uf", "uf", "0", "0", "0",
"0", "uf", "uf", "0", "0", "0", "0"), Code.3 = c("g3", "gd", "r3", "r3", "r3", "r3",
"V", "V", "V", "V", "g1", "gd", "vd", "g1", "V", "V", "g1", "r3", "r3", "r3"),
Code.4 = c("vd", "vd", "vd", "vd", "vd", "vd", "g3", "g3", "g3", "g3", "vd", "vd", "r2",
"vd", "g1", "vd", "vd", "vd", "vd", "vd"),
Code.5 = c("L2", "L1", "L1", "L2", "L2", "L2", "L2", "L2", "L3", "L2", "L3", "L2", "L2",
"L3", "L1", "L1", "L2", "L1", "L1", "L2"),
Code.6 = c("K", "K", "K", "K", "b1", "b3", "b2", "K", "K", "b4", "K", "K", "K", "b1",
"b3", "Y", "Z", "Y", "K", "b1")), .Names = c("Species", "Code.1", "Code.2",
"Code.3", "Code.4", "Code.5", "Code.6"), row.names = c(NA, -20L), class = "data.frame")

Pheno.Codes <- structure(list(`Pheno Code` = c("Y", "0", "Z", "A", "B1", "B2",
"C", "FA", "As", "Af", "R", "Rs", "Rf", "Ra", "K", "w", "m", "mw",
"wm", "st", "b", "b1", "b2", "b3", "b2", "b4", "uf", "rd", "rf",
"V", "VL", "Vb", "gd", "gb", "g1", "g2", "g3", "ed", "r", "r1",
"r2", "r3", "vd", "vt", "L", "L1", "L2", "L3", "L4", "X"),
`Pheno Category` = c("Other", "Other", "Leaf-out", "Leaf-out",
"Leaf-out", "Leaf-out", "Leaf-out", "Flowering", "Flowering",
"Flowering", "Flowering", "Flowering", "Flowering", "Flowering",
"Flowering", "Flowering", "Flowering", "Flowering", "Flowering",
"Flowering", "Flowering", "Flowering", "Flowering", "Flowering",
"Flowering", "Flowering", "Fruit", "Fruit", "Fruit", "Fruit",
"Fruit", "Fruit", "Leaf senescence", "Leaf senescence",
"Leaf senescence", "Leaf senescence", "Leaf senescence",
"Leaf senescence", "Leaf senescence", "Leaf senescence",
"Leaf senescence", "Leaf senescence", "Leaf senescence",
"Leaf senescence", "Leaf abscission", "Leaf abscission",
"Leaf abscission", "Leaf abscission", "Leaf abscission",
"Other"), Rank = c(0, 0.1, 0.5, 1, 1.1, 1.2, 1.3, 2, 2, 2.1, 2,
2, 2.1, 2.3, 2, 2.1, 2.1, 2.1, 2.1, 2.1, 2.1, 2.1, NA, 2.3, NA,
2.5, 3, 3.1, 3.2, 3.2, 3.2, 3.3, 4, 4, 4.1, 4.2, 4.3, 4.4, 4.4,
4.5, 4.6, 4.7, 4.8, 4.9, 5, 5, 5.1, 5.2, 5.3, 6)), .Names = c("Pheno Code",
"Pheno Category", "Rank"), row.names = c(NA, -50L), class = "data.frame")

Sample.Data2 <- structure(list(Species = c("A", "B", "C", "D", "E","F", "G", "H", "I",
"J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T"),
Code.1 = c("C", "C", "B1", "C", "", "C", "C", "C", "C", "C",
"C", "C", "C", "C", "C", "C", "C", "C", "C", "C"),
Code.2 = c("V", "0", "rf", "0", "0", "0", "uf", "uf", "uf", "uf", "0", "", "0",
"0", "uf", "uf", "0", "0", "0", "0"), Code.3 = c("g3", "gd", "r3", "r3", "r3", "r3",
"V", "V", "", "V", "g1", "gd", "vd", "g1", "V", "V", "g1", "r3", "r3", "r3"),
Code.4 = c("", "vd", "vd", "vd", "vd", "vd", "g3", "g3", "g3", "g3", "vd", "vd", "r2",
"qd", "g1", "vd", "vd", "vd", "vd", "vd"),
Code.5 = c("L2", "L1", "L1", "L7", "L2", "L2", "L2", "L2", "L3", "L2", "L3", "L2", "L2",
"L3", "L1", "L1", "L2", "L1", "L1", "L2"),
Code.6 = c("", "", "K", "K", "b1", "b6", "b2", "K", "K", "b4", "K", "K", "K", "b1",
"b3", "Y", "Z", "Y", "K", "b1")), .Names = c("Species", "Code.1", "Code.2",
"Code.3", "Code.4", "Code.5", "Code.6"), row.names = c(NA, -20L), class = "data.frame")

Best answer

A possible solution with data.table:

# load the 'data.table'-package
library(data.table)

# convert both dataframes to data.table's
setDT(Sample.Data)
setDT(Pheno.Codes)

# reshape 'Sample.Data' to long format
sample.long <- melt(Sample.Data, id = 'Species')

# join with 'Pheno.Codes'
# filter/select for each 'Species'/'pheno.cat' combo the row where the rank is equal to the max rank
# reshape the result into wide format again
sample.long[Pheno.Codes, on = c('value' = 'Pheno Code'), `:=` (pheno.cat = `Pheno Category`, rnk = Rank)
][, .SD[rnk == max(rnk)], by = .(Species, pheno.cat)
][, dcast(.SD, Species ~ pheno.cat, value.var = 'value', fill = '')]
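
The chained call above does four things at once, which can be hard to follow. The sketch below splits it into named steps on a small, hypothetical data set. The objects obs and codes and their column names are made up for illustration; only melt, the update join, the .SD filter and dcast are taken from the answer.

```r
library(data.table)

# Hypothetical toy data, much smaller than the question's sample
obs <- data.table(Species = c("A", "B"),
                  Code.1  = c("C", "C"),
                  Code.2  = c("g3", "0"),
                  Code.3  = c("vd", "L1"))

# Hypothetical lookup: code, its category and its rank
codes <- data.table(code     = c("C", "0", "g3", "vd", "L1"),
                    category = c("Leaf-out", "Other", "Leaf senescence",
                                 "Leaf senescence", "Leaf abscission"),
                    rnk      = c(1.3, 0.1, 4.3, 4.8, 5))

# Step 1: long format, one row per Species/code pair
long <- melt(obs, id.vars = "Species", value.name = "code")

# Step 2: update join, adds category and rank to 'long' by reference
long[codes, on = "code", `:=`(category = i.category, rnk = i.rnk)]

# Step 3: keep the highest-ranked code per Species and category
best <- long[, .SD[rnk == max(rnk)], by = .(Species, category)]

# Step 4: back to wide format, one column per category
wide <- dcast(best, Species ~ category, value.var = "code", fill = "")
wide
```

For species A, both g3 and vd fall under Leaf senescence, and vd is kept because its rank is higher. Categories that a species has no code for are filled with an empty string.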

which gives:

    Species Flowering Fruit Leaf abscission Leaf senescence Leaf-out Other
 1:       A         K     V              L2              vd        C
 2:       B         K                    L1              vd        C     0
 3:       C         K    rf              L1              vd        C
 4:       D         K                    L2              vd        C     0
 5:       E        b1                    L2              vd        C     0
 6:       F        b3                    L2              vd        C     0
 7:       G               V              L2              g3        C
 8:       H         K     V              L2              g3        C
 9:       I         K     V              L3              g3        C
10:       J        b4     V              L2              g3        C
11:       K         K                    L3              vd        C     0
12:       L         K                    L2              vd        C     0
13:       M         K                    L2              vd        C     0
14:       N        b1                    L3              vd        C     0
15:       O        b3     V              L1              g1        C
16:       P               V              L1              vd        C     Y
17:       Q                              L2              vd        C     0
18:       R                              L1              vd        C     0
19:       S         K                    L1              vd        C     0
20:       T        b1                    L2              vd        C     0
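
One detail in this output is worth checking against the lookup table: species G has an empty Flowering cell although its Code.6 is b2. In Pheno.Codes, b2 is listed twice and both rows have Rank NA, so rnk == max(rnk) evaluates to NA for that group, and data.table treats an NA in a logical i as FALSE, which drops the row. A minimal base R sketch for spotting such entries before the join, using a hypothetical four-row lookup (the names lookup, code and rnk are assumptions for illustration):

```r
# Hypothetical miniature lookup table; the real one is 'Pheno.Codes'
lookup <- data.frame(code = c("b1", "b2", "b3", "b2"),
                     rnk  = c(2.1, NA, 2.3, NA),
                     stringsAsFactors = FALSE)

# codes that occur more than once in the lookup
dup.codes <- unique(lookup$code[duplicated(lookup$code)])

# codes without a rank, which can never win the 'max rank' comparison
na.codes <- unique(lookup$code[is.na(lookup$rnk)])

dup.codes
na.codes
```

Whether such codes should be given a rank or corrected in the lookup is a decision for whoever owns the data; the sketch only lists them.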

Update

In response to the specifications mentioned in the comments, you can adapt the code to:

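# convert both dataframes to data.table's
setDT(Sample.Data2)
setDT(Pheno.Codes)

# reshape 'Sample.Data2' to long format and drop the empty cells
sample.long <- melt(Sample.Data2, id = 'Species')[value != '']

# join with 'Pheno.Codes'; codes without a match get the category 'ERROR' and rank 0
# then keep the top-ranked code per 'Species'/'pheno.cat' combo and reshape to wide format
sample.long[Pheno.Codes, on = c('value' = 'Pheno Code'), `:=` (pheno.cat = `Pheno Category`, rnk = Rank)
][is.na(pheno.cat), `:=` (pheno.cat = 'ERROR', rnk = 0)
][, .SD[rnk == max(rnk)], by = .(Species, pheno.cat)
][, dcast(.SD, Species ~ pheno.cat, value.var = 'value', fill = '')]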
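
Before relying on the 'ERROR' column, it can help to list which entered codes have no match in the lookup at all. The following is a minimal base R sketch under stated assumptions: obs and known.codes are hypothetical stand-ins for Sample.Data2 and the `Pheno Code` column, and empty strings are treated as "nothing recorded", as in the code above.

```r
# Hypothetical toy observations with one blank cell and two unknown codes
obs <- data.frame(Species = c("A", "B"),
                  Code.1  = c("C", ""),
                  Code.2  = c("qd", "L7"),
                  stringsAsFactors = FALSE)

# Hypothetical vector of valid codes, standing in for the lookup column
known.codes <- c("C", "L1", "L2")

# flatten all code columns into one vector and drop the blanks
entered <- unlist(obs[, -1], use.names = FALSE)
entered <- entered[entered != ""]

# codes that were entered but are not in the lookup
unknown <- setdiff(entered, known.codes)
unknown
```

Listing the unmatched codes first makes it easier to decide whether they are typos to correct in the data or codes missing from the lookup.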

Regarding r - How to reorganize data in columns into new columns by category, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48113251/
