
r - How to change a specific categorical variable based on the overall order of the data

Reposted. Author: 行者123. Updated: 2023-12-01 10:24:48

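Every five days I collect data on plant development, or phenology (encoded with the categorical variable "Code"), along a transect divided into 78 consecutive segments. Every species is surveyed in each segment of the transect.

My study repeats a historical study from 100 years ago, and I kept the original phenological coding scheme without thinking about how I would analyze the data after the summer!

The problem I did not anticipate while collecting the data is that the codes follow a sequence in which the same code occurs both early and late in the summer. Specifically, the codes are: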

b1 = single flower
b2 = sparse flowers (two or three)
b3 = flowers common (more than three)
b4 = flowering ended

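Following the method of the original study, the sequence of codes collected over the summer for any flowering plant looks like b1, b2, b3, b2, b1, b4. Note that because we visit the transect every five days, a code can repeat over several consecutive visits, e.g. b1, b1, b2, b2, b2, b2, b3, b3, b3, b2, b2, b1, b4.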
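To make the repetition concrete, here is a minimal base R sketch that collapses the repeated visits from the example above into the underlying sequence. The vector name `visits` is made up for illustration; `rle()` is the standard base R run-length encoder.

```r
# Codes recorded on consecutive visits (the example sequence from the text)
visits <- c("b1", "b1", "b2", "b2", "b2", "b2", "b3", "b3", "b3",
            "b2", "b2", "b1", "b4")

# rle() groups consecutive identical values into runs
runs <- rle(visits)

print(runs$values)   # one entry per run: b1 b2 b3 b2 b1 b4
print(runs$lengths)  # number of visits in each run: 2 4 3 2 1 1
```

The run values show why the raw codes are ambiguous: "b1" and "b2" each appear twice in the collapsed sequence, once on the way up to peak flowering and once on the way down.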

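I would like to recode the 'b1' and 'b2' codes as follows (see the example and the sample data):

1. If 'b1' occurs before 'b2' or 'b3' it should become 'b1a'; if it occurs after 'b2' or 'b3' it should become 'b1b'. Note that sometimes a sequence of observations contains no 'b2' or 'b3'.

2. If 'b2' occurs before 'b3' it should become 'b2a'; if it occurs after 'b3' it should become 'b2b'. If there is no 'b3', 'b2' should become 'b2a'. Keep in mind that 'b2' may be observed several times after the last occurrence of 'b3' (see the example and the sample data).

3. Bear in mind that 'b1' and 'b2' can occur without any 'b3' being observed, in which case they should be coded as 'b1a' and 'b2a' respectively.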
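The three rules above can be sketched directly in base R with cumulative flags. This is only one possible reading of the rules, not code from the original post: the helper `recode_phenology()` and the `demo` data frame are hypothetical, the input is assumed to be sorted by date within each Segment/Species group, and any 'b2' seen after the first 'b3' is treated as 'b2b' (the rules do not say what to do with a 'b2' that falls between two runs of 'b3').

```r
# Hypothetical helper (not from the original post). `code` is assumed to hold
# the codes of ONE Segment/Species group, already sorted by date.
recode_phenology <- function(code) {
  seen_b3    <- cumsum(code == "b3") > 0               # TRUE once a b3 has occurred
  seen_b2_b3 <- cumsum(code %in% c("b2", "b3")) > 0    # TRUE once a b2 or b3 has occurred
  out   <- code
  is_b1 <- code == "b1"
  is_b2 <- code == "b2"
  out[is_b1] <- ifelse(seen_b2_b3[is_b1], "b1b", "b1a")  # rules 1 and 3
  out[is_b2] <- ifelse(seen_b3[is_b2], "b2b", "b2a")     # rules 2 and 3
  out
}

# Small made-up example: group 1/A has a full sequence, group 2/B has no b3
demo <- data.frame(
  Segment = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  Species = c("A", "A", "A", "A", "A", "A", "B", "B", "B"),
  Code    = c("b1", "b2", "b3", "b2", "b1", "b4", "b1", "b2", "b4"),
  stringsAsFactors = FALSE
)

# Apply the helper within each Segment/Species group
group_key    <- paste(demo$Segment, demo$Species)
demo$NewCode <- ave(demo$Code, group_key, FUN = recode_phenology)
print(demo)
```

Because the flags depend only on what has already been seen in the group, a group that starts with 'b3' (as one of the groups in the sample data does) would still get 'b2b' for any later 'b2'.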

This is what the data look like:

Date    Segment Species Code
01-Jun-17 1 A b1
06-Jun-17 1 A b1
10-Jun-17 1 A b2
14-Jun-17 1 A b2
19-Jun-17 1 A b2
23-Jun-17 1 A b3
28-Jun-17 1 A b3
03-Jul-17 1 A b2
08-Jul-17 1 A b2
14-Jul-17 1 A b1
19-Jul-17 1 A b4
23-Jul-17 1 A b4

This is what it should look like:

Date    Segment Species Code
01-Jun-17 1 A b1a
06-Jun-17 1 A b1a
10-Jun-17 1 A b2a
14-Jun-17 1 A b2a
19-Jun-17 1 A b2a
23-Jun-17 1 A b3
28-Jun-17 1 A b3
03-Jul-17 1 A b2b
08-Jul-17 1 A b2b
14-Jul-17 1 A b1b
19-Jul-17 1 A b4
23-Jul-17 1 A b4

Here is the sample data:

Test.Data<- structure(list(Date = structure(c(17318, 17323, 17327, 17331, 
17336, 17340, 17345, 17350, 17355, 17361, 17366, 17318, 17323,
17327, 17331, 17336, 17340, 17345, 17350, 17355, 17361, 17366,
17370, 17375, 17318, 17323, 17327, 17331, 17336, 17340, 17345,
17350, 17355, 17361, 17366, 17318, 17323, 17327, 17331, 17336,
17340, 17345, 17350, 17355, 17361, 17366, 17370, 17375, 17355,
17361, 17366, 17370, 17375, 17350, 17355, 17361, 17366, 17370
), class = "Date"), Segment = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 1, 1, 1, 1, 1), Species = c("A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C"
), Code = c("b1", "b1", "b2", "b2", "b2", "b3", "b3", "b2", "b2",
"b4", "b4", "b1", "b2", "b2", "b2", "b3", "b3", "b3", "b2", "b2",
"b2", "b1", "b4", "b4", "b1", "b1", "b2", "b2", "b2", "b3", "b3",
"b2", "b2", "b4", "b4", "b1", "b2", "b2", "b2", "b3", "b3", "b3",
"b2", "b2", "b2", "b4", "b4", "b4", "b3", "b3", "b2", "b1", "b4",
"b1", "b1", "b2", "b2", "b4")), .Names = c("Date", "Segment",
"Species", "Code"), row.names = c(NA, -58L), class = "data.frame")

Best answer

Using data.table:

library(data.table)
setDT(Test.Data)
# Number each run of identical codes within every Segment/Species group
Test.Data[, temp := rleid(Code), by = .(Segment, Species)]
# Among the b2 rows only, letter the runs in order of appearance ("a", then "b")
Test.Data[Code == "b2", Code := paste0(Code, letters[rleid(temp)]),
          by = .(Segment, Species)]
# Drop the helper column
Test.Data[, temp := NULL]
# Date Segment Species Code
# 1: 2017-06-01 1 A b1
# 2: 2017-06-06 1 A b1
# 3: 2017-06-10 1 A b2a
# 4: 2017-06-14 1 A b2a
# 5: 2017-06-19 1 A b2a
# 6: 2017-06-23 1 A b3
# 7: 2017-06-28 1 A b3
# 8: 2017-07-03 1 A b2b
# 9: 2017-07-08 1 A b2b
#10: 2017-07-14 1 A b4
#11: 2017-07-19 1 A b4
#12: 2017-06-01 1 B b1
#13: 2017-06-06 1 B b2a
#14: 2017-06-10 1 B b2a
#15: 2017-06-14 1 B b2a
#16: 2017-06-19 1 B b3
#17: 2017-06-23 1 B b3
#18: 2017-06-28 1 B b3
#19: 2017-07-03 1 B b2b
#20: 2017-07-08 1 B b2b
#21: 2017-07-14 1 B b2b
# ...
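For readers without data.table, the run-id idea in the answer can be reproduced in base R. The sketch below is an illustration only: `run_id()` is a hypothetical stand-in for `rleid()`, and the codes are those of the first group (Segment 1, Species A) in the sample data.

```r
# Hypothetical base R stand-in for data.table::rleid():
# gives every run of consecutive identical values its own integer id
run_id <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
}

# Codes of Segment 1, Species A from the sample data
codes <- c("b1", "b1", "b2", "b2", "b2", "b3", "b3", "b2", "b2", "b4", "b4")

ids    <- run_id(codes)        # run ids over all codes: 1 1 2 2 2 3 3 4 4 5 5
is_b2  <- codes == "b2"
b2_run <- run_id(ids[is_b2])   # renumber the b2 runs only: 1 1 1 2 2

labelled <- codes
labelled[is_b2] <- paste0("b2", letters[b2_run])
print(labelled)
```

Note what this does and does not cover: the letter reflects the order in which the 'b2' runs appear, so the first run is always 'b2a' even in a group whose observations start at 'b3', and 'b1' is left unchanged.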

Regarding "r - How to change a specific categorical variable based on the overall order of the data", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48444598/
