
Index an ordered vector with random number of repeating values [duplicate]





df<-data.frame(old=c(1,1,1,5,7,7,7,11,13,13,16,18,20,20,20,20,25,25,25,29),
new=c(1,1,1,2,3,3,3,4,5,5,6,7,8,8,8,8,9,9,9,10))

old new
1 1 1
2 1 1
3 1 1
4 5 2
5 7 3
6 7 3
7 7 3
8 11 4
9 13 5
10 13 5
11 16 6
12 18 7
13 20 8
14 20 8
15 20 8
16 20 8
17 25 9
18 25 9
19 25 9
20 29 10

How do I turn old into new easily? Basically, new is the position of each distinct value in the ordering, repeated as many times as that value occurs. The values are always increasing.



Thanks in advance. I don't even know how to Google something this simple.



Recommended answers


You could use dplyr::consecutive_id or data.table::rleid to get an identifier for each run of identical values:



df <- data.frame(
old = c(1, 1, 1, 5, 7, 7, 7, 11, 13, 13, 16, 18, 20, 20, 20, 20, 25, 25, 25, 29)
)

library(dplyr, warn.conflicts = FALSE)
library(data.table)

df |>
mutate(new = consecutive_id(old))
#> old new
#> 1 1 1
#> 2 1 1
#> 3 1 1
#> 4 5 2
#> 5 7 3
#> 6 7 3
#> 7 7 3
#> 8 11 4
#> 9 13 5
#> 10 13 5
#> 11 16 6
#> 12 18 7
#> 13 20 8
#> 14 20 8
#> 15 20 8
#> 16 20 8
#> 17 25 9
#> 18 25 9
#> 19 25 9
#> 20 29 10

df |>
mutate(new = rleid(old))
#> old new
#> 1 1 1
#> 2 1 1
#> 3 1 1
#> 4 5 2
#> 5 7 3
#> 6 7 3
#> 7 7 3
#> 8 11 4
#> 9 13 5
#> 10 13 5
#> 11 16 6
#> 12 18 7
#> 13 20 8
#> 14 20 8
#> 15 20 8
#> 16 20 8
#> 17 25 9
#> 18 25 9
#> 19 25 9
#> 20 29 10
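If neither package is available, the same run identifier can be built in base R. The following is only a sketch, not part of the original answer: it uses `rle()` to get the run lengths, and the variable names `runs`, `new_rle` and `new_match` are made up for illustration. The `match()` shortcut relies on the question's guarantee that the values are always increasing, so each distinct value forms exactly one run.

```r
# Base R sketch: number each run of identical values without extra packages.
old <- c(1, 1, 1, 5, 7, 7, 7, 11, 13, 13, 16, 18, 20, 20, 20, 20, 25, 25, 25, 29)

# rle() returns the length and the value of each run of identical elements
runs <- rle(old)

# Repeat the run number 1, 2, 3, ... once per element of that run
new_rle <- rep(seq_along(runs$lengths), times = runs$lengths)

# Because the values never decrease, every distinct value is exactly one run,
# so the position of each value among the unique values gives the same index
new_match <- match(old, unique(old))

# Both approaches agree on this data
identical(new_rle, new_match)
```

The `rle()` version also works when a value reappears later in the vector (it starts a new run), whereas the `match()` version would reuse the earlier index.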


You're essentially just checking where the old column changes. There are several ways to do this, but the easiest is to check whether each value in df$old differs from the previous one, then take the cumulative sum of that:



df$new <- cumsum(df$old != dplyr::lag(df$old, default = 0))

or


df$new <- cumsum(df$old != c(0, df$old[-nrow(df)]))

or


library(dplyr)
df |> mutate(new = cumsum(old != lag(old, default = 0)))

or


df$new <- cumsum(diff(c(0, df$old)) > 0)

or


df |> mutate(new = cumsum(diff(c(0, old)) > 0))
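
One thing worth noting about the snippets above: they pad the shifted vector with 0, which works for the question's data because the first value is not 0. As a hedged sketch (the data and the names `with_sentinel`, `changed` and `without_sentinel` are hypothetical, chosen only to show the edge case), always flagging the first element as a change removes that dependence:

```r
# Hypothetical data whose first value equals the padding value 0
old <- c(0, 0, 3, 3, 3, 8)

# Padded version: the first comparison is 0 != 0, which is FALSE,
# so the numbering starts at 0 instead of 1
with_sentinel <- cumsum(old != c(0, old[-length(old)]))

# Padding-free version: the first element always starts a new run,
# every later element starts one only if it differs from its predecessor
changed <- c(TRUE, old[-1] != old[-length(old)])
without_sentinel <- cumsum(changed)

with_sentinel     # 0 0 1 1 1 2
without_sentinel  # 1 1 2 2 2 3
```

Unlike the `diff(...) > 0` variants, the `!=` comparison does not require the values to be increasing, only that equal values sit next to each other.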


Thank you! I should have known it was in dplyr but had no idea how to phrase a search for it.


Thanks. Yeah, these all work. I had a clunky ifelse method along the same lines, but I figured there must be a specific function that does this, and of course there is one in dplyr and also in data.table. See the answer from @stefan

