
r - Ranking and evaluating duplicates in a vector

Reposted. Author: 行者123. Updated: 2023-12-04 11:01:25

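I am trying to create a variable that identifies whether a string in a vector is a first occurrence, one of the first three occurrences, or beyond the third occurrence. For example: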

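In the dataset below I have a name (there will be more names), a text, and a dup variable. I want the dup variable to identify whether the text is a first occurrence (origin), whether it falls within the first three occurrences (FirstThree), or whether it has occurred more than three times (MoreThanThree). I also need to do this separately for each person, but I think I can figure that part out. Thanks in advance for your help!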

name =c("T","T","T","T","T","T","T","T","T","T")
text =c("a","b","a","a","b","c","a","a","b","a")
dup =c("origin","origin","FirstThree","FirstThree","FirstThree","origin","MoreThanThree","MoreThanThree","FirstThree","MoreThanThree")
dfA = data.frame(name,text,dup)

   name text           dup
1     T    a        origin
2     T    b        origin
3     T    a    FirstThree
4     T    a    FirstThree
5     T    b    FirstThree
6     T    c        origin
7     T    a MoreThanThree
8     T    a MoreThanThree
9     T    b    FirstThree
10    T    a MoreThanThree

Best answer

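You can use data.table::rowid with two ifelse checks. Note that dfA must be a data.table for the := syntax to work, so convert it first (for example with setDT).

library(data.table)
setDT(dfA)  # convert the data.frame to a data.table by reference

dfA[, ict := {
  r <- rowid(text)  # running count of each text value within the current name
  ifelse(r == 1, 'origin',
    ifelse(r <= 3, 'FirstThree',
      'MoreThanThree'))
}, by = name]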

dfA
# name text dup ict
# 1: T a origin origin
# 2: T b origin origin
# 3: T a FirstThree FirstThree
# 4: T a FirstThree FirstThree
# 5: T b FirstThree FirstThree
# 6: T c origin origin
# 7: T a MoreThanThree MoreThanThree
# 8: T a MoreThanThree MoreThanThree
# 9: T b FirstThree FirstThree
# 10: T a MoreThanThree MoreThanThree
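
For readers who cannot use data.table, the same running-count idea can be sketched in base R. This is only an illustration under stated assumptions, not part of the original answer: label_occurrence is a hypothetical helper name, and ave with seq_along stands in for rowid.

```r
# Base R sketch; label_occurrence is a hypothetical helper, not from the answer.
label_occurrence <- function(text, group) {
  # Running count of each text value within each group (1 = first occurrence).
  n <- ave(seq_along(text), group, text, FUN = seq_along)
  # Map the running count to the three labels used in the question.
  ifelse(n == 1, "origin",
    ifelse(n <= 3, "FirstThree", "MoreThanThree"))
}

name <- rep("T", 10)
text <- c("a", "b", "a", "a", "b", "c", "a", "a", "b", "a")
result <- label_occurrence(text, name)
print(data.frame(name, text, dup = result))
```

Passing name as the grouping argument restarts the count for each person, which mirrors what by = name does in the data.table version.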

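You can also use cut. The only difference is that it produces a factor rather than a character vector. This may be useful if you have more than three categories.

dfA[, ict := cut(rowid(text), c(0, 1, 3, Inf),
                 labels = c('origin', 'FirstThree', 'MoreThanThree'))
    , by = name]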
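
To make the factor versus character difference concrete, here is a small standalone sketch of cut applied to running counts. The counts vector is typed in by hand as an assumption; it holds the values rowid(text) would produce for the example data.

```r
# Running occurrence counts for text = a,b,a,a,b,c,a,a,b,a (entered by hand).
counts <- c(1, 1, 2, 3, 2, 1, 4, 5, 3, 6)

# Intervals are (0,1], (1,3], (3,Inf], one label per interval.
labels <- cut(counts, breaks = c(0, 1, 3, Inf),
              labels = c("origin", "FirstThree", "MoreThanThree"))

print(is.factor(labels))    # cut returns a factor
print(levels(labels))       # levels follow the order of the labels argument
print(as.character(labels)) # convert if a character vector is needed
```

Adding a category only requires one more break and one more label, whereas the ifelse approach needs another nested call.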

Regarding "r - Ranking and evaluating duplicates in a vector", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58773604/
