
r - How to convert categories to numbers in R

Reposted. Author: 行者123. Updated: 2023-12-05 01:05:56

Here is my problem:

I have a table containing categories, and I want to rank them:

category
dog
cat
fish
dog
dog

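What I want is to add a column that ranks them:

category  rank
dog       1
cat       2
fish      3
dog       1
dog       1
  • Sorry about the ugly table (help with writing proper tables on Stack Overflow would also be welcome)
  • Any ideas on how to add the rank column?

  • Thanks!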

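    Best answer

    Just for completeness, and because the solution I posted in the comments is inefficient (and quite ugly), I'll post an answer as well.

    The OP's starting data looks something like this: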

    x = c("cat", "dog", "fish", "dog", "dog", "cat", "fish", "catfish")
    x = factor(x)

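    In the end, a manually specified numeric coding of `x` is wanted. For example, suppose the following mapping is needed:
    cat -> 1, dog -> 2, fish -> 3, catfish -> 4

    So, here are some alternatives: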
    sapply(as.character(x), switch, "cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4,
           USE.NAMES = FALSE)
    #[1] 1 2 3 2 2 1 3 4

    # Note: match()'s internal 'do_match' calls 'match_transform', which coerces
    # a `factor` to `character`, so there is no need for 'as.character(x)'
    # (http://svn.r-project.org/R/trunk/src/main/unique.c)
    match(x, c("cat", "dog", "fish", "catfish"))
    #[1] 1 2 3 2 2 1 3 4

    local({ #just to not change 'x'
    levels(x) = list("cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4)
    as.numeric(x)
    })
    #[1] 1 2 3 2 2 1 3 4

    library(fastmatch)
    fmatch(x, c("cat", "dog", "fish", "catfish")) #a faster alternative to 'match'
    #[1] 1 2 3 2 2 1 3 4
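One more base R option, not part of the original answer, is a named lookup vector indexed by the category labels. The sketch below is a minimal illustration; the name `codes` is hypothetical and the mapping is the one assumed above.

```r
x <- factor(c("cat", "dog", "fish", "dog", "dog", "cat", "fish", "catfish"))

# Hypothetical lookup table: names are the categories, values are the codes
codes <- c(cat = 1, dog = 2, fish = 3, catfish = 4)

# Index by label. as.character() matters here: indexing with the factor
# itself would use its internal integer codes rather than its labels.
result <- unname(codes[as.character(x)])
result
#[1] 1 2 3 2 2 1 3 4
```

Unlike `match`, this form needs the explicit `as.character(x)`, which is easy to forget when `x` is a factor.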

    And a benchmark on a larger vector:
    X = rep(as.character(x), 1e5)
    X = factor(X)
    f1 = function() sapply(as.character(X), switch,
                           "cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4,
                           USE.NAMES = FALSE)
    f2 = function() match(X, c("cat", "dog", "fish", "catfish"))
    f3 = function() {
      levels(X) = list("cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4)
      as.numeric(X)
    }
    library(fastmatch)
    f4 = function() fmatch(X, c("cat", "dog", "fish", "catfish"))

    library(microbenchmark)
    microbenchmark(f1(), f2(), f3(), f4(), times = 10)
    #Unit: milliseconds
    # expr         min          lq      median         uq       max neval
    # f1() 1745.111666 1816.675337 1961.809102 2107.98236 2896.0291    10
    # f2()   22.043657   22.786647   23.987263   31.45057  111.9600    10
    # f3()   32.704779   32.919150   38.865853   47.67281  134.2988    10
    # f4()    8.814958    8.823309    9.856188   19.66435  104.2827    10
    sum(f1() != f2())
    #[1] 0
    sum(f2() != f3())
    #[1] 0
    sum(f3() != f4())
    #[1] 0
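For the table in the question itself, the `match` approach can add the rank column directly. This is a minimal sketch: the data frame `df` is a hypothetical reconstruction of the OP's table, and it assumes that ranks should follow the order in which each category first appears.

```r
# Hypothetical data frame mirroring the table in the question
df <- data.frame(category = c("dog", "cat", "fish", "dog", "dog"),
                 stringsAsFactors = FALSE)

# unique() keeps categories in order of first appearance, so match()
# returns 1 for the first category seen, 2 for the second, and so on
df$rank <- match(df$category, unique(df$category))
df$rank
#[1] 1 2 3 1 1
```

If a specific order is wanted instead, replace `unique(df$category)` with an explicit vector of categories, as in the alternatives above.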

    Regarding "r - How to convert categories to numbers in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20782583/
