r - Frequency of characters in a string

Reposted. Author: 行者123. Updated: 2023-12-02 00:46:27

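I want to create a function with two arguments that shows how often a given character occurs in a given word: x <- the word, y <- the letter. So I wrote the following function: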

frequency <- function(x,y)
{
  word <- strsplit(x,"")
  counter <- 0
  for (i in 1:length(word)){
    if (word[i] == y) counter=counter+1
  }
  print(counter)
}

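The basic idea of the function is to split the given word into characters, iterate over them, and increment a counter whenever the condition is met. However, the function always returns 0. What is the reason?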

Best answer

Another version is to translate your "word" into a raw() vector and compare it with the "letter", also as a raw() vector.

frequency = function(word, letter)
sum(charToRaw(word) == charToRaw(letter))
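
To make the raw() comparison concrete, here is a minimal sketch of the intermediate values, assuming the inputs "foo" and "o" used later in this answer. The variable names are illustrative only. Note that charToRaw() works on bytes, so this approach is presumably only safe when the letter is a single-byte (ASCII) character.

```r
word <- "foo"
letter <- "o"

# charToRaw() returns one raw byte per byte of the string.
word_bytes <- charToRaw(word)      # 66 6f 6f
letter_byte <- charToRaw(letter)   # 6f

# Comparing the two gives a logical vector; sum() counts the TRUE values.
matches <- word_bytes == letter_byte
count <- sum(matches)
print(matches)  # FALSE TRUE TRUE
print(count)    # 2
```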

Here are four different solutions

f0 <- function(word, letter)
sum(strsplit(word, "")[[1]] == letter)

f1 <- function(word, letter)
sum(charToRaw(word) == charToRaw(letter))

f2a <- function(word, letter)
length(unlist(gregexpr(letter, word)))

f2b <- function(word, letter)
length(unlist(gregexpr(letter, word, fixed=TRUE)))
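
Note that f0() indexes the result of strsplit() with [[1]]. strsplit() returns a list with one element per input string, which is most likely why the loop in the question never counted anything: it iterated over a list of length 1 rather than over the characters. A minimal sketch of the question's loop with that fixed follows; frequency_loop is a hypothetical name, not part of the original answer.

```r
frequency_loop <- function(x, y) {
  # strsplit() returns a list with one element per input string,
  # so [[1]] is needed to get the character vector for x.
  chars <- strsplit(x, "")[[1]]
  counter <- 0L
  # seq_along() is safe even when chars is empty, unlike 1:length(chars).
  for (i in seq_along(chars)) {
    if (chars[i] == y) counter <- counter + 1L
  }
  counter
}

frequency_loop("foo", "o")  # 2
frequency_loop("foo", "a")  # 0
```

Returning the counter instead of printing it lets the result be used in further computations.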

Some correctness and performance comparisons

> word <- "foo"
> letter <- "o"
> identical(f0(word, letter), f1(word, letter))
[1] TRUE
> identical(f0(word, letter), f2a(word, letter))
[1] TRUE
> identical(f0(word, letter), f2b(word, letter))
[1] TRUE
> letter <- "a"
> identical(f0(word, letter), f1(word, letter))
[1] TRUE
> identical(f0(word, letter), f2a(word, letter))
[1] FALSE
> identical(f0(word, letter), f2b(word, letter))
[1] FALSE
> word <- paste(sample(letters, 10000, TRUE), collapse="")
> letter <- "a"
> microbenchmark(
+ f0(word, letter), f1(word, letter),
+ f2a(word, letter), f2b(word, letter)
+ )
Unit: microseconds
              expr     min       lq      mean   median       uq      max neval
  f0(word, letter) 558.433 562.4755 579.03451 583.5590 584.8920  628.946   100
  f1(word, letter)  71.482  78.7100 100.85787  80.0275  81.7035 2195.366   100
 f2a(word, letter) 277.618 278.7280 280.94280 279.4870 280.4270  302.683   100
 f2b(word, letter)  66.888  68.1800  69.07205  68.6205  69.3100   84.300   100

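f2b() is the fastest, but it is also incorrect (as is f2a(): neither agrees with f0() when the letter does not occur in the word). f1() so far seems to be both fast and correct, although speed probably does not matter for the task at hand.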
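
The mismatch for a missing letter comes from how gregexpr() reports "no match": it returns a single position of -1, so taking the length gives 1 rather than 0. A minimal sketch of this behaviour and of one possible correction follows; f2c is a hypothetical name and is not part of the original answer or its benchmark.

```r
word <- "foo"

# With no match, gregexpr() reports a single position of -1.
no_match <- gregexpr("a", word, fixed = TRUE)[[1]]
as.integer(no_match)                                # -1
length(unlist(gregexpr("a", word, fixed = TRUE)))   # 1, not 0

# Hypothetical corrected variant: count only real (positive) positions.
f2c <- function(word, letter) {
  pos <- gregexpr(letter, word, fixed = TRUE)[[1]]
  sum(pos > 0L)
}

f2c(word, "o")  # 2
f2c(word, "a")  # 0
```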

Regarding "r - Frequency of characters in a string", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43412880/
