R function does not loop over the column but repeats the first row's result


Reposted by: 行者123 · Updated: 2023-12-04 20:02:55

I am trying to use the stemming function suggested in the corpus package's stemmer vignette: https://cran.r-project.org/web/packages/corpus/vignettes/stemmer.html

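But when I try to run the function over an entire column, it just repeats the first row's result in all the remaining rows. I suspect this has to do with the [[1]] in the function below, and that the solution is something like "for i in x", but I am not familiar enough with writing functions to know how to fix it.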

df <- data.frame(x = 1:7, y= c("love", "lover", "lovely", "base", "snoop", "dawg", "pound"), stringsAsFactors=FALSE)

stem_hunspell <- function(term) {
    # look up the term in the dictionary
    stems <- hunspell::hunspell_stem(term)[[1]]

    if (length(stems) == 0) { # if there are no stems, use the original term
        stem <- term
    } else { # if there are multiple stems, use the last one
        stem <- stems[[length(stems)]]
    }

    stem
}

df[3] <- stem_hunspell(df$y)
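To see why every row ends up with the same value, here is a minimal base-R sketch that reproduces the behaviour without the hunspell package. The list `stems_all` is an assumption: a hand-written stand-in for `hunspell::hunspell_stem(df$y)`, copied from the console output shown in the answer below. `[[1]]` keeps only the first term's stems, so the function returns a single string, and assigning a length-1 value to a data frame column recycles it down all seven rows.

```r
# Assumed stand-in for hunspell::hunspell_stem(df$y): one character vector
# per term, copied by hand from the console output in the answer.
stems_all <- list("love", c("lover", "love"), c("lovely", "love"),
                  "base", "snoop", character(0), "pound")

df <- data.frame(x = 1:7,
                 y = c("love", "lover", "lovely", "base", "snoop", "dawg", "pound"),
                 stringsAsFactors = FALSE)

# What the original function does: [[1]] keeps only the first term's stems ...
stems <- stems_all[[1]]
stem <- stems[[length(stems)]]
length(stem)  # a single string, not one per row

# ... and a length-1 value assigned to a column is recycled to every row.
df[3] <- stem
df[[3]]
```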

Best answer

Your intuition is right.

hunspell_stem(term) returns a list with length(term) elements, one character vector per input term.

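Each vector seems to contain the word itself as the first element, but only if the word is found in the dictionary, followed by its stem as the second element if the word is not already a stem.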

> hunspell::hunspell_stem(df$y)
[[1]]
[1] "love"

[[2]]
[1] "lover" "love" 

[[3]]
[1] "lovely" "love"  

[[4]]
[1] "base"

[[5]]
[1] "snoop"

[[6]]
character(0)

[[7]]
[1] "pound"
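The list structure can be inspected with base R alone. In this sketch the `stems` list is an assumption: a hand-typed copy of the output printed above, used instead of a live hunspell call. `lengths()` shows how many candidates each term has (0 for "dawg", which is not in the dictionary), and `[[i]]` extracts the vector for term i only.

```r
# Assumed hand-typed copy of the hunspell_stem(df$y) output printed above.
terms <- c("love", "lover", "lovely", "base", "snoop", "dawg", "pound")
stems <- list("love", c("lover", "love"), c("lovely", "love"),
              "base", "snoop", character(0), "pound")

length(stems) == length(terms)  # one list element per input term
lengths(stems)                  # number of candidates for each term
stems[[2]]                      # only term 2: "lover" "love"
stems[[6]]                      # character(0): "dawg" has no dictionary entry
```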

The function below returns either the stem or the original term:

stem_hunspell <- function(term) {
  stems <- hunspell::hunspell_stem(term)
  output <- character(length(term))

  for (i in seq_along(term)) {
    stem <- stems[[i]]
    if (length(stem) == 0) {
      output[i] <- term[i]
    } else {
      output[i] <- stem[length(stem)]
    }
  }
  return(output)
}
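The same per-element rule can also be written with `vapply()` instead of an explicit `for` loop; this is an alternative style, not the answer's code. `pick_last_stem()` is a hypothetical helper that takes the already-computed stems list as an argument so the sketch runs without hunspell; the hand-typed `stems` list is likewise an assumption mirroring the output above. With hunspell installed, the call would presumably be `pick_last_stem(hunspell::hunspell_stem(df$y), df$y)`.

```r
# Hypothetical helper: last stem if there is one, otherwise the original term.
pick_last_stem <- function(stems, term) {
  vapply(seq_along(term), function(i) {
    s <- stems[[i]]
    if (length(s) == 0) term[i] else s[length(s)]  # fall back to the term itself
  }, character(1))  # guarantees exactly one string per term
}

# Assumed hand-typed copy of hunspell::hunspell_stem() output for these terms.
terms <- c("love", "lover", "lovely", "base", "snoop", "dawg", "pound")
stems <- list("love", c("lover", "love"), c("lovely", "love"),
              "base", "snoop", character(0), "pound")

pick_last_stem(stems, terms)
```

`vapply()` with `character(1)` fails loudly if any element does not produce exactly one string, which guards against the silent recycling seen in the question.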

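If you don't want "dawg" (a word not found in the dictionary) returned, the function becomes simpler; such terms are left as empty strings: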

stem_hunspell <- function(term) {
  stems <- hunspell::hunspell_stem(term)
  output <- character(length(term))

  for (i in seq_along(term)) {
    stem <- stems[[i]]
    if (length(stem) > 0) {
      output[i] <- stem[length(stem)]
    }
  }
  return(output)
}
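Assigning the result back to the data frame now gives one value per row. This sketch mirrors the simpler rule above with a hypothetical helper, `last_stem_or_empty()`, applied to an assumed hand-typed copy of the stems list; with hunspell installed you would presumably pass `hunspell::hunspell_stem(df$y)` instead.

```r
# Hypothetical helper mirroring the simpler function: "" when there are no stems.
last_stem_or_empty <- function(stems) {
  vapply(stems, function(s) if (length(s) > 0) s[length(s)] else "", character(1))
}

df <- data.frame(x = 1:7,
                 y = c("love", "lover", "lovely", "base", "snoop", "dawg", "pound"),
                 stringsAsFactors = FALSE)

# Assumed hand-typed copy of hunspell::hunspell_stem(df$y) output.
stems <- list("love", c("lover", "love"), c("lovely", "love"),
              "base", "snoop", character(0), "pound")

df$stem <- last_stem_or_empty(stems)  # one stem per row, no recycling
df$stem
```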

A similar question about an R function that does not loop over a column but repeats the first row's result can be found on Stack Overflow: https://stackoverflow.com/questions/58737930/