
r - How to do named entity recognition (NER) using quanteda?

Reposted. Author: 行者123. Updated: 2023-12-01 21:55:50

I have a data frame with text:

df = data.frame(id=c(1,2), text = c("My best friend John works and Google", "However he would like to work at Amazon as he likes to use python and stay at Canada"))

without any preprocessing.

How can I extract named entities, as in this article?

Example of the expected result:

dfresults = data.frame(id=c(1,2), ner_words = c("John, Google", "Amazon, python, Canada"))

Best answer

You can do this without quanteda by using the spacyr package, a wrapper around the spaCy library mentioned in the article you linked.

Here, I have edited your input data.frame slightly.

df <- data.frame(id = c(1, 2),
                 text = c("My best friend John works at Google.",
                          "However he would like to work at Amazon as he likes to use Python and stay in Canada."),
                 stringsAsFactors = FALSE)

Then:

library("spacyr")
library("dplyr")

# -- need to do these before the next function will work:
# spacy_install()
# spacy_download_langmodel(model = "en_core_web_lg")

spacy_initialize(model = "en_core_web_lg")
#> Found 'spacy_condaenv'. spacyr will use this environment
#> successfully initialized (spaCy Version: 2.0.10, language model: en_core_web_lg)
#> (python options: type = "condaenv", value = "spacy_condaenv")

txt <- df$text
names(txt) <- df$id

spacy_parse(txt, lemma = FALSE, entity = TRUE) %>%
    entity_extract() %>%
    group_by(doc_id) %>%
    summarize(ner_words = paste(entity, collapse = ", "))
#> # A tibble: 2 x 2
#>   doc_id ner_words
#>   <chr>  <chr>
#> 1 1      John, Google
#> 2 2      Amazon, Python, Canada
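
The result above is keyed by doc_id (a character column taken from names(txt)), whereas the question asked for a data frame keyed by the original id. A minimal base R sketch of that last step follows. It assumes the extracted entities are already in a data frame with doc_id and entity columns; the ents object here is a hand-written stand-in mirroring the output shown above, not something produced by running spaCy, and the names ents and collapsed are only illustrative.

```r
# Hypothetical stand-in for the entity_extract() result: only the two
# columns used below, with rows copied by hand from the output shown above.
ents <- data.frame(
  doc_id = c("1", "1", "2", "2", "2"),
  entity = c("John", "Google", "Amazon", "Python", "Canada"),
  stringsAsFactors = FALSE
)

# The ids from the question's data frame (text column omitted for brevity).
df <- data.frame(id = c(1, 2))

# Collapse the entities of each document into one comma-separated string.
collapsed <- aggregate(entity ~ doc_id, data = ents,
                       FUN = function(x) paste(x, collapse = ", "))
names(collapsed) <- c("id", "ner_words")

# doc_id is character because it came from names(txt); convert it back
# so that it matches the numeric df$id.
collapsed$id <- as.numeric(collapsed$id)

# A left join keeps documents in which no entity was found (ner_words = NA).
dfresults <- merge(df, collapsed, by = "id", all.x = TRUE)
print(dfresults)
```

This avoids the dplyr dependency for the grouping step. Note that entity_extract() joins multi-word entities with an underscore by default, so names such as "New_York" may need a gsub("_", " ", ...) pass before display.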

Regarding "r - How to do named entity recognition (NER) using quanteda?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57289663/
