
r - Extract "((Adj|Noun)+|((Adj|Noun)(Noun-Prep)?)(Adj|Noun))Noun" from text (Justeson & Katz, 1995)

Reposted. Author: 行者123. Last updated: 2023-12-04 06:38:07

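Is it possible to extract the ((Adj|Noun)+|((Adj|Noun)(Noun-Prep)?)(Adj|Noun))Noun pattern proposed by Justeson and Katz (1995) using the R package openNLP?

That is, I would like to use this linguistic filter to extract candidate noun phrases.

I do not understand its meaning very well.

Could you explain it to me, or translate this notation into R?

Many thanks.

Perhaps we can start from the following sample code: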

library("openNLP")  

acq <- "This paper describes a novel optical thread plug
gauge (OTPG) for internal thread inspection using machine
vision. The OTPG is composed of a rigid industrial
endoscope, a charge-coupled device camera, and a two
degree-of-freedom motion control unit. A sequence of
partial wall images of an internal thread are retrieved and
reconstructed into a 2D unwrapped image. Then, a digital
image processing and classification procedure is used to
normalize, segment, and determine the quality of the
internal thread."

acqTag <- tagPOS(acq)

acqTagSplit = strsplit(acqTag," ")

I was told to open a new question for this. The original question is here.

Best answer

Install the packages with:

install.packages("openNLP")
install.packages("openNLPmodels.en")

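After that, you can run the code above. It POS-tags every word in the text and returns the original text with each word labelled as a noun, verb, and so on. For this example I get: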
acqTagSplit = strsplit(acqTag," ")
> acqTag
[1] "This/DT paper/NN describes/VBZ a/DT novel/NN optical/JJ thread/NN plug/NN gauge/NN (OTPG)/NN for/IN internal/JJ thread/NN inspection/NN using/VBG machine/NN vision./NN The/DT OTPG/NNP is/VBZ composed/VBN of/IN a/DT rigid/JJ industrial/JJ endoscope,/NNS a/DT charge-coupled/JJ device/NN camera,/VBD and/CC a/DT two/CD degree-of-freedom/NN motion/NN control/NN unit./NN A/DT sequence/NN of/IN partial/JJ wall/NN images/NNS of/IN an/DT internal/JJ thread/NN are/VBP retrieved/VBN and/CC reconstructed/VBN into/IN a/DT 2D/JJ unwrapped/JJ image./NN Then,/IN a/DT digital/JJ image/NN processing/NN and/CC classification/NN procedure/NN is/VBZ used/VBN to/TO normalize,/JJ segment,/NN and/CC determine/VB the/DT quality/NN of/IN the/DT internal/JJ thread./NN"
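
A note for readers on a recent openNLP release: as far as I know, tagPOS() belongs to the older interface and is not available in the versions built on the NLP package. The sketch below shows one way to get comparable word/TAG tokens with the annotator interface. The function name tag_pos_modern is hypothetical, and the code assumes that Java and the NLP, openNLP and openNLPmodels.en packages are installed.

```r
# Sketch only: needs Java plus the NLP, openNLP and openNLPmodels.en packages.
# tag_pos_modern is a hypothetical helper name, not part of any package.
tag_pos_modern <- function(text) {
  s <- NLP::as.String(text)
  # Sentence and word boundaries must be annotated before POS tagging.
  a2 <- NLP::annotate(s, list(openNLP::Maxent_Sent_Token_Annotator(),
                              openNLP::Maxent_Word_Token_Annotator()))
  a3 <- NLP::annotate(s, openNLP::Maxent_POS_Tag_Annotator(), a2)
  # Keep only the word annotations and read the POS feature of each one.
  a3w <- subset(a3, type == "word")
  tags <- vapply(a3w$features, `[[`, character(1), "POS")
  # Return a character vector of "word/TAG" tokens.
  paste(s[a3w], tags, sep = "/")
}
```

Calling tag_pos_modern(acq) should give one token per element, so no strsplit on spaces is needed. The tokenization will likely differ from the output above, because this interface treats punctuation marks as separate tokens.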

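Each word is followed by its POS tag, separated by a slash. To separate the tags from the words, first split the text into word/tag tokens, as you did in your example: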
acqTagSplit = strsplit(acqTag," ")
acqTagSplit
[[1]]
[1] "This/DT" "paper/NN" "describes/VBZ"
[4] "a/DT" "novel/NN" "optical/JJ"
[7] "thread/NN" "plug/NN" "gauge/NN"
[10] "(OTPG)/NN" "for/IN" "internal/JJ"
[13] "thread/NN" "inspection/NN" "using/VBG"
[16] "machine/NN" "vision./NN" "The/DT"
[19] "OTPG/NNP" "is/VBZ" "composed/VBN"
[22] "of/IN" "a/DT" "rigid/JJ"
[25] "industrial/JJ" "endoscope,/NNS" "a/DT"
[28] "charge-coupled/JJ" "device/NN" "camera,/VBD"
[31] "and/CC" "a/DT" "two/CD"
[34] "degree-of-freedom/NN" "motion/NN" "control/NN"
[37] "unit./NN" "A/DT" "sequence/NN"
[40] "of/IN" "partial/JJ" "wall/NN"
[43] "images/NNS" "of/IN" "an/DT"
[46] "internal/JJ" "thread/NN" "are/VBP"
[49] "retrieved/VBN" "and/CC" "reconstructed/VBN"
[52] "into/IN" "a/DT" "2D/JJ"
[55] "unwrapped/JJ" "image./NN" "Then,/IN"
[58] "a/DT" "digital/JJ" "image/NN"
[61] "processing/NN" "and/CC" "classification/NN"
[64] "procedure/NN" "is/VBZ" "used/VBN"
[67] "to/TO" "normalize,/JJ" "segment,/NN"
[70] "and/CC" "determine/VB" "the/DT"
[73] "quality/NN" "of/IN" "the/DT"
[76] "internal/JJ" "thread./NN"

Then split each word from its POS tag:
strsplit(acqTagSplit[[1]], "/")

You get a list with one element per tagged token; within each element the word comes first and the tag second. See:
str(strsplit(acqTagSplit[[1]], "/"))
List of 77
$ : chr [1:2] "This" "DT"
$ : chr [1:2] "paper" "NN"
$ : chr [1:2] "describes" "VBZ"
$ : chr [1:2] "a" "DT"
$ : chr [1:2] "novel" "NN"
$ : chr [1:2] "optical" "JJ"
$ : chr [1:2] "thread" "NN"
$ : chr [1:2] "plug" "NN"
$ : chr [1:2] "gauge" "NN"
$ : chr [1:2] "(OTPG)" "NN"
$ : chr [1:2] "for" "IN"
$ : chr [1:2] "internal" "JJ"
$ : chr [1:2] "thread" "NN"
$ : chr [1:2] "inspection" "NN"
$ : chr [1:2] "using" "VBG"
$ : chr [1:2] "machine" "NN"
$ : chr [1:2] "vision." "NN"
$ : chr [1:2] "The" "DT"
$ : chr [1:2] "OTPG" "NNP"
$ : chr [1:2] "is" "VBZ"
$ : chr [1:2] "composed" "VBN"
$ : chr [1:2] "of" "IN"
$ : chr [1:2] "a" "DT"
$ : chr [1:2] "rigid" "JJ"
$ : chr [1:2] "industrial" "JJ"
$ : chr [1:2] "endoscope," "NNS"
$ : chr [1:2] "a" "DT"
$ : chr [1:2] "charge-coupled" "JJ"
$ : chr [1:2] "device" "NN"
$ : chr [1:2] "camera," "VBD"
$ : chr [1:2] "and" "CC"
$ : chr [1:2] "a" "DT"
$ : chr [1:2] "two" "CD"
$ : chr [1:2] "degree-of-freedom" "NN"
$ : chr [1:2] "motion" "NN"
$ : chr [1:2] "control" "NN"
$ : chr [1:2] "unit." "NN"
$ : chr [1:2] "A" "DT"
$ : chr [1:2] "sequence" "NN"
$ : chr [1:2] "of" "IN"
$ : chr [1:2] "partial" "JJ"
$ : chr [1:2] "wall" "NN"
$ : chr [1:2] "images" "NNS"
$ : chr [1:2] "of" "IN"
$ : chr [1:2] "an" "DT"
$ : chr [1:2] "internal" "JJ"
$ : chr [1:2] "thread" "NN"
$ : chr [1:2] "are" "VBP"
$ : chr [1:2] "retrieved" "VBN"
$ : chr [1:2] "and" "CC"
$ : chr [1:2] "reconstructed" "VBN"
$ : chr [1:2] "into" "IN"
$ : chr [1:2] "a" "DT"
$ : chr [1:2] "2D" "JJ"
$ : chr [1:2] "unwrapped" "JJ"
$ : chr [1:2] "image." "NN"
$ : chr [1:2] "Then," "IN"
$ : chr [1:2] "a" "DT"
$ : chr [1:2] "digital" "JJ"
$ : chr [1:2] "image" "NN"
$ : chr [1:2] "processing" "NN"
$ : chr [1:2] "and" "CC"
$ : chr [1:2] "classification" "NN"
$ : chr [1:2] "procedure" "NN"
$ : chr [1:2] "is" "VBZ"
$ : chr [1:2] "used" "VBN"
$ : chr [1:2] "to" "TO"
$ : chr [1:2] "normalize," "JJ"
$ : chr [1:2] "segment," "NN"
$ : chr [1:2] "and" "CC"
$ : chr [1:2] "determine" "VB"
$ : chr [1:2] "the" "DT"
$ : chr [1:2] "quality" "NN"
$ : chr [1:2] "of" "IN"
$ : chr [1:2] "the" "DT"
$ : chr [1:2] "internal" "JJ"
$ : chr [1:2] "thread." "NN"
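
With words and tags separated, the filter itself can be applied as a regular expression over the tag sequence. The following is a minimal base R sketch, not the method of the answer above. It assumes the filter is meant as Justeson and Katz wrote it, ((A|N)+ | ((A|N)*(NP)?)(A|N)*) N, with repetition stars that appear to have been lost in the title. It also assumes that JJ* tags count as adjectives, NN* tags as nouns and IN as a preposition, and that only candidates of at least two words are kept. The names tag_class and extract_candidates are hypothetical.

```r
# Hypothetical helper: collapse a Penn Treebank tag to a one-letter class.
tag_class <- function(tag) {
  ifelse(grepl("^JJ", tag), "A",        # adjectives: JJ, JJR, JJS
  ifelse(grepl("^NN", tag), "N",        # nouns: NN, NNS, NNP, NNPS
  ifelse(tag == "IN", "P", "O")))       # prepositions; everything else
}

# Hypothetical helper: return candidate noun phrases from "word/TAG" tokens.
extract_candidates <- function(tokens, min_len = 2) {
  words <- sub("/[^/]*$", "", tokens)   # text before the last slash
  tags  <- sub("^.*/", "", tokens)      # text after the last slash
  # One character per token, so character offsets equal token indices.
  code <- paste(tag_class(tags), collapse = "")
  m <- gregexpr("(?:[AN]+|[AN]*(?:NP)?[AN]*)N", code, perl = TRUE)[[1]]
  if (m[1] == -1) return(character(0))
  starts <- as.integer(m)
  lens   <- attr(m, "match.length")
  idx    <- which(lens >= min_len)      # assumption: drop single nouns
  vapply(idx, function(i) {
    paste(words[starts[i]:(starts[i] + lens[i] - 1)], collapse = " ")
  }, character(1))
}

# A slice of the tagged tokens shown above.
tokens <- c("A/DT", "sequence/NN", "of/IN", "partial/JJ", "wall/NN",
            "images/NNS", "of/IN", "an/DT", "internal/JJ", "thread/NN",
            "are/VBP", "retrieved/VBN")
print(extract_candidates(tokens))
# [1] "sequence of partial wall images" "internal thread"
```

On the full acqTagSplit[[1]] vector, punctuation is still attached to some words (for example "vision." and "endoscope,"), so phrases can run across commas and sentence ends. Stripping or separating punctuation before matching would probably be needed for usable results.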

Regarding r - Extract "((Adj|Noun)+|((Adj|Noun)(Noun-Prep)?)(Adj|Noun))Noun" from text (Justeson & Katz, 1995), a similar question was found on Stack Overflow: https://stackoverflow.com/questions/4610974/
