
regex - R: Extracting capital letters and special characters using strsplit and Perl regex syntax

Reposted. Author: 行者123. Updated: 2023-12-04 22:09:18

How can I extract only the "/" together with the capital letters that follow it, plus the whole "[[:punct:]]/$[[:punct:]]"?

text <- c("This/ART ,/$; Is/NN something something/else A/VAFIN faulty/ADV text/ADV which/ADJD i/PWS propose/ADV as/APPR Example/NE ./$. So/NE It/PTKNEG makes/ADJD no/VAFIN sense/ADV at/KOUS all/PDAT ,/$, it/APPR Has/ADJA Errors/NN  ,/$; and/APPR it/APPR is/CARD senseless/NN again/ART ./$:")

# HOW to?
textPOS <- strsplit(text,"( )|(?<=[[:punct:]]/\\$[[:punct:]])", perl=TRUE)
# ^^^
# extract only the "/" with the following capital letters
# and the whole "[[:punct:]]/$[[:punct:]]"

# Expected RETURN:
> textPOS
[1] "/ART" ",/$;" "/NN" "/VAFIN" "/ADV" "/ADV" "/ADJD" "/PWS" "/ADV" "/APPR" "/NE" "./$." "/NE" "/PTKNEG" "/ADJD" "/VAFIN" "/ADV" "/KOUS" "/PDAT" ",/$," "/APPR" "/ADJA" "/NN" ",/$;" "/APPR" "/APPR" "/CARD" "/NN" "/ART" "./$:"


Thanks! :)

Best answer

You can use gregexpr together with regmatches:

regmatches(text, gregexpr('[[:punct:]]*/[[:alpha:][:punct:]]*', text))
# [[1]]
# [1] "/ART" ",/$;" "/NN" "/else" "/VAFIN" "/ADV" "/ADV" "/ADJD" "/PWS" "/ADV" "/APPR" "/NE"
# [13] "./$." "/NE" "/PTKNEG" "/ADJD" "/VAFIN" "/ADV" "/KOUS" "/PDAT" ",/$," "/APPR" "/ADJA" "/NN"
# [25] ",/$;" "/APPR" "/APPR" "/CARD" "/NN" "/ART" "./$:"


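In words, the regex says: "find anything that starts with zero or more punctuation characters, followed by a slash, followed by zero or more letters or punctuation characters." If you want to include digits as well, switch [:alpha:] to [:alnum:].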
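As a small illustration of the [:alnum:] switch, here is a minimal sketch. The input string `tagged` and its digit-bearing tag `NN2` are invented for this example and do not appear in the question.

```r
# Hypothetical input: "NN2" is a made-up tag that contains a digit.
tagged <- "word/NN2 ,/$;"

# With [:alpha:], the match stops at the digit, so the tag is cut short.
alpha_hits <- regmatches(tagged, gregexpr("[[:punct:]]*/[[:alpha:][:punct:]]*", tagged))[[1]]

# With [:alnum:], digits are accepted too, so the whole tag is kept.
alnum_hits <- regmatches(tagged, gregexpr("[[:punct:]]*/[[:alnum:][:punct:]]*", tagged))[[1]]

print(alpha_hits)  # "/NN"  ",/$;"
print(alnum_hits)  # "/NN2" ",/$;"
```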



Per the comments, if you only want uppercase letters, the regex becomes:

regmatches(text, gregexpr('[[:punct:]]*/[[:upper:][:punct:]]*', text))
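One thing worth checking with this variant: both character classes are starred, so a slash followed by a lowercase tag can still match as a bare "/". The sketch below runs the pattern on a shortened excerpt of the question's text and then filters those leftovers out. The names `excerpt`, `raw_hits` and `clean_hits` are chosen for this example only.

```r
# Shortened excerpt of the question's text, including the lowercase tag "/else".
excerpt <- "This/ART ,/$; Is/NN something something/else A/VAFIN"

# The uppercase-only pattern from the answer.
raw_hits <- regmatches(excerpt, gregexpr("[[:punct:]]*/[[:upper:][:punct:]]*", excerpt))[[1]]
print(raw_hits)    # "/ART" ",/$;" "/NN" "/" "/VAFIN"

# Drop the bare "/" that "something/else" leaves behind.
clean_hits <- raw_hits[raw_hits != "/"]
print(clean_hits)  # "/ART" ",/$;" "/NN" "/VAFIN"
```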


As @eddi suggested, [A-Z] and [:upper:] are roughly equivalent. Also following @eddi's suggestion, this regex captures both the "/LETTERS" case and the "punct/$punct" case:

/[A-Z]+|[[:punct:]]/\\$[[:punct:]]
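To see this pattern in use, here is a minimal sketch that plugs it into a regmatches/gregexpr call on a shortened excerpt of the question's text. The names `excerpt`, `pattern` and `tags` are chosen for this example only.

```r
# Shortened excerpt of the question's text.
excerpt <- "This/ART ,/$; Is/NN something something/else A/VAFIN Example/NE ./$."

# First alternative: a slash plus one or more uppercase letters.
# Second alternative: punctuation, slash, a literal "$", punctuation.
pattern <- "/[A-Z]+|[[:punct:]]/\\$[[:punct:]]"

tags <- regmatches(excerpt, gregexpr(pattern, excerpt))[[1]]
print(tags)  # "/ART" ",/$;" "/NN" "/VAFIN" "/NE" "./$."
```

Because the first alternative requires at least one uppercase letter (the `+`), a lowercase tag such as "something/else" produces no match at all, which is the behaviour the question's expected return asks for.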

Regarding "regex - R: Extracting capital letters and special characters using strsplit and Perl regex syntax", the original question can be found on Stack Overflow: https://stackoverflow.com/questions/18788726/
