"时,"[[:punct:]]"与 `stringr::str_replace_all` 不匹配?-6ren"> "时,"[[:punct:]]"与 `stringr::str_replace_all` 不匹配?-这个问题在这里已经有了答案: R/regex with stringi/ICU: why is a '+' considered a non-[:punct:] character? (2 个回答) -6ren">
r - "[[:punct:]]" does not match ">" when using `stringr::str_replace_all`?

Repost. Author: 行者123. Updated: 2023-12-04 11:47:03

This question already has an answer here:

R/regex with stringi/ICU: why is a '+' considered a non-[:punct:] character? (2 answers)

Closed 2 years ago.

I find this odd:

pattern <- "[[:punct:][:digit:][:space:]]+"
string <- "a . , > 1 b"

gsub(pattern, " ", string)
# [1] "a b"

library(stringr)
str_replace_all(string, pattern, " ")
# [1] "a > b"

str_replace_all(string, "[[:punct:][:digit:][:space:]>]+", " ")
# [1] "a b"

Is this expected?

Best answer

I'm still working through this, but ?"stringi-search-charclass" says:

Beware of using POSIX character classes, e.g. ‘[:punct:]’. ICU User Guide (see below) states that in general they are not well-defined, so may end up with something different than you expect.

In particular, in POSIX-like regex engines, ‘[:punct:]’ stands for the character class corresponding to the ‘ispunct()’ classification function (check out ‘man 3 ispunct’ on UNIX-like systems). According to ISO/IEC 9899:1990 (ISO C90), the ‘ispunct()’ function tests for any printing character except for space or a character for which ‘isalnum()’ is true. However, in a POSIX setting, the details of what characters belong into which class depend on the current locale. So the ‘[:punct:]’ class does not lead to portable code (again, in POSIX-like regex engines).

So a POSIX flavor of ‘[:punct:]’ is more like ‘[\p{P}\p{S}]’ in ‘ICU’. You have been warned.
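To make the quoted warning concrete, here is a minimal sketch that checks each ASCII punctuation character against the ICU engine used by stringr. The variable names `ascii_punct` and `icu_miss` are made up for illustration, and the sketch assumes stringr is installed. Going by Unicode general categories, the characters left in `icu_miss` should be the ASCII symbols (currency, math, and modifier symbols such as `$`, `+`, `>` and `^`) rather than true punctuation.

```r
library(stringr)

# The 32 ASCII characters that ispunct() accepts in the C locale.
ascii_punct <- strsplit("!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~", "")[[1]]

# Characters that the ICU flavor of [[:punct:]] does NOT match
# (hypothetical variable name, for illustration only).
icu_miss <- ascii_punct[!str_detect(ascii_punct, "[[:punct:]]")]
icu_miss

# Adding \p{S} (symbols) to \p{P} (punctuation) should cover all of them.
all(str_detect(ascii_punct, "[\\p{P}\\p{S}]"))
```

Printing `icu_miss` is a quick way to see whether a character you care about falls outside ICU's `[:punct:]` before relying on it in a pattern.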



Copied from the question linked above:
string  <- "a . , > 1 b"
mypunct <- "[[\\p{P}][\\p{S}]]"
stringr::str_remove_all(string, mypunct)
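The same idea can be applied to the replacement from the question. This is only a sketch: `pattern_icu` is a hypothetical name, and it assumes you want stringr to behave like the base R `gsub()` call for this input.

```r
library(stringr)

string <- "a . , > 1 b"

# Hypothetical pattern name: spell out punctuation (\p{P}) and symbols (\p{S})
# explicitly instead of relying on the ICU meaning of [:punct:].
pattern_icu <- "[\\p{P}\\p{S}[:digit:][:space:]]+"

result <- str_replace_all(string, pattern_icu, " ")
result

# Compare against base R, which uses a POSIX-like engine by default.
identical(result, gsub("[[:punct:][:digit:][:space:]]+", " ", string))
```

Spelling out `\p{P}` and `\p{S}` avoids listing individual characters such as `>` by hand, as the question's workaround did.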

I can appreciate that this is locale-specific, but it still surprises me that [:punct:] does not even work in the C locale...

Original question on Stack Overflow: https://stackoverflow.com/questions/53119840/
