
regex - Regular expression to reorder strings within a field

Reposted. Author: 行者123. Updated: 2023-12-03 20:59:50

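I'm trying to write a program that uses regular expressions to clean up some data. Suppose I have room names containing letters and numbers. In the final output, I need each room name to follow the pattern "full string (excluding letter and number) + letter + number", as in the example below. However, with the regular expressions I have written so far, I get very confusing results, shown at the bottom of this post. For some reason, letters and numbers are added to some rows even though the input data for those rows may not contain any. Thank you.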

Edit: I have edited the input data. I want to generalize the code to accept any number of words in the name, not just the single word "ROOM".

# the pattern should be "the full string (excluding letter & number) + letter + number". For example:
ATLANTA ROOM
ATLANTA ROOM 3
NEW YORK ROOM A 2
ROOM A 4
THE BIG AWESOME ROOM B
ROOM B 4
GEORGETOWN ROOM B 2
NEW YORK ROOM C 2
NEW YORK ROOM C
LOS ANGELES ROOM E 2

# program to clean with regular expressions. there could be multiple spaces between words
dd <- c("ATLANTA ROOM ",
" ATLANTA ROOM 3",
"NEW YORK A ROOM 2",
"4 ROOM A",
"THE BIG AWESOME ROOM B",
" ROOM 4 B",
"GEORGETOWN B 2 ROOM ",
" C NEW YORK ROOM 2",
"NEW YORK ROOM C",
"LOS ANGELES ROOM 2 E")

m_char_num <- regexpr("(\\<A|B|C|D|E|1|2|3|4\\>)", dd)
m_char <- regexpr("(\\<A|B|C|D|E\\>)", dd)
m_num <- regexpr("(\\<1|2|3|4\\>)", dd)

(dd2 <- paste(gsub("( +)", " ",
                   gsub("(^ +)|( +$)", "",
                        gsub("(\\<A|B|C|D|E|1|2|3|4\\>)", "", dd))),
              regmatches(dd, m_char), regmatches(dd, m_num), sep = " "))

# actual output from the program
"TLANTA ROOMA3",
"TLANTA ROOMA2",
"NW YORK ROOMA4",
"ROOMA4",
"TH IG WSOM ROOME2",
"ROOMB2",
"GORGTOWN ROOMB2",
"NW YORK ROOMC3",
"NW YORK ROOMC2",
"LOS NGLS ROOMA4"

Best answer

Here is one attempt:

sub(' $', '',                # clean up spaces at the end
    gsub(' +', ' ',          # clean up double spaces
         # rearrange letter and numbers
         sub('^([A-Z]?)([0-9]*)([A-Z]?)$', 'ROOM \\1\\3 \\2',
             gsub(' |ROOM', '', dd)   # remove spaces and ROOM
         )
    )
)
#[1] "ROOM"     "ROOM 3"   "ROOM A 2" "ROOM A 4" "ROOM B"   "ROOM B 4" "ROOM B 2"
#[8] "ROOM C 2" "ROOM C"   "ROOM E 2"
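
As for why the code in the question gives such confusing output, two things seem to be at play. First, alternation binds more loosely than everything else in a regex, so "\\<A|B|C|D|E\\>" means "A at the start of a word, or B anywhere, or C anywhere, or D anywhere, or E at the end of a word"; the word boundaries only apply to every letter once the alternatives are grouped. Second, regmatches() silently drops elements that have no match, so its result can be shorter than dd and paste() then recycles it against the wrong rows. A small sketch on two illustrative inputs (the names x, bad and good are made up for this example):

```r
x <- c("ATLANTA ROOM", "ROOM 4 B")

# Ungrouped: the boundaries attach only to the first and last alternative.
bad  <- "\\<A|B|C|D|E\\>"
# Grouped: the boundaries apply to every alternative.
good <- "\\<(A|B|C|D|E)\\>"

# The leading "A" of ATLANTA is picked up as if it were a room letter.
bad_hits <- regmatches(x, regexpr(bad, x))

# Only the standalone "B" matches; the first element has no match and is
# dropped, so the result is shorter than x.
good_hits <- regmatches(x, regexpr(good, x))

length(x)
length(good_hits)
```

Because the second result has one element while x has two, pasting it back onto x would recycle "B" onto a row that never had a letter, which is the kind of misplaced value seen in the question.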

Below is the same logic for the edited OP and the comments (assuming that room names are made of words with at least 3 letters and that the room letter code has at most 2 letters):
gsub('(^ | $)', '',          # clean up spaces in front or at the end
     gsub(' +', ' ',         # clean up double spaces
          # extract room name and put it in front of the letter and number
          paste(gsub('\\b([A-Z][A-Z]?|[0-9]+)\\b', '', dd, perl = TRUE),
                sub('^([A-Z]+)?([0-9]*)([A-Z]+)?$', '\\1\\3 \\2',
                    gsub(' |\\w\\w\\w+', '', dd)   # remove spaces and words
                )
          )
     )
)
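
The nested calls can be hard to follow, so here is a minimal sketch of the same steps split into named pieces. The function name reorder_room and the intermediate variables are hypothetical, introduced only for illustration, and the sketch relies on the same assumptions (name words have at least 3 letters, the letter code has at most 2):

```r
# Hypothetical helper: rewrite each label as "<name words> <letter> <number>".
reorder_room <- function(x) {
  # Name part: drop standalone 1-2 letter codes and standalone numbers.
  name <- gsub("\\b([A-Z][A-Z]?|[0-9]+)\\b", "", x, perl = TRUE)
  # Code part: drop spaces and every word of 3 or more characters.
  code <- gsub(" |\\w\\w\\w+", "", x)
  # Put the letter before the number, whichever came first in the input.
  code <- sub("^([A-Z]+)?([0-9]*)([A-Z]+)?$", "\\1\\3 \\2", code)
  # Join, collapse repeated spaces, then trim one space from each end.
  out <- gsub(" +", " ", paste(name, code))
  gsub("^ | $", "", out)
}

res <- reorder_room(c("NEW YORK  A ROOM 2", "4 ROOM A",
                      " ROOM  4 B", "ATLANTA ROOM "))
res
```

One limitation worth keeping in mind: \\w also matches digits, so a room number with three or more digits would be treated as a word and removed from the code part.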

Regarding "regex - Regular expression to reorder strings within a field", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/19008558/
