r - Lookaround regex pattern in R

Reposted by: 行者123 Updated: 2023-12-03 16:19:59

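I am stuck on creating the right regex pattern, one that splits the contents of my data frame column without losing any of the elements.
I have to use the separate() function from the tidyr package, because this is part of a longer processing pipeline. Since I don't want to lose any element of the string, I am working on a lookahead/lookbehind expression.
The strings to be split can follow one of these patterns: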

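letters only (e.g. "abcd")

letters-dash-digits (e.g. "abcd-123")

letters followed by digits (e.g. "abcd1234")
The column contents should be split into at most 3 columns, one column per group.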

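I want to split every time the type of element changes, that is, after the letters and after the dash. There can be one or more letters and one or more digits, but only ever one dash. Strings that contain only letters do not need to be split.
Here is what I tried: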

library(tidyr) 
myDat = data.frame(drugName = c("ab-1234", 'ab-1234', 'ab-1234',
                                'placebo', 'anotherdrug', 'andanother',
                                'xyz123', 'xyz123', 'placebo', 'another',
                                'omega-3', 'omega-3', 'another', 'placebo'))
drugColNames = paste0("X", 1:3) 

# This pattern doesn't split strings that only consist of number and letters, e.g. "xyz123" is not split after the letters.
pat = '(?=-[0-9+])|(?<=[a-z+]-)'

# This pattern splits at all the right places, but the last group (the numbers), is separated and not kept together.
# pat = '(?=-[0-9+]|[0-9+])|(?<=[a-z+]-)'

splitDat = separate(myDat, drugName,
         into = drugColNames,
         sep = pat)
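
One detail in both patterns is worth checking: inside square brackets, `+` is a literal character rather than a quantifier, so `[0-9+]` matches a single digit or a plus sign. The base R sketch below is an illustration and not part of the original post; it calls gregexpr() directly instead of separate() to list where the second pattern finds zero-width matches in "xyz123".

```r
x <- "xyz123"

# Inside a character class, "+" is literal: [0-9+] matches one digit or "+".
plus_is_literal <- grepl("[0-9+]", "+")
plus_is_literal             # TRUE

# Second pattern from the question. The lookahead alternative [0-9+] succeeds
# in front of every digit, so each digit gets its own zero-width match.
pat2 <- "(?=-[0-9+]|[0-9+])|(?<=[a-z+]-)"
m <- gregexpr(pat2, x, perl = TRUE)[[1]]
split_points <- as.integer(m)
split_points                # 4 5 6
```

Because a split point is found in front of every digit, each digit ends up as its own piece, which is consistent with the behaviour described in the comment above the second pattern.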

The output of the split should be:

"ab-1234" --> "ab" "-" "1234"
"xyz123" --> "xyz" "123"
"omega-3" --> "omega" "-" "3"

Thank you very much for your help with this. :)
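
For comparison, a lookaround pattern that marks only the boundaries between groups could look like the following. This is a sketch under assumptions: the pattern is not from the original post, it is exercised here with base R's regmatches() rather than with separate(), and it presumes lowercase letters and at most one dash, as described in the question.

```r
x <- c("ab-1234", "xyz123", "omega-3", "placebo")

# Zero-width split points (hypothetical pattern, not from the original post):
#   (?<=[a-z])(?=[-0-9])  between a letter and a following dash or digit
#   (?<=-)(?=[0-9])       between the dash and the first digit
pat <- "(?<=[a-z])(?=[-0-9])|(?<=-)(?=[0-9])"

# invert = TRUE returns the text between the matches, i.e. the pieces.
pieces <- regmatches(x, gregexpr(pat, x, perl = TRUE), invert = TRUE)
names(pieces) <- x
pieces
```

Strings with no boundary, such as "placebo", come back as a single piece, so the number of pieces varies from one row to the next.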

Best answer

It is easier to use extract here, because there is no fixed separator; this also avoids regex lookarounds altogether.

tidyr::extract(myDat, drugName, drugColNames, '([a-z]+)(-)?(\\d+)?', remove = FALSE)

#      drugName          X1 X2   X3
#1      ab-1234          ab  - 1234
#2      ab-1234          ab  - 1234
#3      ab-1234          ab  - 1234
#4      placebo     placebo        
#5  anotherdrug anotherdrug        
#6   andanother  andanother        
#7       xyz123         xyz     123
#8       xyz123         xyz     123
#9      placebo     placebo        
#10     another     another        
#11     omega-3       omega  -    3
#12     omega-3       omega  -    3
#13     another     another        
#14     placebo     placebo
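
To see what each capture group contributes, the same three groups can be run in base R with regexec() and regmatches(). This is a sketch for illustration rather than part of the original answer: the `^` and `$` anchors and the use of `[0-9]` in place of `\\d` are assumptions made here, and the object names are arbitrary.

```r
x <- c("ab-1234", "placebo", "xyz123", "omega-3")

# Same three groups as in the extract() call: letters, optional dash,
# optional digits. The ^ and $ anchors are an addition for this sketch.
pat <- "^([a-z]+)(-)?([0-9]+)?$"

m <- regmatches(x, regexec(pat, x))

# Each element holds the full match followed by the three groups.
# Drop the full match and bind the groups into a character matrix.
# (This assumes every string matches; a non-match would yield character(0).)
parts <- t(vapply(m, function(p) p[-1], character(3)))
dimnames(parts) <- list(x, c("X1", "X2", "X3"))
parts
```

Making the dash and digit groups optional is what lets letter-only strings such as "placebo" match; their unused groups come back as empty strings.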

For "r - Lookaround regex pattern in R", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64872872/
