
regex - Trouble with gsub and regex in R

Reposted. Author: 行者123. Updated: 2023-12-04 16:51:34

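I am using gsub in R to add text into the middle of a string. It works perfectly, but for some reason it throws an error when the position is too large. The code is below: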

gsub(paste0('^(.{', as.integer(loc[1])-1, '})(.+)$'), new_cols, sql)

Error in gsub(paste0("^(.{273})(.+)$"), new_cols, sql) :  invalid
regular expression '^(.{273})(.+)$', reason 'Invalid contents of {}'


This code works fine when the number in the braces is small, but it fails when the number is larger (273 in this case).

This reproduces the error:
sql <- "The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats.The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats."  
new_cols <- "happy"
gsub('^(.{125})(.+)$', new_cols, sql) # works
gsub('^(.{273})(.+)$', new_cols, sql)

Error in gsub("^(.{273})(.+)$", new_cols, sql) :    invalid regular
expression '^(.{273})(.+)$', reason 'Invalid contents of {}'

Best answer

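Background
R's gsub uses the TRE regex library by default. The bounds in a limiting quantifier are valid from 0 up to RE_DUP_MAX, which is defined in the TRE code. See this TRE reference: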

A bound is one of the following, where n and m are unsigned decimal integers between 0 and RE_DUP_MAX


It seems that RE_DUP_MAX is set to 255 (see this TRE source file showing #define RE_DUP_MAX 255), so you cannot use a larger value in a {n,m} limiting quantifier.
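To check where the limit falls on a given R installation, a small probe like the one below may help. It is only a sketch: `regex_ok` is a hypothetical helper name (not part of base R), and the bounds tested are simply the ones from the question plus the values on either side of the 255 figure quoted above.

```r
# Hypothetical helper: TRUE if the engine accepts the pattern, FALSE if it
# rejects it with an error such as "Invalid contents of {}".
regex_ok <- function(pattern, perl = FALSE) {
  tryCatch({
    # suppressWarnings() hides the compilation warning that accompanies the error
    suppressWarnings(grepl(pattern, "", perl = perl))
    TRUE
  }, error = function(e) FALSE)
}

# Try the same pattern shape with several bounds under both engines.
for (n in c(125, 255, 256, 273)) {
  pattern <- paste0("^(.{", n, "})(.+)$")
  cat(sprintf("{%d}: TRE %s, PCRE %s\n",
              n, regex_ok(pattern), regex_ok(pattern, perl = TRUE)))
}
```

The probe matches against an empty string on purpose: only whether the pattern compiles matters here, not whether it matches.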
Solution
Use the PCRE regex flavor by adding perl = TRUE, and it will work.
R demo:
> sql <- "The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats.The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats."
> new_cols <- "happy"
> gsub('^(.{273})(.+)$', new_cols, sql, perl=TRUE)
[1] "happy"
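Note that the demo replaces the whole string, because the replacement "happy" contains no backreferences. If the goal is to insert text at a position, as the question describes, the replacement has to put the captured groups back. The sketch below shows two ways to do that; `insert_regex` and `insert_substr` are hypothetical helper names, and `n` is assumed to be the number of characters that should precede the inserted text.

```r
# Regex approach: keep both captured groups and place the new text between them.
# perl = TRUE avoids the TRE bound limit discussed above.
# Caveat: backslashes inside `new` would be read as backreferences.
insert_regex <- function(s, n, new) {
  pattern <- paste0("^(.{", as.integer(n), "})(.+)$")
  sub(pattern, paste0("\\1", new, "\\2"), s, perl = TRUE)
}

# Non-regex approach: split the string at position n and paste the parts together.
insert_substr <- function(s, n, new) {
  paste0(substr(s, 1, n), new, substr(s, n + 1, nchar(s)))
}

insert_regex("abcdef", 3, "XY")
insert_substr("abcdef", 3, "XY")
```

The substring version sidesteps quantifier limits entirely and treats the inserted text literally, so it may be the simpler choice when the position is already known.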

Regarding "regex - Trouble with gsub and regex in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37323968/
