Python regex lookbehind: removing _sublabel1 from a string such as "__label__label1_sublabel1"

Reposted by: 行者123 · Updated: 2023-12-05 03:16:19


I have a dataset prepared for training in fastText, and I want to remove the sublabels from it. For example:

__label__label1_sublabel1 __label__label2_sublabel1 __label__label3 __label__label1_sublabel4 sometext some sentce som data.

Any help is much appreciated, thanks.

I tried this:

r'(?<=__label__[^_]+)\w+'

It doesn't work. The exact code:

ptrn = r'(?<=__label__[^_]+)\w+'

re.sub(ptrn, '', test_String)

It raises this error:

error                                     Traceback (most recent call last)
c:\Users\THoseini\Desktop\projects\ensani_classification\tes4t.ipynb Cell 3 in <cell line: 3>()
      1 ptrn = r'(?<=__label__[^_]+)\w+'
----> 3 re.sub(ptrn, '', test_String)

File c:\Users\THoseini\AppData\Local\Programs\Python\Python310\lib\re.py:209, in sub(pattern, repl, string, count, flags)
    202 def sub(pattern, repl, string, count=0, flags=0):
    203     """Return the string obtained by replacing the leftmost
    204     non-overlapping occurrences of the pattern in string by the
    205     replacement repl. repl can be either a string or a callable;
    206     if a string, backslash escapes in it are processed. If it is
    207     a callable, it's passed the Match object and must return
    208     a replacement string to be used."""
--> 209     return _compile(pattern, flags).sub(repl, string, count)

File c:\Users\THoseini\AppData\Local\Programs\Python\Python310\lib\re.py:303, in _compile(pattern, flags)
    301     if not sre_compile.isstring(pattern):
    302         raise TypeError("first argument must be string or compiled pattern")
--> 303     p = sre_compile.compile(pattern, flags)
    304     if not (flags & DEBUG):
    305         if len(_cache) >= _MAXCACHE:
    306             # Drop the oldest item

File c:\Users\THoseini\AppData\Local\Programs\Python\Python310\lib\sre_compile.py:792, in compile(p, flags)
--> 198         raise error("look-behind requires fixed-width pattern")
    199     emit(lo)  # look behind
    200     _compile(code, av[1], flags)

error: look-behind requires fixed-width pattern

Best answer

Try this regular expression:

(__label__[^_\s]+)\w*

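Note the star (*) rather than a plus (+) after \w. With a plus, a label that has no sublabel, such as __label__label3, would still be forced to give \w+ at least one character, so the engine would backtrack and cut off the end of the label itself.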
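The difference between the two quantifiers can be checked directly. The following is a minimal sketch; the `sample` string and the variable names are made up for illustration and are not part of the original answer.

```python
import re

# Illustrative input: one label without a sublabel, one with a sublabel.
sample = "__label__label3 __label__label1_sublabel4 text"

# \w+ needs at least one character, so for "__label__label3" the engine
# backtracks and hands the trailing "3" to \w+, which is then dropped.
with_plus = re.sub(r'(__label__[^_\s]+)\w+', r'\1', sample)

# \w* may match nothing, so a label without a sublabel is left untouched.
with_star = re.sub(r'(__label__[^_\s]+)\w*', r'\1', sample)

print(with_plus)  # __label__label __label__label1 text
print(with_star)  # __label__label3 __label__label1 text
```

Both patterns assume that the main label itself contains no underscore; a label such as `__label__my_label` would be cut at the first underscore.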

And sample code in Python:

import re

test_string = """__label__label1_sublabel1 __label__label2_sublabel1 __label__label3 __label__label1_sublabel4 sometext some sentce som data."""

# Group 1 keeps "__label__" plus the main label; \w* consumes the optional "_sublabel" suffix.
ptrn = r'(__label__[^_\s]+)\w*'
print(re.sub(ptrn, r'\1', test_string))

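re.sub() performs a substitution: it returns a copy of the string with every match of the pattern replaced by the replacement value. Here the replacement \1 is the text captured by group 1, so only the sublabel suffix is dropped. [^character_group] is a negation: it matches any single character that is not in character_group. \w matches any word character (letters, digits and the underscore). \s matches any whitespace character.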

The output is as expected:

__label__label1 __label__label2 __label__label3 __label__label1 sometext some sentce som data.
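For context on the error in the question: Python's built-in re module only accepts look-behind assertions whose width is fixed, and [^_]+ can match a varying number of characters. The sketch below first reproduces that failure and then wraps the capture-group pattern in a small helper for cleaning several lines. The name `strip_sublabels` and the sample `lines` are hypothetical, chosen only for this illustration, and the helper assumes that main labels contain no underscore.

```python
import re

# The pattern from the question: the look-behind contains [^_]+, whose
# length is not fixed, so the built-in re module refuses to compile it.
try:
    re.compile(r'(?<=__label__[^_]+)\w+')
    lookbehind_error = None
except re.error as exc:
    lookbehind_error = str(exc)

print(lookbehind_error)

# Capture-group alternative, compiled once and reused for every line.
LABEL_RE = re.compile(r'(__label__[^_\s]+)\w*')


def strip_sublabels(line: str) -> str:
    """Hypothetical helper: keep '__label__<label>' and drop any '_<sublabel>' suffix."""
    return LABEL_RE.sub(r'\1', line)


# Made-up sample lines in the fastText label format.
lines = [
    "__label__label1_sublabel1 __label__label3 first example",
    "__label__label2_sublabel1 second example",
]
cleaned = [strip_sublabels(line) for line in lines]
print(cleaned)
```

Compiling the pattern once is a small convenience when the same substitution runs over every line of a training file; calling re.sub with the pattern string each time behaves the same way.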

Regarding "Python regex lookbehind: removing _sublabel1 from a string such as "__label__label1_sublabel1"", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/74714369/


