Python regex lookbehind: remove _sublabel1 from a string such as "__label__label1_sublabel1"

Reposted. Author: 行者123. Updated: 2023-12-05 03:16:19
I have a dataset prepared for training in fastText, and I want to remove the sublabels from it. For example:

__label__label1_sublabel1 __label__label2_sublabel1 __label__label3 __label__label1_sublabel4 sometext some sentce som data.

Any help is much appreciated, thanks.

I tried this:

r'(?<=__label__[^_]+)\w+'

It does not work. The exact code:

ptrn = r'(?<=__label__[^_]+)\w+'

re.sub(ptrn, '', test_String)

It raised this error:

error                                     Traceback (most recent call last)
c:\Users\THoseini\Desktop\projects\ensani_classification\tes4t.ipynb Cell 3 in <cell line: 3>()
      1 ptrn = r'(?<=__label__[^_]+)\w+'
----> 3 re.sub(ptrn, '', test_String)

File c:\Users\THoseini\AppData\Local\Programs\Python\Python310\lib\re.py:209, in sub(pattern, repl, string, count, flags)
    202 def sub(pattern, repl, string, count=0, flags=0):
    203     """Return the string obtained by replacing the leftmost
    204     non-overlapping occurrences of the pattern in string by the
    205     replacement repl.  repl can be either a string or a callable;
    206     if a string, backslash escapes in it are processed.  If it is
    207     a callable, it's passed the Match object and must return
    208     a replacement string to be used."""
--> 209     return _compile(pattern, flags).sub(repl, string, count)

File c:\Users\THoseini\AppData\Local\Programs\Python\Python310\lib\re.py:303, in _compile(pattern, flags)
    301 if not sre_compile.isstring(pattern):
    302     raise TypeError("first argument must be string or compiled pattern")
--> 303 p = sre_compile.compile(pattern, flags)
    304 if not (flags & DEBUG):
    305     if len(_cache) >= _MAXCACHE:
    306         # Drop the oldest item

File c:\Users\THoseini\AppData\Local\Programs\Python\Python310\lib\sre_compile.py:792, in compile(p, flags)
--> 198     raise error("look-behind requires fixed-width pattern")
    199 emit(lo)  # look behind
    200 _compile(code, av[1], flags)

error: look-behind requires fixed-width pattern

Best answer

Try this regex:

(__label__[^_\s]+)\w*

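Use a star rather than a plus after \w. With \w+, a label that has no sublabel (such as __label__label3) would still have to give up at least one character, so its last character would be removed and it would come out as __label__label.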
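The difference between the two quantifiers can be checked directly. The following is a minimal sketch on a shortened sample string (the names line, with_star, and with_plus are chosen here for illustration and are not from the original post):

```python
import re

line = "__label__label1_sublabel1 __label__label3 sometext"

# Star: the sublabel part is optional, so label3 is left intact.
with_star = re.sub(r'(__label__[^_\s]+)\w*', r'\1', line)

# Plus: \w+ must consume at least one character, so the engine
# backtracks and takes the final "3" away from the captured group.
with_plus = re.sub(r'(__label__[^_\s]+)\w+', r'\1', line)

print(with_star)  # __label__label1 __label__label3 sometext
print(with_plus)  # __label__label1 __label__label sometext
```

Both patterns handle labels that do have a sublabel in the same way; they differ only on labels without one.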

And sample code in Python:

import re
test_string = """__label__label1_sublabel1 __label__label2_sublabel1 __label__label3 __label__label1_sublabel4 sometext some sentce som data."""

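# Capture the main label in group 1; \w* consumes the optional sublabel.
ptrn = r'(__label__[^_\s]+)\w*'
# Replace each match with group 1 only, which drops the sublabel.
print(re.sub(ptrn, r'\1', test_string))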

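The re.sub() function performs substitution: it returns a copy of the string in which every match of the pattern is replaced. Here the replacement \1 writes back only the captured group, so whatever \w* matched is dropped. [^character_group] is a negated character class: it matches any single character that is not in character_group. \w matches any word character (letters, digits, and the underscore). \s matches any whitespace character.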
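For context on the error in the question: the standard-library re module only accepts look-behind assertions whose length is fixed, and [^_]+ can match any number of characters. The sketch below, which assumes only the standard library, shows the pattern being rejected at compile time, while the capture-group version needs no look-behind at all. The variable names are illustrative.

```python
import re

text = "__label__label1_sublabel1 __label__label3 sometext"

# The pattern from the question: [^_]+ has no fixed length, so the
# standard-library re module refuses to compile the look-behind.
try:
    re.compile(r'(?<=__label__[^_]+)\w+')
    compiled = True
except re.error as exc:
    compiled = False
    message = str(exc)
    print(message)

# Capturing the prefix and writing it back with \1 avoids the look-behind.
cleaned = re.sub(r'(__label__[^_\s]+)\w*', r'\1', text)
print(cleaned)
```

The error is raised when the pattern is compiled, before any input is scanned, which is why the traceback in the question ends inside sre_compile.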

The output is as expected:

__label__label1 __label__label2 __label__label3 __label__label1 sometext some sentce som data.
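To apply the same substitution to a whole fastText training set, one option is to process it line by line. This is only a sketch under two assumptions that the original post does not state: each training example sits on its own line, and main label names contain no underscores (otherwise they would be cut at the first one). The helper name strip_sublabels and the sample lines are hypothetical.

```python
import re

# Hypothetical helper; the pattern is the one from the answer.
SUBLABEL = re.compile(r'(__label__[^_\s]+)\w*')

def strip_sublabels(lines):
    """Yield each line with the sublabel suffix removed from every label."""
    for line in lines:
        yield SUBLABEL.sub(r'\1', line)

# Illustrative sample lines, not taken from the asker's dataset.
sample = [
    "__label__label1_sublabel1 __label__label3 first example",
    "__label__label2_sublabel4 second example",
]
cleaned = list(strip_sublabels(sample))
print(cleaned)
```

Because the function accepts any iterable of strings, an open file object can be passed in place of the list.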

Regarding "Python regex lookbehind: remove _sublabel1 from a string such as "__label__label1_sublabel1"", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/74714369/
