gpt4 book ai didi

Python 正则表达式 - r 前缀

转载 作者:IT老高 更新时间:2023-10-28 21:43:40 25 4
gpt4 key购买 nike

如果不使用 r 前缀,谁能解释为什么下面的示例 1 有效?我认为在使用转义序列时必须使用 r 前缀。示例 2 和示例 3 证明了这一点。

# example 1
import re
print (re.sub('\s+', ' ', 'hello there there'))
# prints 'hello there there' - not expected as r prefix is not used

# example 2
import re
print (re.sub(r'(\b\w+)(\s+\1\b)+', r'\1', 'hello there there'))
# prints 'hello there' - as expected as r prefix is used

# example 3
import re
print (re.sub('(\b\w+)(\s+\1\b)+', '\1', 'hello there there'))
# prints 'hello there there' - as expected as r prefix is not used

最佳答案

因为 \ 只有在它们是有效的转义序列时才开始转义序列。

>>> '\n'
'\n'
>>> r'\n'
'\\n'
>>> print '\n'


>>> print r'\n'
\n
>>> '\s'
'\\s'
>>> r'\s'
'\\s'
>>> print '\s'
\s
>>> print r'\s'
\s

Unless an 'r' or 'R' prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C. The recognized escape sequences are:

Escape Sequence   Meaning Notes
\newline Ignored
\\ Backslash (\)
\' Single quote (')
\" Double quote (")
\a ASCII Bell (BEL)
\b ASCII Backspace (BS)
\f ASCII Formfeed (FF)
\n ASCII Linefeed (LF)
\N{name} Character named name in the Unicode database (Unicode only)
\r ASCII Carriage Return (CR)
\t ASCII Horizontal Tab (TAB)
\uxxxx Character with 16-bit hex value xxxx (Unicode only)
\Uxxxxxxxx Character with 32-bit hex value xxxxxxxx (Unicode only)
\v ASCII Vertical Tab (VT)
\ooo Character with octal value ooo
\xhh Character with hex value hh

永远不要依赖原始字符串作为路径文字,因为原始字符串有一些相当特殊的内部工作原理,众所周知,它们会咬人的屁股:

When an "r" or "R" prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a backslash and a lowercase "n". String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.

为了更好地说明最后一点:

>>> r'\'
SyntaxError: EOL while scanning string literal
>>> r'\''
"\\'"
>>> '\'
SyntaxError: EOL while scanning string literal
>>> '\''
"'"
>>>
>>> r'\\'
'\\\\'
>>> '\\'
'\\'
>>> print r'\\'
\\
>>> print r'\'
SyntaxError: EOL while scanning string literal
>>> print '\\'
\

关于Python 正则表达式 - r 前缀,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/2241600/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com