
python - Using textwrap.dedent() with bytes in Python 3

Reposted. Author: 行者123. Updated: 2023-11-28 21:45:09

When I use a triple-quoted multiline string in Python, I tend to use textwrap.dedent to keep the code readable and nicely indented:

some_string = textwrap.dedent("""
    First line
    Second line
    ...
    """).strip()

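However, in Python 3.x, textwrap.dedent does not seem to work with byte strings. I ran into this while writing a unit test for a method that returns a long multiline byte string, for example: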

# The function to be tested

def some_function():
    return b'Lorem ipsum dolor sit amet\n consectetuer adipiscing elit'

# Unit test

import unittest
import textwrap

class SomeTest(unittest.TestCase):
    def test_some_function(self):
        self.assertEqual(some_function(), textwrap.dedent(b"""
            Lorem ipsum dolor sit amet
             consectetuer adipiscing elit
            """).strip())

if __name__ == '__main__':
    unittest.main()

In Python 2.7.10 the code above works fine, but in Python 3.4.3 it fails:

E
======================================================================
ERROR: test_some_function (__main__.SomeTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test.py", line 16, in test_some_function
    """).strip())
  File "/usr/lib64/python3.4/textwrap.py", line 416, in dedent
    text = _whitespace_only_re.sub('', text)
TypeError: can't use a string pattern on a bytes-like object

----------------------------------------------------------------------
Ran 1 test in 0.001s

FAILED (errors=1)

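So: is there an alternative to textwrap.dedent that works with byte strings?

  • I could write such a function myself, but if one already exists I would rather use it.
  • I could convert to unicode, use textwrap.dedent, and convert back to bytes. But that is only viable if the byte string conforms to some Unicode encoding.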
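
The decode/encode round trip from the second bullet can be sketched with the latin-1 codec, which maps each of the 256 byte values to exactly one code point, so decoding and re-encoding succeed for any input. The helper name dedent_via_latin1 is hypothetical. The sketch assumes the indentation consists only of ASCII spaces and tabs, and that no line is made up solely of other whitespace-like bytes, since how such lines are treated may depend on the Python version.

```python
import textwrap


def dedent_via_latin1(data):
    """Dedent a bytes object by round-tripping through latin-1.

    Hypothetical helper: latin-1 is used only as a lossless byte <-> str
    mapping, not because the payload is assumed to be latin-1 text.
    """
    text = data.decode('latin-1')      # never fails: every byte is valid
    dedented = textwrap.dedent(text)   # regular str-based dedent
    return dedented.encode('latin-1')  # back to the original byte values


expected = dedent_via_latin1(b"""
    Lorem ipsum dolor sit amet
     consectetuer adipiscing elit
    """).strip()

# Bytes that are not valid UTF-8 survive the round trip unchanged.
binary = dedent_via_latin1(b"  \xff\xfe\n  x")
```

Here expected equals the value returned by some_function() in the question, so the same expression could be used inside assertEqual.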

Best answer

Answer 2: textwrap is primarily about the TextWrapper class and its related functions. dedent is listed under

# -- Loosely related functionality --------------------

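As far as I can tell, the only things that make it specific to text (unicode str) are the re literals. I prefixed all 6 of them with b, and voila! (I did not edit anything else, but the function docstring should be adjusted.)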

import re

_whitespace_only_re = re.compile(b'^[ \t]+$', re.MULTILINE)
_leading_whitespace_re = re.compile(b'(^[ \t]*)(?:[^ \t\n])', re.MULTILINE)

def dedent_bytes(text):
    """Remove any common leading whitespace from every line in `text`.

    This can be used to make triple-quoted strings line up with the left
    edge of the display, while still presenting them in the source code
    in indented form.

    Note that tabs and spaces are both treated as whitespace, but they
    are not equal: the lines " hello" and "\\thello" are
    considered to have no common leading whitespace. (This behaviour is
    new in Python 2.5; older versions of this module incorrectly
    expanded tabs before searching for common leading whitespace.)
    """
    # Look for the longest leading string of spaces and tabs common to
    # all lines.
    margin = None
    text = _whitespace_only_re.sub(b'', text)
    indents = _leading_whitespace_re.findall(text)
    for indent in indents:
        if margin is None:
            margin = indent

        # Current line more deeply indented than previous winner:
        # no change (previous winner is still on top).
        elif indent.startswith(margin):
            pass

        # Current line consistent with and no deeper than previous winner:
        # it's the new winner.
        elif margin.startswith(indent):
            margin = indent

        # Find the largest common whitespace between current line
        # and previous winner.
        else:
            for i, (x, y) in enumerate(zip(margin, indent)):
                if x != y:
                    margin = margin[:i]
                    break
            else:
                margin = margin[:len(indent)]

    # sanity check (testing/debugging only)
    if 0 and margin:
        for line in text.split(b"\n"):
            assert not line or line.startswith(margin), \
                "line = %r, margin = %r" % (line, margin)

    if margin:
        text = re.sub(rb'(?m)^' + margin, b'', text)
    return text

print(dedent_bytes(b"""
    Lorem ipsum dolor sit amet
     consectetuer adipiscing elit
    """)
)

# prints
b'\nLorem ipsum dolor sit amet\n consectetuer adipiscing elit\n'
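
The b prefixes matter because the re module requires the pattern and the searched object to be of the same type. The following minimal check illustrates that rule with the same whitespace-only pattern used above. The variable names are illustrative only and do not come from textwrap.

```python
import re

# The same pattern compiled once as str and once as bytes.
str_pattern = re.compile('^[ \t]+$', re.MULTILINE)
bytes_pattern = re.compile(b'^[ \t]+$', re.MULTILINE)

data = b"first\n   \nsecond"

# A str pattern applied to bytes raises TypeError, which is the error
# shown in the traceback from the question.
try:
    str_pattern.sub('', data)
    mixed_ok = True
except TypeError:
    mixed_ok = False

# A bytes pattern with a bytes replacement works as expected.
cleaned = bytes_pattern.sub(b'', data)

print(mixed_ok)  # False
print(cleaned)   # b'first\n\nsecond'
```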

Regarding python - Using textwrap.dedent() with bytes in Python 3, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39822598/
