Python YAML: parse inf as float

Reposted. Author: bug小助手. Updated: 2023-10-25 21:55:04



In PyYAML or ruamel.yaml, I'm wondering if there is a way to customize how specific strings are parsed. Specifically, I'd like to be able to parse "[inf, nan]" as [float('inf'), float('nan')]. I'll also note that I would like "['inf', 'nan']" to continue to parse as ['inf', 'nan'], so it's only the unquoted variant whose current behavior I'd like to intercept and change.

I'm aware that currently I could use "[.inf, .nan]" or "[!!float inf, !!float nan]", but I'm curious if I could extend the Loader to allow for the syntax that I expected would have worked (but doesn't).

Perhaps I'm just making a footgun by allowing "nan" and "inf" to be parsed as floats rather than strings, and I'm interested in hearing compelling reasons why I should not allow this type of parsing. But I'm not too worried about the case where other parsers would parse my configs incorrectly (though maybe I'm underestimating the pain that will cause in the future). I plan to use this as a one-way convenience for parsing arguments on the command line, and I don't expect actual config files to be written like this.

In any case I'd still be interested in how it could be done, even if the conclusion is that it shouldn't be done.

Recommended answer:

Based on the confusion that I have seen caused by Yes, On, No and Off being
interpreted as boolean values in YAML 1.1, I don't think this is a good idea.
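
The kind of surprise meant here can be illustrated without any YAML library. The sketch below uses a regex modelled on the pattern PyYAML uses to recognise YAML 1.1 booleans; the names YAML11_BOOL_RE and resolves_to_bool are made up for this illustration and are not part of any library.

```python
import re

# Assumption: modelled on the YAML 1.1 boolean pattern used by PyYAML.
YAML11_BOOL_RE = re.compile(
    r'^(?:yes|Yes|YES|no|No|NO'
    r'|true|True|TRUE|false|False|FALSE'
    r'|on|On|ON|off|Off|OFF)$'
)


def resolves_to_bool(plain_scalar):
    """Return True if an unquoted scalar would be tagged as a boolean."""
    return YAML11_BOOL_RE.match(plain_scalar) is not None


# A list of country codes such as [DE, NO, SE] contains one surprise.
for code in ['DE', 'NO', 'SE']:
    print(code, resolves_to_bool(code))
```

Treating unquoted inf and nan as floats creates the same kind of trap for anyone who meant them as ordinary words.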

But it is possible to do this in both ruamel.yaml and PyYAML, by changing the regex
that recognises floats (i.e. that assigns the implicit tag tag:yaml.org,2002:float to the scalar)
and then making sure that the routine constructing a float from a scalar handles these additional
scalars. The three main improvements (with regard to this) in ruamel.yaml are that
it has different regexes for YAML 1.1 and YAML 1.2 parsing (the latter being the default,
the former having to be specified either by a directive or by setting .version on the YAML() instance);
that the various Resolvers each have a copy of these regexes instead of sharing
one (as in PyYAML, which makes it difficult to have multiple, differently behaving parsers in one program);
and that compilation of the regexes is delayed until they are actually needed.
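
To get a feel for what "changing the regex that recognises floats" means, the extended pattern can be tried on its own with only the standard library. This is a sketch: FLOAT_RE mirrors the pattern used in the ruamel.yaml code in this answer (written as a raw string), and looks_like_float is a hypothetical helper, not a ruamel.yaml function.

```python
import re

# Same alternatives as the pattern registered on the resolver in this answer:
# the "\.?" makes the leading dot of .inf / .nan optional.
FLOAT_RE = re.compile(r'''^(?:
     [-+]?(?:[0-9][0-9_]*)\.[0-9_]*(?:[eE][-+]?[0-9]+)?
    |[-+]?(?:[0-9][0-9_]*)(?:[eE][-+]?[0-9]+)
    |[-+]?\.[0-9_]+(?:[eE][-+][0-9]+)?
    |[-+]?\.?(?:inf|Inf|INF)
    |\.?(?:nan|NaN|NAN))$''', re.X)


def looks_like_float(plain_scalar):
    """Return True if the unquoted scalar matches the extended float pattern."""
    return FLOAT_RE.match(plain_scalar) is not None


for scalar in ['inf', '-inf', '.inf', 'nan', '.NaN', '1.0', 'nano', 'Nan']:
    print(repr(scalar), looks_like_float(scalar))
```

Implicit resolution like this applies only to plain (unquoted) scalars, which is why 'inf' in quotes still loads as a string.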

Given these differences, the following applies only to ruamel.yaml.

You need to create a resolver, and replace its regex recognition for all floats,
and then create a constructor that constructs the floats based on the
recognised scalars:

    import re
    import sys

    import ruamel.yaml


    class NanInfResolver(ruamel.yaml.resolver.VersionedResolver):
        pass


    # difference with the regex in resolver.py is the ? after \\.
    # as well as recognising N and I as starting chars
    # no delayed compile of the regex here
    NanInfResolver.add_implicit_resolver(
        'tag:yaml.org,2002:float',
        re.compile('''^(?:
         [-+]?(?:[0-9][0-9_]*)\\.[0-9_]*(?:[eE][-+]?[0-9]+)?
        |[-+]?(?:[0-9][0-9_]*)(?:[eE][-+]?[0-9]+)
        |[-+]?\\.[0-9_]+(?:[eE][-+][0-9]+)?
        |[-+]?\\.?(?:inf|Inf|INF)
        |\\.?(?:nan|NaN|NAN))$''', re.X),
        list('-+0123456789.niNI')
    )


    class NanInfConstructor(ruamel.yaml.constructor.RoundTripConstructor):
        def construct_yaml_float(self, node):
            value = self.construct_scalar(node).lower()
            # remember the sign, then strip it before comparing
            sign = +1
            if value[0] == '-':
                sign = -1
            if value[0] in '+-':
                value = value[1:]
            if value == 'inf':
                return sign * self.inf_value
            if value == 'nan':
                return self.nan_value
            # everything else (1.0, .inf, .nan, ...) is handled by the base class
            return super().construct_yaml_float(node)


    NanInfConstructor.add_constructor(
        'tag:yaml.org,2002:float', NanInfConstructor.construct_yaml_float
    )


    yaml_str = """\
    [nano, 1.0, .NaN, inf, nan]  # some extra values to test
    """

    yaml = ruamel.yaml.YAML()
    yaml.Resolver = NanInfResolver
    yaml.Constructor = NanInfConstructor

    data = yaml.load(yaml_str)
    for x in data:
        print(type(x), x)
    print()
    yaml.dump(data, sys.stdout)

which gives:

    <class 'str'> nano
    <class 'ruamel.yaml.scalarfloat.ScalarFloat'> 1.0
    <class 'float'> nan
    <class 'float'> inf
    <class 'float'> nan

    [nano, 1.0, .nan, .inf, .nan] # some extra values to test
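
The sign and keyword handling done in construct_yaml_float can also be checked in isolation. The following is a standalone sketch of that logic, not the ruamel.yaml implementation: scalar_to_float is a hypothetical helper, it uses math.inf and math.nan where the constructor uses its own inf_value and nan_value, and it falls back to Python's float() where the constructor delegates to its base class.

```python
import math


def scalar_to_float(scalar):
    """Convert a scalar already recognised as a float into a Python float."""
    value = scalar.lower()
    sign = 1.0
    if value[0] == '-':
        sign = -1.0
    if value[0] in '+-':
        value = value[1:]          # strip the sign before comparing
    if value in ('inf', '.inf'):
        return sign * math.inf
    if value in ('nan', '.nan'):
        return math.nan
    # Assumption: plain float() is good enough here once the
    # underscores that YAML allows in numbers are removed.
    return sign * float(value.replace('_', ''))


for scalar in ['inf', '-inf', '+Inf', 'NaN', '.nan', '1.0', '-1_000.5']:
    print(repr(scalar), scalar_to_float(scalar))
```

Stripping the sign first means a single comparison covers inf, +inf and -inf alike.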

In the output above, 1.0 is loaded as a ScalarFloat, which is necessary to preserve its formatting when
dumping. It is possible to preserve the different ways of writing .nan, .inf, nan and inf in a similar way, but you would
have to make a special representer and either extend ScalarFloat or make one
or more explicit types that keep the original scalar string value. Either way the
loaded values would no longer be plain float objects, so tests that expect a plain float
(for example type(x) is float) would stop working, which may be a problem
in real programs (which is also the
reason why ruamel.yaml doesn't preserve the different forms of null during round-trip).
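
As a rough illustration of an "explicit type that keeps the original scalar string", here is a minimal sketch using a hypothetical OriginalFloat class. It is not part of ruamel.yaml and leaves out the representer that would be needed to write the original form back out when dumping.

```python
import math


class OriginalFloat(float):
    """A float that remembers the scalar text it was loaded from."""

    def __new__(cls, value, original):
        self = super().__new__(cls, value)
        self.original = original   # e.g. 'nan', '.nan', 'inf', '.inf'
        return self


x = OriginalFloat(math.nan, 'nan')
y = OriginalFloat(math.inf, '.inf')

print(x.original, y.original)      # the spelling survives loading
print(math.isnan(x))               # value-based tests still work
print(isinstance(x, float))        # subclass check still holds
print(type(x) is float)            # exact-type check no longer holds
```

Value-based checks such as math.isnan are unaffected by the subclass; it is the checks on exact type or identity that stop holding.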
