regex - Filter strings with a regular expression before casting to a number

Reposted. Author: 行者123. Updated: 2023-11-29 11:58:58
I have this code (it already existed, it is not mine):

SELECT
a.id_original_contrato AS contrato,
( CASE WHEN d.value~'^\\d+$' THEN d.value::integer ELSE 0 END ) AS monto,
EXTRACT(YEAR FROM b.value)::integer AS anoinicio,
EXTRACT(YEAR FROM c.value)::integer AS anofin

... etc. (some JOINs and a WHERE)

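Let me explain: d.value comes from a table where the column is character varying(200). The code later inserts d.value (now called "monto") into another table as integer. Someone wrote that regular expression to extract some characters or, otherwise (ELSE), set the value to 0. These values only work when they are integers. If I use a d.value like 76.44, it does not work because of the regular expression: it always ends up as 0.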

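Well, I have to change that code, because:

  • I need to store d.value in the new table as numeric, not integer (in my new table the data type is now numeric).
  • But first, I need to correct that regular expression, because it is messing up my numbers like 76.44 or 66,56 (dot or comma).

I am not sure what the regular expression is doing. How can I meet these needs with a better or new regular expression?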

Best answer

The doubled backslash in \\d suggests an old version with standard_conforming_strings = off. The manual:

Beginning in PostgreSQL 9.1, the default is on (prior releases defaulted to off).

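In modern versions with standard_conforming_strings = on, this string makes no sense as a regular expression: '^\\d+$'. There it would match a literal backslash followed by one or more d characters. To detect strings consisting of one or more digits, use either E'^\\d+$' (prefixed with E) or '^\d+$'.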
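The difference can be checked directly. The following is a minimal sketch, assuming a session on PostgreSQL 9.1 or later; the column aliases are made up for illustration, and the expected values in the comments hold only while standard_conforming_strings is on.

```sql
-- Assumption: PostgreSQL 9.1+; force the modern default for this session.
SET standard_conforming_strings = on;

SELECT '123'  ~ '^\\d+$'  AS doubled_plain    -- false: pattern means backslash, then one or more "d"
     , '\ddd' ~ '^\\d+$'  AS doubled_literal  -- true: the subject really is a backslash followed by d's
     , '123'  ~ '^\d+$'   AS single_plain     -- true: \d is the digit class shorthand
     , '123'  ~ E'^\\d+$' AS doubled_escape   -- true: E'' turns \\ into \ before the regex is compiled
;
```

With standard_conforming_strings = off, the first column would be expected to come out true instead, which is presumably why the original code appeared to work.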

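Integer literals in Postgres also allow an optional leading sign for negative / positive numbers, as well as leading / trailing white space.
So this is the complete regular expression for valid integer literals: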

CASE WHEN d.value ~ '^\s*[-+]?\d+\s*$' THEN d.value::int ELSE 0 END

The regular expression explained:

^ .. start of string
\s .. class shorthand for [[:space:]] (white space)
* .. quantifier for 0 or more times
[-+] .. character class consisting of - and +
? .. quantifier for 0 or 1 times
\d .. class shorthand for [[:digit:]] (digits)
+ .. quantifier for 1 or more times
\s* .. same as above
$ .. end of string
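To see the integer pattern at work, here is a small sketch that runs the CASE expression from above over a handful of made-up sample strings. The VALUES list and the alias t(v) are assumptions for illustration, and standard_conforming_strings = on is assumed.

```sql
-- Hypothetical sample data; v stands in for d.value.
SELECT '|' || v || '|' AS raw
     , CASE WHEN v ~ '^\s*[-+]?\d+\s*$'
            THEN v::int   -- only reached for strings that look like an integer
            ELSE 0        -- everything else falls back to 0
       END AS monto
FROM  (VALUES ('42'), (' -7 '), ('+3'), ('76.44'), ('')) AS t(v);
-- Expected monto per row: 42, -7, 3, 0, 0
```

Note that the pattern only checks the shape of the string. A digit string too large for integer, such as '99999999999', still matches and the cast then fails with an out-of-range error.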

Consider the syntax rules for numeric string literals. One essential quote:

There cannot be any spaces or other characters embedded in the constant

That's because a numeric constant is not quoted, hence white space is not possible. Not applicable for casting strings. White space is tolerated:

Leading, trailing and right after the exponent char.

So this regular expression matches legal strings for the cast to numeric:

'^\s*[-+]?\d*\.?\d+(?:[eE]\s*[-+]?\d+)?\s*$'

The only new elements are parentheses (()), which denote the contained regular expression as an atom. Since we are not interested in back references, use the "non-capturing" form (?:...) and append a question mark, (?:[eE]\s*[-+]?\d+)?, to mean: the "exponential" part can be added or not, as a whole.

Assuming dot (.) as decimal separator. You might use comma instead (,) or [,\.] to allow either. But only dot is legal for the cast.
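Applied to the question, one possible approach is to accept either separator in the pattern and normalize the comma to a dot before casting. This is a sketch and not part of the original answer: the sample values, the alias t(v), and the choice to drop the exponent part are assumptions, and it presumes that a comma is never used as a thousands separator in the data.

```sql
-- Hypothetical sample data; v stands in for d.value.
SELECT v AS raw
     , CASE WHEN v ~ '^\s*[-+]?\d*[,.]?\d+\s*$'
            THEN replace(v, ',', '.')::numeric  -- only dot is legal for the cast
            ELSE 0
       END AS monto
FROM  (VALUES ('76.44'), ('66,56'), (' 12 '), ('abc')) AS t(v);
-- Expected monto per row: 76.44, 66.56, 12, 0
```

Inside a bracket expression the dot is an ordinary character, so [,.] needs no backslash.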

Test:

SELECT '|' || lit || '|' AS text_with_delim
, lit ~ '^\s*[-+]?\d*\.?\d+([eE]\s*[-+]?\d+)?\s*$' AS valid
, lit::numeric AS number
FROM unnest ('{1
, 123
, 000
, " -1 "
, +2
, 1.2
, .34
, 5e6
, " .5e -6 "
}'::text[]) lit;

Result:

text_with_delim   valid   number
|1|               t       1
|123|             t       123
|000|             t       0
| -1 |            t       -1
|+2|              t       2
|1.2|             t       1.2
|.34|             t       0.34
|5e6|             t       5000000
| .5e -6 |        t       0.0000005

Or you might have use for to_number() to convert strings of an arbitrary given format.
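As a rough illustration of to_number(), the sketch below converts two strings using explicit format templates. The first input is a made-up sample; the second call is the example given in the PostgreSQL documentation, which lists -12454.8 as its result. The G and D template patterns follow the locale (lc_numeric), so results may differ on servers not using a locale with comma as group separator and dot as decimal point.

```sql
SELECT to_number('76.44', '99.99')         AS plain_dot     -- '.' in the template is a literal decimal point
     , to_number('12,454.8-', '99G999D9S') AS locale_aware  -- G = group separator, D = decimal point, S = sign
;
```

Unlike the regular expression plus cast, to_number() needs a template for the expected format, so it fits best when the input format is known in advance.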

Regarding "regex - Filter strings with a regular expression before casting to a number", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42214084/