gpt4 book ai didi

python - 在 Python 中根据函数名拆分 SQL 语句但保留分隔符

转载 作者:行者123 更新时间:2023-12-01 01:08:58 25 4
gpt4 key购买 nike

假设我有以下字符串,其中包含从SELECT子句中提取的SQL语句(实际上,这是一个包含数百条此类语句的巨大SQL语句);

  SUM(case when(A.money-B.money>1000
and A.unixtime-B.unixtime<=890769
and B.col10 = "A"
and B.col11 = "12"
and B.col12 = "V") then 10
end) as finalCond0,
MAX(case when(A.money-B.money<0
and A.unixtime-B.unixtime<=6786000
and B.cond1 = "A"
and B.cond2 = "4321"
and B.cond3 in ("E", "F", "G")) then A.col10
end) as finalCond1,
SUM(case when(A.money-B.money>0
and A.unixtime-B.unixtime<=6786000
and B.cond1 = "A"
and B.cond2 = "1234"
and B.cond3 in ("A", "B", "C")) then 2
end) as finalCond2

如何在函数上拆分此查询(即 SUMMAXMINMEAN 等。 )这样我就可以提取最后一个查询,但不删除分隔符(在本例中为 SUM)?

因此所需的输出将是如下所示的字符串:

  SUM(case when(A.money-B.money>0
and A.unixtime-B.unixtime<=6786000
and B.cond1 = "A"
and B.cond2 = "1234"
and B.cond3 in ("A", "B", "C")) then 2
end) as finalCond2

PS:出于演示目的,我提供了某种缩进,但实际上这些语句是用逗号分隔的,这意味着原始形式中不会出现空格或换行符。

最佳答案

这里不能使用正则表达式,因为 SQL 语法不形成可以与 Python re 引擎匹配的正则模式。您必须将字符串实际解析为 token 流或语法树;毕竟,您的 SUM(...) 可以包含多种语法,包括子选择。

sqlparse library可以做到这一点,即使它是 bit underdocumented and not that friendly to external uses .

重新使用我在链接到的另一篇文章中定义的 walk_tokens 函数:

from collections import deque
from sqlparse.sql import TokenList

def walk_tokens(token):
queue = deque([token])
while queue:
token = queue.popleft()
if isinstance(token, TokenList):
queue.extend(token)
yield token

SELECT 标识符列表中提取最后一个元素是:

import sqlparse
from sqlparse.sql import IdentifierList

tokens = sqlparse.parse(sql)[0]
for tok in walk_tokens(tokens):
if isinstance(tok, IdentifierList):
# iterate to leave the last assigned to `identifier`
for identifier in tok.get_identifiers():
pass
break

print(identifier)

演示:

>>> sql = '''\
... SUM(case when(A.money-B.money>1000
... and A.unixtime-B.unixtime<=890769
... and B.col10 = "A"
... and B.col11 = "12"
... and B.col12 = "V") then 10
... end) as finalCond0,
... MAX(case when(A.money-B.money<0
... and A.unixtime-B.unixtime<=6786000
... and B.cond1 = "A"
... and B.cond2 = "4321"
... and B.cond3 in ("E", "F", "G")) then A.col10
... end) as finalCond1,
... SUM(case when(A.money-B.money>0
... and A.unixtime-B.unixtime<=6786000
... and B.cond1 = "A"
... and B.cond2 = "1234"
... and B.cond3 in ("A", "B", "C")) then 2
... end) as finalCond2
... '''
>>> tokens = sqlparse.parse(sql)[0]
>>> for tok in walk_tokens(tokens):
... if isinstance(tok, IdentifierList):
... # iterate to leave the last assigned to `identifier`
... for identifier in tok.get_identifiers():
... pass
... break
...
>>> print(identifier)
SUM(case when(A.money-B.money>0
and A.unixtime-B.unixtime<=6786000
and B.cond1 = "A"
and B.cond2 = "1234"
and B.cond3 in ("A", "B", "C")) then 2
end) as finalCond2

identifier 是一个 sqlparse.sql.Identifier 实例,但再次将其转换为字符串(print() 会这样做,或者您只需使用 str()) 即可再次为您提供该部分的输入 SQL 字符串。

关于python - 在 Python 中根据函数名拆分 SQL 语句但保留分隔符,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/55042234/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com