gpt4 book ai didi

xml - 在 Snowflake 中使用递归层次结构解析 XML

转载 作者:行者123 更新时间:2023-12-05 04:50:28 25 4
gpt4 key购买 nike

我正在尝试解析以下具有递归层次结构的 XML。我只能循环一次,第二次调查数据永远不会被填充。另外,我在列中得到 NULL 值

<DATA_EXPORT>
<SURVEYDATA>
<SURVEY_ID>1</SURVEY_ID>
<CLIENT_ID>ABC</CLIENT_ID>
<COMMENTS>
<RESPONSE>
<QUESTION>Do you drink?</QUESTION>
<ANSWER>Yes</ANSWER>
</RESPONSE>
</COMMENTS>
<COMMENTS>
<RESPONSE>
<QUESTION>Do you Smoke?</QUESTION>
<ANSWER>Yes</ANSWER>
</RESPONSE>
</COMMENTS>
</SURVEYDATA>
<SURVEYDATA>
<SURVEY_ID>2</SURVEY_ID>
<CLIENT_ID>DEF</CLIENT_ID>
<COMMENTS>
<RESPONSE>
<QUESTION>Do you drink?</QUESTION>
<ANSWER>No</ANSWER>
</RESPONSE>
</COMMENTS>
</SURVEYDATA>
</DATA_EXPORT>

使用的查询:

SELECT 
GET(XMLGET(XMLGET(TEST_XML_1, 'SURVEYDATA'),'SURVEY_ID'), '$') AS SURVEY_ID,
GET(XMLGET(D.VALUE, 'QUESTION'), '$') AS QUESTION,
GET(XMLGET(D.VALUE, 'ANSWER'), '$') AS ANSWER
FROM DATA,
LATERAL FLATTEN (GET(XMLGET(TEST_XML_1, 'SURVEYDATA', 0), '$'))D;

我得到的输出是:

<表类="s-表"><头>SURVEY_ID问题回答<正文>1空空1空空1空空1空空

我期望的输出是:

<表类="s-表"><头>SURVEY_ID问题回答<正文>1你喝酒吗?是1你抽烟吗?是2你喝酒吗?没有

最佳答案

看来你现在想遍历 DATA_EXPORT 中的对象你需要得到那个对象,GET(xml, '$') 会给你, 因此下面会给你两行 SURVEYDATA

SELECT q.*
FROM TEST_XML,
LATERAL FLATTEN(GET(src_xml, '$')) q;

鉴于您需要 survey_id 和 cleint_id,现在让我们将它们和嵌套的评论拉出来,这样我们就可以看到我们正在获取我们想要的数据:

SELECT 
get(XMLGET(q.value, 'SURVEY_ID'), '$') as survey_id
,get(XMLGET(q.value, 'CLIENT_ID'), '$') as client_id
,XMLGET(q.value, 'COMMENTS') as comments
FROM TEST_XML,
LATERAL FLATTEN(GET(src_xml, '$')) q;

但我们注意到这只有一个评论,因此需要循环的不是评论,而是实际上是 SURVEYDATA 的对象,但只保留评论:

SELECT 
get(XMLGET(q.value, 'SURVEY_ID'), '$') as survey_id
,get(XMLGET(q.value, 'CLIENT_ID'), '$') as client_id
,XMLGET(q.value, 'COMMENTS') as comments
,get(q.value, '$')
,c.*
FROM TEST_XML,
LATERAL FLATTEN(GET(src_xml, '$')) q,
LATERAL FLATTEN(get(q.value, '$')) c
WHERE get(c.value, '@')='COMMENTS'

现在我们可以解压我们想要的评论值:

SELECT 
get(XMLGET(q.value, 'SURVEY_ID'), '$') as survey_id
,get(XMLGET(q.value, 'CLIENT_ID'), '$') as client_id
,c.value
,XMLGET(c.value, 'RESPONSE') as resp
,get(XMLGET(resp, 'QUESTION'), '$') as question
,get(XMLGET(resp, 'ANSWER'), '$' ) as answer
FROM TEST_XML,
LATERAL FLATTEN(GET(src_xml, '$')) q,
LATERAL FLATTEN(get(q.value, '$')) c
WHERE get(c.value, '@')='COMMENTS'

现在我们可以看到我们拥有了我们想要的所有值,我们可以稍微压缩 SQL,这样它就没有我们用来帮助​​我们解决问题的中间值。

给出最终的 SQL,在 CTE 中包含数据以帮助测试:

with TEST_XML as (
select parse_xml('<DATA_EXPORT>
<SURVEYDATA>
<SURVEY_ID>1</SURVEY_ID>
<CLIENT_ID>ABC</CLIENT_ID>
<COMMENTS>
<RESPONSE>
<QUESTION>Do you drink?</QUESTION>
<ANSWER>Yes</ANSWER>
</RESPONSE>
</COMMENTS>
<COMMENTS>
<RESPONSE>
<QUESTION>Do you Smoke?</QUESTION>
<ANSWER>Yes</ANSWER>
</RESPONSE>
</COMMENTS>
</SURVEYDATA>
<SURVEYDATA>
<SURVEY_ID>2</SURVEY_ID>
<CLIENT_ID>DEF</CLIENT_ID>
<COMMENTS>
<RESPONSE>
<QUESTION>Do you drink?</QUESTION>
<ANSWER>No</ANSWER>
</RESPONSE>
</COMMENTS>
</SURVEYDATA>
</DATA_EXPORT>') as SRC_XML
)
SELECT
get(XMLGET(q.value, 'SURVEY_ID'), '$') as survey_id
,get(XMLGET(q.value, 'CLIENT_ID'), '$') as client_id
,get(XMLGET(XMLGET(c.value, 'RESPONSE'), 'QUESTION'), '$') as question
,get(XMLGET(XMLGET(c.value, 'RESPONSE'), 'ANSWER'), '$' ) as answer
FROM TEST_XML,
LATERAL FLATTEN(GET(src_xml, '$')) q,
LATERAL FLATTEN(get(q.value, '$')) c
WHERE get(c.value, '@')='COMMENTS'

给出结果:

SURVEY_ID   CLIENT_ID   QUESTION        ANSWER
1 "ABC" "Do you drink?" "Yes"
1 "ABC" "Do you Smoke?" "Yes"
2 "DEF" "Do you drink?" "No"

关于xml - 在 Snowflake 中使用递归层次结构解析 XML,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/67255099/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com