gpt4 book ai didi

html - 在 Oracle 查询中处理 HTML 数据

转载 作者:行者123 更新时间:2023-11-28 04:49:36 25 4
gpt4 key购买 nike

我有一个包含 ID、TEXT 等列的表这里的 TEXT 是 clob 列,其中包含 HTML 格式的数据

SELECT ID,TEXT FROM TABLE WHERE ID=1000

我得到如下文本列的输出

<p>NAME:&nbsp;&nbsp;XXX<br />Company Name: &nbsp;YYYYY<br />Location:&nbsp;ZZZ, 22 Z1Z1Z1, Z2Z2Z2,Z3Z3Z3, 0000024, IND<br />Type:&nbsp;PrePA<br />
Team:&nbsp;Team1, Dues tamble <br />Date:&nbsp;January 25 &ndash; 26, 2016<br />Rating: &nbsp;Tr 2<br />Number:&nbsp;8554342</p>
<p><u>Observ: <br /></u>There were (6) major and (2) minor .<br />&nbsp;<br />MAJOR</p><ul> <li>Sample Text_1.</li> <li>Sample Text_2.</li>
<li>Sample Text_33.</li> <li>Sample Text_4.</li> <li>Sample Text_5.</li></ul><p>MINOR</p><ul> <li>Sample Text_7</li>
<li>Sample Text_8<br /> &nbsp;</li></ul><p><b> Background</b><br />&nbsp;</p>

我需要如下输出

NAME:  XXX
Company Name: YYYYY
Location: ZZZ, 22 Z1Z1Z1, Z2Z2Z2,Z3Z3Z3, 0000024, IND
Type: PrePA
Team: Team1, Dues tamble
Date: January 25 – 26, 2016
Rating: Tr 2
Number: 8554342

Observ:
There were (6) major and (2) minor .

MAJOR

Sample Text_1.
Sample Text_2.
Sample Text_33.
Sample Text_4.
Sample Text_5.
MINOR

Sample Text_7
Sample Text_8

Background

这意味着查询应该处理 HTML 数据并且应该像上面那样提供精确的数据。

我有一个查询 beolw,它处理一些 okey 但它不处理

exactly the HTML tags.
select ID,
trim(regexp_replace (trim
(regexp_replace( TEXT ,'<[^>]+/>|</[^>]+>|<p>',CHR(13)||CHR(10)))
,'<[^>]+>',''))
from TABLE where ID='1000'

实际上我需要处理数据中的所有 html 标签,而不仅仅是我展示的样本。

最佳答案

搜索“完美”查询可能会失败;浏览器(包括其他地方推荐的文本浏览器)已经花了数年时间解决所有问题和边缘情况。

如果您真的不能使用外部资源,您可以通过 dbms_xmldom 包遍历 DOM。这是一个使用匿名 block 和您的样本值的演示,它会打印到屏幕上(如果启用的话);但您可以轻松地将其调整为返回 CLOB 或任何您需要的最终结果的函数。

set serveroutput on
set define off

declare
l_element xmldom.domelement;
l_document xmldom.domdocument;
l_text clob;

procedure print_node (p_node xmldom.domnode) is
l_nodes dbms_xmldom.domnodelist;
begin
-- print out any plain text
if dbms_xmldom.getnodetype(p_node) = 3
and dbms_xmldom.getnodename(p_node) = '#text' then
dbms_output.put_line(dbms_xmldom.getnodevalue(p_node));
end if;
-- just to match your expected output, add a blank line for p tags
if dbms_xmldom.getnodetype(p_node) = 1
and dbms_xmldom.getnodename(p_node) = 'p' then
dbms_output.new_line;
end if;
-- get any child nodes
l_nodes := dbms_xmldom.getchildnodes(p_node);
-- process each node in turn, recursively
for i in 0..dbms_xmldom.getlength(l_nodes) - 1 loop
print_node(dbms_xmldom.item(l_nodes, i));
end loop;
end print_node;

begin
l_text := '<p>NAME:&nbsp;&nbsp;XXX<br />Company Name: &nbsp;YYYYY<br />Location:&nbsp;ZZZ, 22 Z1Z1Z1, Z2Z2Z2,Z3Z3Z3, 0000024, IND<br />Type:&nbsp;PrePA<br />Team:&nbsp;Team1, Dues tamble <br />Date:&nbsp;January 25 &ndash; 26, 2016<br />Rating: &nbsp;Tr 2<br />Number:&nbsp;8554342</p>
<p><u>Observ: <br /></u>There were (6) major and (2) minor .<br />&nbsp;<br />MAJOR</p><ul> <li>Sample Text_1.</li> <li>Sample Text_2.</li>
<li>Sample Text_33.</li> <li>Sample Text_4.</li> <li>Sample Text_5.</li></ul><p>MINOR</p><ul> <li>Sample Text_7</li>
<li>Sample Text_8<br /> &nbsp;</li></ul><p><b> Background</b><br />&nbsp;</p>';

-- wrap the fragment in a root node so it parses, and run through
-- utl_i18n.unescape_reference to get rid of &nbsp; etc.
l_document := dbms_xmldom.newdomdocument('<html>'
|| utl_i18n.unescape_reference(l_text) || '</html>');
-- get the root element (which is now the added html)
l_element := dbms_xmldom.getdocumentelement(l_document);
-- call the recursive procedure to process this node
print_node(dbms_xmldom.makenode(l_element));
end;
/

产生:

NAME:  XXX
Company Name:  YYYYY
Location: ZZZ, 22 Z1Z1Z1, Z2Z2Z2,Z3Z3Z3, 0000024, IND
Type: PrePA
Team: Team1, Dues tamble
Date: January 25 – 26, 2016
Rating:  Tr 2
Number: 8554342

Observ:
There were (6) major and (2) minor .
 
MAJOR
Sample Text_1.
Sample Text_2.
Sample Text_33.
Sample Text_4.
Sample Text_5.

MINOR
Sample Text_7
Sample Text_8
 

Background
 

关于html - 在 Oracle 查询中处理 HTML 数据,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/35457126/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com