gpt4 book ai didi

SQL Server,将具有多行html代码的文本解析为表格

转载 作者:行者123 更新时间:2023-12-04 14:04:13 26 4
gpt4 key购买 nike

我有一个代码的框架,我开始为以下目的编写代码:

use EAS
go
declare @entityID,int
declare @text,text
declare @col1,varchar --col1 is between <b>col1</b>
declare @col2,varchar --col2 are between >col2</td>

@entityID=entity_ID
@text = maAttributes_Text.value
@col1 = select SUBSTRING(@text,CHARINDEX('<b>',@text)+len('<b>'),
charindex('</b>',@text))
@col2 = select substring(@text, CHARINDEX('>',@text)+LEN('>'),charindex('</td>',@text))

begin
add to `table2` [entity_id], [caption], [value]
@entityID, @col1, @col2
end

我需要访问另一个表,table1,其中包含字段 entity_idcaptionvalue,以及其他一些不重要的。

entity_id | caption      | value
----------|--------------|-------
12345 |Specifications| *html snippet*

我的代码应该做的是,在 table1 中找到在字段 caption 中具有 'Specifications' 值的行,然后提取 entity_id 值,存储它,然后运行一个循环,该循环获取 html 表的所有行并将信息添加到另一个 sql 表中。

例如,运行table1的一行代码后,table2将如下所示:

table2
entity_ID|Caption|Value
---------|-------|-----
12345 |length |10
12345 |width |10
12345 |height |10
et al.

table2 中的所有信息均来自存储在 table1value 字段中的 html 片段。

我的直觉告诉我应该同时使用循环(我这辈子都不能在 sql 中创建循环)和 STRING_SPLIT 函数。但是,我不确定如何实现它们。

完成table1的第一行后,会继续下一行,下一行,行数WHERE caption in ('specifications')

编辑:我知道我的措辞很困惑。我会尝试重新解释。

我有 table1,其中包含一个与 html 片段相关联的 ID 号。 html 片段是包含许多行和两列的表格的代码。我想要的函数将获取 html 片段并逐行解析,并将其添加到 table2,其中来自 table1 的一个 ID 号将附加到 html 的几行 table 。因此,当提取 HTML 片段的前几行时,table2 将显示为:

table2
entity_ID|Caption|Value
---------|-------|-----
12345 |length |10
12345 |width |10
12345 |height |10

其中 entity_ID 是唯一的 ID 号,它与 HTML 片段中的几行相关联。 Caption 取自 html 片段的 col1Value 取自 html 片段的 col2

编辑 2:
我的情况的一个简单示例:

表1

ID|Caption       |Value  
--|--------------|-----
1 |Specifications|*html snippet*
2 |Specifications|*html snippet2*
etc.

代码将从 table1 中获取信息并将其放入“table2”中,如下所示:

表2

ID|SpecName|Spec    
--|--------|----
1 |Length |10
1 |Width |10
1 |Height |10

SpecName 和 Spec 是 html 代码段创建的 html 表中的字段。长度、宽度、高度是代码段生成的信息。 10,是每个 SpecNames 的值。

<table  class="inline_sd_table"> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Accessories Optional</b></td>      <td class="inline_sd_even_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Accommodates Ceiling Height</b></td>      <td class="inline_sd_odd_cell">N/A</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>ADA Compliance</b></td>      <td class="inline_sd_even_cell">See Additional Information above</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Blower Air Mover Type</b></td>      <td class="inline_sd_odd_cell">Fan</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Blower Included</b></td>      <td class="inline_sd_even_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Boost Mode</b></td>      <td class="inline_sd_odd_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>CFM (Airflow)</b></td>      <td class="inline_sd_even_cell">190</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Color/Finish</b></td>      <td class="inline_sd_odd_cell">Almond</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Control Feature Filter Reminder</b></td>      <td class="inline_sd_even_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Damper Included</b></td>      <td class="inline_sd_odd_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Delay Shut Off</b></td>      <td class="inline_sd_even_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Digital Clock</b></td>      <td class="inline_sd_odd_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Dishwasher Safe Filters</b></td>      <td class="inline_sd_even_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Duct Connector Size</b></td>      <td class="inline_sd_odd_cell">3-1/4" x 10", 7" Round Non-Ducted</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>ENERGY STAR&reg; qualified</b></td>      <td class="inline_sd_even_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Filter Included</b></td>      <td class="inline_sd_odd_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Filter Type</b></td>      <td class="inline_sd_even_cell">Aluminum</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Flue Accommodates ceilings up to xx ft</b></td>      <td class="inline_sd_odd_cell">N/A</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Heat Sentry - Temperature Sensor</b></td>      <td class="inline_sd_even_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>HVI Certified</b></td>      <td class="inline_sd_odd_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Light Bulb Quantity</b></td>      <td class="inline_sd_even_cell">1</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Light Bulbs Included</b></td>      <td class="inline_sd_odd_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Lighting Function</b></td>      <td class="inline_sd_even_cell">One-level</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Lighting Type</b></td>      <td class="inline_sd_odd_cell">Incandescent</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Max Sones/100 CFM</b></td>      <td class="inline_sd_even_cell">4.1</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Minimum Height Above Cook Top</b></td>      <td class="inline_sd_odd_cell">18"</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Mounting Type</b></td>      <td class="inline_sd_even_cell">Under Cabinet</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Nightlight</b></td>      <td class="inline_sd_odd_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Non-Stick Coated</b></td>      <td class="inline_sd_even_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Product Depth (inches)</b></td>      <td class="inline_sd_odd_cell">17-1/2"</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Product Height (inches)</b></td>      <td class="inline_sd_even_cell">6"</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Product Width (inches)</b></td>      <td class="inline_sd_odd_cell">42"</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Rated Amps</b></td>      <td class="inline_sd_even_cell">2.5</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Remote Control</b></td>      <td class="inline_sd_odd_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Sones Certified Horizontal Rectangular High</b></td>      <td class="inline_sd_even_cell">6.5</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Sones Certified Horizontal Rectangular Low</b></td>      <td class="inline_sd_odd_cell">N/A</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Sones Certified Vertical Rectangular High</b></td>      <td class="inline_sd_even_cell">6.5</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Sones Certified Vertical Rectangular Low</b></td>      <td class="inline_sd_odd_cell">N/A</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Sones Certified Vertical Round High</b></td>      <td class="inline_sd_even_cell">6.5</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Sones Certified Vertical Round Low</b></td>      <td class="inline_sd_odd_cell">N/A</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Switch Control Settings</b></td>      <td class="inline_sd_even_cell">2</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Switch Style</b></td>      <td class="inline_sd_odd_cell">Rocker</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Timer</b></td>      <td class="inline_sd_even_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>UL Listed</b></td>      <td class="inline_sd_odd_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Voltage</b></td>      <td class="inline_sd_even_cell">120</td> </tr> </table> 

最佳答案

您的示例有点难以理解,但这有帮助吗?

我修改了一个 Parse/Split 函数来接受两个不同的参数。在这种情况下 '>''<' .

我应该补充一点,TVF 的性能非常好。结果在 0.068 秒内返回。

示例

Declare @S varchar(max) = '<table  class="inline_sd_table"> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Accessories Optional</b></td>      <td class="inline_sd_even_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Accommodates Ceiling Height</b></td>      <td class="inline_sd_odd_cell">N/A</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>ADA Compliance</b></td>      <td class="inline_sd_even_cell">See Additional Information above</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Blower Air Mover Type</b></td>      <td class="inline_sd_odd_cell">Fan</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Blower Included</b></td>      <td class="inline_sd_even_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Boost Mode</b></td>      <td class="inline_sd_odd_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>CFM (Airflow)</b></td>      <td class="inline_sd_even_cell">190</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Color/Finish</b></td>      <td class="inline_sd_odd_cell">Almond</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Control Feature Filter Reminder</b></td>      <td class="inline_sd_even_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Damper Included</b></td>      <td class="inline_sd_odd_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Delay Shut Off</b></td>      <td class="inline_sd_even_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Digital Clock</b></td>      <td class="inline_sd_odd_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Dishwasher Safe Filters</b></td>      <td class="inline_sd_even_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Duct Connector Size</b></td>      <td class="inline_sd_odd_cell">3-1/4" x 10", 7" Round Non-Ducted</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>ENERGY STAR&reg; qualified</b></td>      <td class="inline_sd_even_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Filter Included</b></td>      <td class="inline_sd_odd_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Filter Type</b></td>      <td class="inline_sd_even_cell">Aluminum</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Flue Accommodates ceilings up to xx ft</b></td>      <td class="inline_sd_odd_cell">N/A</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Heat Sentry - Temperature Sensor</b></td>      <td class="inline_sd_even_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>HVI Certified</b></td>      <td class="inline_sd_odd_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Light Bulb Quantity</b></td>      <td class="inline_sd_even_cell">1</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Light Bulbs Included</b></td>      <td class="inline_sd_odd_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Lighting Function</b></td>      <td class="inline_sd_even_cell">One-level</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Lighting Type</b></td>      <td class="inline_sd_odd_cell">Incandescent</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Max Sones/100 CFM</b></td>      <td class="inline_sd_even_cell">4.1</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Minimum Height Above Cook Top</b></td>      <td class="inline_sd_odd_cell">18"</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Mounting Type</b></td>      <td class="inline_sd_even_cell">Under Cabinet</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Nightlight</b></td>      <td class="inline_sd_odd_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Non-Stick Coated</b></td>      <td class="inline_sd_even_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Product Depth (inches)</b></td>      <td class="inline_sd_odd_cell">17-1/2"</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Product Height (inches)</b></td>      <td class="inline_sd_even_cell">6"</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Product Width (inches)</b></td>      <td class="inline_sd_odd_cell">42"</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Rated Amps</b></td>      <td class="inline_sd_even_cell">2.5</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Remote Control</b></td>      <td class="inline_sd_odd_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Sones Certified Horizontal Rectangular High</b></td>      <td class="inline_sd_even_cell">6.5</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Sones Certified Horizontal Rectangular Low</b></td>      <td class="inline_sd_odd_cell">N/A</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Sones Certified Vertical Rectangular High</b></td>      <td class="inline_sd_even_cell">6.5</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Sones Certified Vertical Rectangular Low</b></td>      <td class="inline_sd_odd_cell">N/A</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Sones Certified Vertical Round High</b></td>      <td class="inline_sd_even_cell">6.5</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Sones Certified Vertical Round Low</b></td>      <td class="inline_sd_odd_cell">N/A</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Switch Control Settings</b></td>      <td class="inline_sd_even_cell">2</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Switch Style</b></td>      <td class="inline_sd_odd_cell">Rocker</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Timer</b></td>      <td class="inline_sd_even_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>UL Listed</b></td>      <td class="inline_sd_odd_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Voltage</b></td>      <td class="inline_sd_even_cell">120</td> </tr> </table> '

Select Item = max(case when RetCol=1 then RetVal end)
,Value = max(case when RetCol=0 then RetVal end)
From (
Select RetGrp=(Row_Number() over (Order By RetSeq)-1)/2
,RetCol=Row_Number() over (Order By RetSeq) % 2
,RetPos
,RetVal
From [dbo].[udf-Str-Extract](@S,'>','<')
Where RetVal>''
) A
Group By RetGrp
Order By RetGrp

返回

enter image description here

UDF(如果有兴趣)

CREATE FUNCTION [dbo].[udf-Str-Extract] (@String varchar(max),@Delimiter1 varchar(100),@Delimiter2 varchar(100))
Returns Table
As
Return (

with cte1(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
cte2(N) As (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 N1,cte1 N2,cte1 N3,cte1 N4,cte1 N5,cte1 N6) A ),
cte3(N) As (Select 1 Union All Select t.N+DataLength(@Delimiter1) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter1)) = @Delimiter1),
cte4(N,L) As (Select S.N,IsNull(NullIf(CharIndex(@Delimiter1,@String,s.N),0)-S.N,8000) From cte3 S)

Select RetSeq = Row_Number() over (Order By N)
,RetPos = N
,RetVal = left(RetVal,charindex(@Delimiter2,RetVal)-1)
From (
Select *,RetVal = Substring(@String, N, L)
From cte4
) A
Where charindex(@Delimiter2,RetVal)>1

)
/*
Max Length of String 1MM characters

Declare @String varchar(max) = 'Dear [[FirstName]] [[LastName]], ...'
Select * From [dbo].[udf-Str-Extract] (@String,'[[',']]')
*/

EDIT - Illustration of Raw Results

如果你只是简单地运行

Select * from  [dbo].[udf-Str-Extract](@S,'>','<')

结果将是:

enter image description here

关于SQL Server,将具有多行html代码的文本解析为表格,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/45578134/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com