gpt4 book ai didi

sql-server - SQL 解析出多个子字符串

转载 作者:行者123 更新时间:2023-12-03 23:51:07 25 4
gpt4 key购买 nike

我有一个非常长且复杂的字符串,它带有新的换行符 - 我很难解析。我需要能够为以下每个字段创建一个带有列的选择查询。

理想的情况是找到 new line break - 对于每一行 - 回到 : 冒号之前的所有内容都应该是列的名称,而 :new ling break 之间的所有内容都应该是字段中的数据。

所有数据都作为字符串返回,所以我只是为以下每一行构建一个选择语句。我不确定这是否可能。

第二种选择,硬编码并说出类似 CHARINDEX ( 'Home Phone:' ,notes, 0) 的内容,我在其中找到家庭电话字符串,然后在指定字符串后提取 :new ling break 之间的所有内容。

在这种情况下,我的查询中的每个选择项都会说 - 查找字符串“家庭电话”并提取冒号后面的内容,或查找字符串“学校名称”等。

这是数据的样子(在一个名为 notes 的全字符串中):

Home Phone: 1234567890  
Cell Phone: 1234567890
Date of Birth: 01/01/1971
School Name: James Jones High School
Address:123 Main Street
School City: Queens
School State: PA
School Zip: 32112
Years Teaching: 12
Grade Levels: Middle School
Total Students: 120
Subject: Music:
How did they hear: Other, provide more info: Former partner teacher in the Middle School
Type: Public/Charter
Question 1: aaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaa aaaaaaa aaaa aaa aaaaaaaa aaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaa aaaaaaa aaaaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaa aaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaa aaaaaaaaaaaaaaa aaaaaaaaaaa aaaaaaaa aaaaaaaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaa aaaaaa aaaaaa aaaaaaa aaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaa aaaaa aaaaaa aaaaaa aaaaaaaaaaaa aaaaaaaaaaaa aaa aaaa aaaaa aaaaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaa aaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaaa aaaaaaaaaaa aaaaaaaaa aaaaaaaaaaaa.
Question 2: bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbb bbbbbbbbb bbbbbbb bbbbbb bbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbbbbbbbbbbb bbbbbbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbb
Question 3: ccccccccccccccccccccccc cccccccc ccccccccccc cccccccccccccccccccccc ccc ccccccccc cccccccccccccc ccccccccccccccccccccc cccccccccccccccccccccc cccccccccccccccccc ccccccccccc ccccccccccccc ccccccccccccccccc cccccccc

所以输出看起来像这样(在每个领域都回答了所有长问题)。
Home Phone  Cell Phone  Date of Birth:  …   Type:               Question 1 :                Question 2:    Question 3: 
1234567890 1234567890 1/1/1971 Public/Charter aaaaaaaa aaaaaaaaaaaaa. bbb bbbbbbbbbb ccccccccccccccccccccccc

我不确定这是否有意义 - 但任何和所有建议都非常感谢。

提取子字符串和新行字符的代码——但这是硬编码的。我无法弄清楚如何动态地做到这一点。
SELECT  ltrim(rtrim(CHARINDEX ( 'Home Phone:' ,notes, 0) + LEN('Home Phone: '))) as 'beggining',
ltrim(rtrim(CHARINDEX ( CHAR(10) ,notes, 0))) as 'ending',
SUBSTRING(notes,(CHARINDEX ( 'Home Phone:' ,notes, 0) + LEN('Home Phone: ')),(LEN('Home Phone: '))) as 'home phone',
FROM table a

谢谢!

最佳答案

大部分功劳 (90%) 应该归功于 Alex K,他提供了有关查找第 n 次出现的字符的深入答案

SQL Server - find nth occurrence in a string

我接受了那个答案,针对您的问题进行了调整,然后应用 PIVOT 将其分解为所需的行/列。这种方法应该能够为您需要的尽可能多的独特问题集创建所需的输出,前提是它们始终具有相同的逻辑(每个问题/答案由换行符分隔)。

--Creates temporary table for testing, ID column and second set of data
--used to ensure query works for each unique set of questions
IF OBJECT_ID('tempdb..#Results') IS NOT NULL
DROP TABLE #Results

CREATE TABLE #Results
(ID INT IDENTITY(1,1) NOT NULL,
Notes NVARCHAR(4000) NOT NULL)
INSERT INTO #Results
(Notes)
VALUES
('Home Phone: 1234567890
Cell Phone: 1234567890
Date of Birth: 01/01/1971
School Name: James Jones High School
Address:123 Main Street
School City: Queens
School State: PA
School Zip: 32112
Years Teaching: 12
Grade Levels: Middle School
Total Students: 120
Subject: Music:
How did they hear: Other, provide more info: Former partner teacher in the Middle School
Type: Public/Charter '),
('Home Phone: test
Cell Phone: test
Date of Birth: test
School Name: test
Address:test
School City: test
School State: test
School Zip: test
Years Teaching: test
Grade Levels: test
Total Students: test
Subject: test
How did they hear: test
Type: test ');

--Recursive CTE to determine the position of each successive line break
--Used CHARINDEX to search CHAR(13) and CHAR(10) and find line breaks and carriage returns
WITH cte
AS

(SELECT ID, Notes, 1 AS Starts, CHARINDEX(CHAR(13)+CHAR(10),Notes) AS Pos
FROM #Results
UNION ALL
SELECT ID, Notes, Pos +1, CHARINDEX(CHAR(13)+CHAR(10),Notes,Pos+1) AS Pos
FROM cte
WHERE
pos >0),

--2nd CTE breaks each question set into it's own row
cte2
AS
(SELECT ID, Notes,Starts, Pos,
SUBSTRING(Notes, Starts,
CASE
WHEN pos > 0 THEN (pos - starts)
ELSE LEN(notes)
END) AS Token
FROM cte),

--3rd CTE cleans up the data, separating the Questions/Answers into separate columns
--REPLACE is used to remove Line Break (CHAR(10)), output was then showing a TAB so used
--double REPLACE and removed CHAR(9) (tab)
--LTRIM removes leading space
cte3
AS
(SELECT ID,
LTRIM(REPLACE(REPLACE(SUBSTRING(Token,CHARINDEX(CHAR(13)+CHAR(10),Token),CHARINDEX(':',Token)),CHAR(10),''),CHAR(9),'')) AS Question,
LTRIM(SUBSTRING(Token,CHARINDEX(':',Token)+1,4000)) AS Answer
FROM cte2)

--Pivot separates each Question/Answer row into it's own column
SELECT *
FROM
(SELECT ID, Question, Answer
FROM cte3) AS a
PIVOT
(MAX(Answer)
FOR [Question] IN([Address],[Cell Phone],[Date of Birth],[Grade Levels],[Home Phone],[How did they hear],
[School City],[School Name],[School State],[School Zip],[Subject],[Total Students],[Type],[Years Teaching])) AS pvt

我在每个部分都发表了评论,希望能解释我的逻辑,但如果您有任何问题,请告诉我。

编辑:动态枢轴

可以使用动态 SQL 创建一个 PIVOT,它会自动选取所有“问题”列并进行相应调整。我不相信它可以一步完成,因为我必须使用多个 CTE。我要做的是采用上述步骤来创建 CTE、CTE2 和 CTE3(基本上是 PIVOT 查询之前的所有内容)并创建这些步骤的 View ,然后使用该 View 执行以下操作(对于我的示例,该 View 称为“问卷")
DECLARE @columns AS NVARCHAR(MAX)
DECLARE @query AS NVARCHAR(MAX)

SET @columns = STUFF((SELECT DISTINCT ',' + QUOTENAME(q.question)
FROM questionaire AS q
FOR XML PATH(''), TYPE
).value('.','NVARCHAR(MAX)')
,1,1,'')

SET @query = 'SELECT ID, '+ @columns +' FROM
(
SELECT ID, Answer, Question
FROM questionaire
) AS a
PIVOT
(
MAX(Answer)
FOR Question IN(' +@columns+')
) AS p'
EXECUTE(@query)

关于sql-server - SQL 解析出多个子字符串,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/35707770/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com