gpt4 book ai didi

sql - TSQL:解析具有各种字符的字符串

转载 作者:行者123 更新时间:2023-12-03 03:12:06 25 4
gpt4 key购买 nike

我有一个包含文件名列的表,其中各个供应商以不同的方式命名文件。因此,文件中存在一个带有姓氏、名字、中间名的文件名,并用各种字符分隔名称。有些有逗号+空格、逗号不带空格、单词之间有空格、单词之间没有空格、一个下划线、两个下划线等。

有哪些好的方法可以将其提取到所需的结果?(这是一次性数据转换,不一定很漂亮。)

我在下面的示例代码中尝试过,使用各种子字符串/字符索引组合

文件名示例:(注意逗号、空格、无空格、下划线、双下划线)

enter image description here

期望的结果:

enter image description here

示例代码/测试数据(在临时表中)

   IF OBJECT_ID('tempdb..#dob') IS NOT NULL DROP TABLE #dob

CREATE TABLE #dob (
FILENAME VARCHAR(MAX)
,StudentID INT
,FullName VARCHAR(500)
,LastName VARCHAR(500)
,FirstName VARCHAR(500)
,MiddleName VARCHAR(500)
)

INSERT INTO #dob
( FILENAME )
VALUES
('Last, First, Middle_DOB ID.pdf')
,('Denver, John C 11_23_1980_123456.pdf')
,('Denver John_11-23-1980, 1234567.pdf')
,('Denver,John,Clifford_ 01_22_1980_123456.pdf')
,('Denver, John, 11-23-1980, 1234567.pdf')
,('Denver, John__01_22_1980_123456.pdf')

--This is what I tried.

SELECT FILENAME
,fullname
,LastName
,FirstName
,MiddleName
,SUBSTRING(FileName,1, CHARINDEX(' ', FileName, (charindex(' ', FileName, 1))+2)) AS test1
,SUBSTRING(FileName,1, CHARINDEX('_', FileName, (charindex(' ', FileName, 1))+2)) AS test2
,SUBSTRING(FileName,1, CHARINDEX(',', FileName, (charindex(', ', FileName, 1))+1)) AS test3
,SUBSTRING(FileName,1, CHARINDEX(' ', FileName, (charindex('__', FileName, 1))+2)) AS test4
,SUBSTRING(FileName,1, CHARINDEX('__', FileName, (charindex(' ', FileName, 1))+2)) AS test5


FROM #dob

最佳答案

这是一个滑坡,但如果您的真实数据接近样本,请考虑以下事项。

示例

   SELECT FILENAME
,LastName = Pos1
,FirstName = Pos2
,MiddleName = case when try_convert(int,left(Pos3,1)) is null then Pos3 else '' end
FROM #dob A
Cross Apply ( values ( replace(
replace(
replace(
replace(FileName,', ',',')
,' ,',',')
,' ',',')
,'_',',')
)
) B(CleanString)
Cross Apply [dbo].[tvf-Str-Parse-Row](CleanString,',') C

返回

FILENAME                                      LastName  FirstName   MiddleName
Last, First, Middle_DOB ID.pdf Last First Middle
Denver, John C 11_23_1980_123456.pdf Denver John C
Denver John_11-23-1980, 1234567.pdf Denver John
Denver,John,Clifford_ 01_22_1980_123456.pdf Denver John Clifford
Denver, John, 11-23-1980, 1234567.pdf Denver John
Denver, John__01_22_1980_123456.pdf Denver John

TVF(如有兴趣)

CREATE FUNCTION [dbo].[tvf-Str-Parse-Row] (@String varchar(max),@Delimiter varchar(10))
Returns Table
As
Return (
Select Pos1 = ltrim(rtrim(xDim.value('/x[1]','varchar(max)')))
,Pos2 = ltrim(rtrim(xDim.value('/x[2]','varchar(max)')))
,Pos3 = ltrim(rtrim(xDim.value('/x[3]','varchar(max)')))
,Pos4 = ltrim(rtrim(xDim.value('/x[4]','varchar(max)')))
,Pos5 = ltrim(rtrim(xDim.value('/x[5]','varchar(max)')))
,Pos6 = ltrim(rtrim(xDim.value('/x[6]','varchar(max)')))
,Pos7 = ltrim(rtrim(xDim.value('/x[7]','varchar(max)')))
,Pos8 = ltrim(rtrim(xDim.value('/x[8]','varchar(max)')))
,Pos9 = ltrim(rtrim(xDim.value('/x[9]','varchar(max)')))
From ( values (cast('<x>' + replace((Select replace(@String,@Delimiter,'§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml))) as A(xDim)
)

关于sql - TSQL:解析具有各种字符的字符串,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/57527894/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com