gpt4 book ai didi

sql - Microsoft SQL Server 中的自然(人类字母数字)排序

转载 作者:行者123 更新时间:2023-12-01 17:27:23 26 4
gpt4 key购买 nike

感谢您花时间阅读所有这些内容,内容很多!感谢所有的爱好者!

如何自然排序?

即。将一组字母数字数据排序为:

Season 1, Season 2, Season 10, Season 20

而不是

Season 1, Season 10, Season 2, Season 20

我以非常实用的格式使用了一个非常实用的电视季示例作为案例。

我希望实现以下目标:

  1. 与他人分享我的工作解决方案
  2. 请求您帮助了解如何缩短(或找到更好的解决方案)我的解决方案
  3. 您能解决下面的标准 7 吗?

我花了大约 2 个小时在线研究,另外花了 3 个小时构建这个解决方案。部分引用资料来自:

在 SO 和其他网站上找到的一些解决方案仅适用于 90% 的情况。但是,如果文本中有多个数值,则大多数/全部不起作用,或者如果文本中根本没有找到数字,则会导致 SQL 错误。

我创建了这个SQLFiddle可以使用的链接(包括以下所有代码)。

这是创建语句:

create table tvseason
(
title varchar(100)
);

insert into tvseason (title)
values ('100 Season 03'), ('100 Season 1'),
('100 Season 10'), ('100 Season 2'),
('100 Season 4'), ('Show Season 1 (2008)'),
('Show Season 2 (2008)'), ('Show Season 10 (2008)'),
('Another Season 01'), ('Another Season 02'),
('Another 1st Anniversary Season 01'),
('Another 2nd Anniversary Season 01'),
('Another 10th Anniversary Season 01'),
('Some Show Another No Season Number'),
('Some Show No Season Number'),
('Show 2 Season 1'),
('Some Show With Season Number 1'),
('Some Show With Season Number 2'),
('Some Show With Season Number 10');

这是我的工作解决方案(仅无法解决下面的标准#7):

select 
title, "index", titleLeft,
convert(int, coalesce(nullif(titleRightTrim2, ''), titleRight)) titleRight
from
(select
title, "index", titleLeft, titleRight, titleRightTrim1,
case
when PATINDEX('%[^0-9]%', titleRightTrim2) = 0
then titleRightTrim2
else left(titleRightTrim2, PATINDEX('%[^0-9]%', titleRightTrim2) - 1)
end as titleRightTrim2
from
(select
title,
len(title) - PATINDEX('%[0-9] %', reverse(title)) 'index',
left(title, len(title) - PATINDEX('%[0-9] %', reverse(title))) titleLeft,
ltrim(right(title, PATINDEX('%[0-9] %', reverse(title)))) titleRight,
ltrim(right(title, PATINDEX('%[0-9] %', reverse(title)))) titleRightTrim1,
left(ltrim(right(title, PATINDEX('%[0-9] %', reverse(title)))), PATINDEX('% %', ltrim(right(title, PATINDEX('%[0-9] %', reverse(title)))))) titleRightTrim2
from
tvseason) x) y
order by
titleLeft, titleRight

要考虑的标准:

  1. 文本不包含数字
  2. 文本在开头和结尾包含数字
  3. 文本仅在开头包含数字
  4. 文本仅在末尾包含数字
  5. 文本末尾可能包含 (YYYY)
  6. 文本可以以一位数或两位数结尾(例如 1 或 01)
  7. 可选:上述任意组合,加上文本中间的数字

这是输出:

title
100 Season 1
100 Season 2
100 Season 03
100 Season 4
100 Season 10
**Case 7 here**
Another 10th Anniversary Season 01
Another 1st Anniversary Season 01
Another 2nd Anniversary Season 01
Another Season 01
Another Season 02
Show (2008) Season 1
Show (2008) Season 2
Show 2 The 75th Anniversary Season 1
Show Season 1 (2008)
Show Season 2 (2008)
Show Season 10 (2008)
Some Show Another No Season Number
Some Show No Season Number
Some Show With Season Number 1
Some Show With Season Number 2
Some Show With Season Number 10

最佳答案

我认为这可以解决问题...我只是识别从非数字到数字的变化。我还没有做过任何大规模测试,但它应该相当快。

SET QUOTED_IDENTIFIER ON;
GO
SET ANSI_NULLS ON;
GO

ALTER FUNCTION dbo.tfn_SplitForSort
/* ===================================================================
11/11/2018 JL, Created: Comments
=================================================================== */
--===== Define I/O parameters
(
@string VARCHAR(8000)
)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH
cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),
cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
cte_Tally (n) AS (
SELECT TOP (LEN(@string))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM
cte_n2 a CROSS JOIN cte_n2 b
),
cte_split_string AS (
SELECT
col_num = ROW_NUMBER() OVER (ORDER BY t.n) + CASE WHEN LEFT(@string, 1) LIKE '[0-9]' THEN 0 ELSE 1 END,
string_part = SUBSTRING(@string, t.n, LEAD(t.n, 1, 8000) OVER (ORDER BY t.n) - t.n)
FROM
cte_Tally t
CROSS APPLY ( VALUES (SUBSTRING(@string, t.n, 2)) ) s (str2)
WHERE
t.n = 1
OR SUBSTRING(@string, t.n - 1, 2) LIKE '[0-9][^0-9]'
OR SUBSTRING(@string, t.n - 1, 2) LIKE '[^0-9][0-9]'
)

SELECT
so_01 = ISNULL(MAX(CASE WHEN ss.col_num = 1 THEN CONVERT(FLOAT, ss.string_part) END), 99999999),
so_02 = MAX(CASE WHEN ss.col_num = 2 THEN ss.string_part END),
so_03 = MAX(CASE WHEN ss.col_num = 3 THEN CONVERT(FLOAT, ss.string_part) END),
so_04 = MAX(CASE WHEN ss.col_num = 4 THEN ss.string_part END),
so_05 = MAX(CASE WHEN ss.col_num = 5 THEN CONVERT(FLOAT, ss.string_part) END),
so_06 = MAX(CASE WHEN ss.col_num = 6 THEN ss.string_part END),
so_07 = MAX(CASE WHEN ss.col_num = 7 THEN CONVERT(FLOAT, ss.string_part) END),
so_08 = MAX(CASE WHEN ss.col_num = 8 THEN ss.string_part END),
so_09 = MAX(CASE WHEN ss.col_num = 9 THEN CONVERT(FLOAT, ss.string_part) END),
so_10 = MAX(CASE WHEN ss.col_num = 10 THEN ss.string_part END)
FROM
cte_split_string ss;
GO

正在使用的函数...

SELECT 
ts.*
FROM
#tvseason ts
CROSS APPLY dbo.tfn_SplitForSort (ts.title) sfs
ORDER BY
sfs.so_01,
sfs.so_02,
sfs.so_03,
sfs.so_04,
sfs.so_05,
sfs.so_06,
sfs.so_07,
sfs.so_08,
sfs.so_09,
sfs.so_10;

结果:

id          title
----------- ------------------------------------------
2 100 Season 1
4 100 Season 2
1 100 Season 03
5 100 Season 4
3 100 Season 10
11 Another 1st Anniversary Season 01
12 Another 2nd Anniversary Season 01
13 Another 10th Anniversary Season 01
9 Another Season 01
10 Another Season 02
16 Show 2 Season 1
6 Show Season 1 (2008)
7 Show Season 2 (2008)
8 Show Season 10 (2008)
14 Some Show Another No Season Number
15 Some Show No Season Number
17 Some Show With Season Number 1
18 Some Show With Season Number 2
19 Some Show With Season Number 10

--================================================ =======================

[编辑2020-09-23]我正在回顾我的一些旧帖子,当我遇到这篇文章时,我想看看我是否可以使用单一值输出。向 ORDER BY 添加 10 列实在是太笨重了......经过一番思考,我想到将 FLOAT 转换为 BINARY 并将 BINARY 转换回 VARCHAR,我可以使用 STRING_AGG() 函数重新组装字符串。最终结果将是产生所需排序的字符串。

CREATE FUNCTION dbo.human_sort_string
/* ===================================================================
09/23/2020 JL, Created: Just a test
=================================================================== */
--===== Define I/O parameters
(
@string varchar(8000)
)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH
cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)), -- 10
cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b), -- 100
cte_Tally (n) AS (
SELECT TOP (LEN(@string))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM
cte_n2 a CROSS JOIN cte_n2 b -- 10,000
),
cte_Parsed AS (
SELECT
t.n,
parsed_val = SUBSTRING(@string, ISNULL(NULLIF(t.n, 1), 0) + 1, LEAD(t.n, 1, 8000) OVER (ORDER BY t.n) - ISNULL(NULLIF(t.n, 1), 0))
FROM
cte_Tally t
CROSS APPLY ( VALUES (SUBSTRING(@string, t.n, 2)) ) sv (sub_val)
WHERE
t.n = 1
OR
sv.sub_val LIKE '[0-9][^0-9]'
OR
sv.sub_val LIKE '[^0-9][0-9]'
)
SELECT
sort_string = STRING_AGG(ISNULL(CONVERT(varchar(8000), CONVERT(binary(8), TRY_CONVERT(float, p.parsed_val)), 2), p.parsed_val), '') WITHIN GROUP (ORDER BY p.n)
FROM
cte_Parsed p;
GO

现在,外部查询看起来像这样......

SELECT 
ts.id,
td.title
FROM
#tvseason ts
CROSS APPLY dbo.human_sort_string(ts.title) hss
ORDER BY
hss.sort_string;

实际结果与之前的函数相同。

关于sql - Microsoft SQL Server 中的自然(人类字母数字)排序,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/49374610/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com