gpt4 book ai didi

sql-server - 当文本包含非日语字符时,日语中的 SQL 排序顺序会中断

转载 作者:行者123 更新时间:2023-12-02 23:33:17 24 4
gpt4 key购买 nike

当文本包含非日语文本时,日语排序似乎会“中断”,即使在查询的 sort 部分之后强制执行任何可能的排序规则也是如此。

我想知道这是否是已知现象,以及解决方案是什么。

最后,我正在寻找假名类型不敏感、区分大小写的排序,而搜索应该是假名类型不敏感且大小写不敏感

这是测试用例:

我假设从下面的脚本中,我在两个查询中得到相同的结果(预期的排序顺序在第三列中)。基本上,一旦我按完整单词排序,一旦我按第一个字母手动排序,然后是第二个和第三个字母。

给定数据库排序规则SQL_Latin1_General_CP1_CI_AS

declare  @temp as table  (title nvarchar(5),  expected int,  script varchar(40) )

set nocount on
INSERT INTO @temp values(N'かか7', 4,'hiragana no accent')
INSERT INTO @temp values(N'がが6',7,'hiragana with accent')
INSERT INTO @temp values(N'いい5',1,'earlier letter hiragana no accent')
INSERT INTO @temp values(N'カカ4',3, 'katakana no accent')
INSERT INTO @temp values(N'ガガ3',6, 'katakana with accent')
INSERT INTO @temp values(N'かか2',2, 'hiragana no accent')
INSERT INTO @temp values(N'がが1', 5, 'hiragana with accent')

--BAD
select unicode(left(title,1)) 'bin', * from @temp order by title
--GOOD
select unicode(left(title,1)) 'bin', * from @temp order by left(title,1),substring(title,2,1), substring(title,3,1)

但是只有第二个版本有效,第一个版本无法正确排序:

the two result sets

这似乎与 title 字段中的数字有关,因为当我删除它们时,我确实得到了相同的顺序。

declare  @temp as table  (title nvarchar(5),  expected int,  script varchar(40) )

set nocount on
INSERT INTO @temp values(N'かか', 2,'hiragana no accent')
INSERT INTO @temp values(N'がが',3,'hiragana with accent')
INSERT INTO @temp values(N'いい',1,'earlier letter hiragana no accent')
INSERT INTO @temp values(N'カカ',2, 'katakana no accent')
INSERT INTO @temp values(N'ガガ',3, 'katakana with accent')
INSERT INTO @temp values(N'かか',2, 'hiragana no accent')
INSERT INTO @temp values(N'がが', 3, 'hiragana with accent')

--GOOD
select unicode(left(title,1)) 'bin', * from @temp order by title
--GOOD
select unicode(left(title,1)) 'bin', * from @temp order by left(title,1),substring(title,2,1)

在此处查看结果:

correct sort order

有谁知道原因以及可能的解决方案吗?

最佳答案

暴力方法:检查 SQL Server 中所有支持的排序规则:

create table ##temp(title nvarchar(5),  expected int,  script varchar(40) );

INSERT INTO ##temp values(N'かか7', 4,'hiragana no accent');
INSERT INTO ##temp values(N'がが6',7,'hiragana with accent');
INSERT INTO ##temp values(N'いい5',1,'earlier letter hiragana no accent');
INSERT INTO ##temp values(N'カカ4',3, 'katakana no accent');
INSERT INTO ##temp values(N'ガガ3',6, 'katakana with accent');
INSERT INTO ##temp values(N'かか2',2, 'hiragana no accent');
INSERT INTO ##temp values(N'がが1', 5, 'hiragana with accent');

和脚本:

CREATE TABLE result(collation_name NVARCHAR(1000));
DECLARE @collate_name NVARCHAR(1000);
DECLARE @sql NVARCHAR(MAX);

DECLARE c CURSOR FOR
SELECT name FROM sys.fn_helpcollations() /* where name LIKE '%japan%'*/;

OPEN c;
FETCH NEXT FROM c INTO @collate_name;

WHILE @@FETCH_STATUS = 0
BEGIN
SET @sql = REPLACE(
N'with cte as (
select bin = unicode(left(title,1)),expected
,rn= row_number() over(order by title collate <collate>)
,collation = ''<collate>''
from ##temp
)
select collation
from cte
where expected = rn GROUP BY collation HAVING COUNT(*) = 7'
, '<collate>', @collate_name);
-- debug
--PRINT @sql;

INSERT INTO result(collation_name) EXEC (@sql);
FETCH NEXT FROM c INTO @collate_name;
END

SELECT * FROM result;

CLOSE c;
DEALLOCATE c;

<强> db<>fiddle demo

结果:SQL Server 2017 中没有与“预期顺序”匹配的排序规则。

关于sql-server - 当文本包含非日语字符时,日语中的 SQL 排序顺序会中断,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/59250114/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com