java - How to sort Norwegian characters (Æ, Ø and Å) case-insensitively using Hibernate Lucene Search?

Reposted. Author: 行者123. Updated: 2023-11-30 07:10:52

æ, ø and å are the last three letters of the Norwegian alphabet:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Æ Ø Å

When we try to sort with Hibernate Lucene, Å is grouped with A, Ø is grouped with O, and Æ is grouped with A, which is wrong. For example:

Current result:

Aaalu, Åaalu, Baalu, Zaalu

Expected result:

Aaalu, Baalu, Zaalu, Åaalu
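
To make the two orderings concrete, here is a minimal plain-Java sketch that reproduces both without Lucene. The class name `NorwegianOrderDemo` and its helpers are hypothetical. The `foldedKey` method only mimics, as an assumption, what ASCII folding plus lower-casing does to these three letters; it is not the Lucene filter itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Illustrative only: reproduces the two orderings from the question in plain Java.
final class NorwegianOrderDemo {

    // Norwegian alphabet in lower case.
    // \u00e6 = ae ligature, \u00f8 = o with stroke, \u00e5 = a with ring.
    static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e6\u00f8\u00e5";

    // Assumption: imitates ASCII folding followed by lower-casing for these letters.
    static String foldedKey(String s) {
        return s.toLowerCase(Locale.ROOT)
                .replace("\u00e6", "ae")
                .replace("\u00f8", "o")
                .replace("\u00e5", "a");
    }

    // Position in the Norwegian alphabet; other characters sort after it by code unit.
    static int rank(char c) {
        int index = ALPHABET.indexOf(c);
        return index >= 0 ? index : ALPHABET.length() + c;
    }

    // Case-insensitive comparison by alphabet position, character by character.
    static final Comparator<String> NORWEGIAN = (left, right) -> {
        String a = left.toLowerCase(Locale.ROOT);
        String b = right.toLowerCase(Locale.ROOT);
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            int diff = rank(a.charAt(i)) - rank(b.charAt(i));
            if (diff != 0) {
                return diff;
            }
        }
        return a.length() - b.length();
    };

    // Returns a sorted copy; List.sort is stable, so equal keys keep input order.
    static List<String> sortedBy(Comparator<String> order, List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Zaalu", "Baalu", "Aaalu", "\u00c5aalu");
        // Sorting on the folded key groups the a-with-ring name with the A names.
        System.out.println(sortedBy(Comparator.comparing(NorwegianOrderDemo::foldedKey), names));
        // Sorting by alphabet position places it after Z.
        System.out.println(sortedBy(NORWEGIAN, names));
    }
}
```

The first line printed matches the current result above and the second matches the expected result, which suggests the folding step in the sort analyzer is at least one reason the three letters lose their place in the alphabet.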

Here is the code I am currently using:

    @AnalyzerDef(name = "myOwnAnalyzer",
        // Keep the whole field value as a single token
        tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
        filters = {
            // Fold non-ASCII letters to their ASCII equivalents, then lower-case
            @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            // Replace matches of the punctuation pattern with a space
            @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
                @Parameter(name = "pattern", value = "('-&\\.,\\(\\))"),
                @Parameter(name = "replacement", value = " "),
                @Parameter(name = "replace", value = "all")
            }),
            // Drop everything that is not a digit, a letter or a space
            @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
                @Parameter(name = "pattern", value = "([^0-9\\p{L} ])"),
                @Parameter(name = "replacement", value = ""),
                @Parameter(name = "replace", value = "all")
            }),
            @TokenFilterDef(factory = TrimFilterFactory.class)
        }
    )
    public class KikaPaya implements Serializable {

        // Indexed twice: once with the default analyzer, once as a dedicated sort field
        @Fields({
            @Field(index = Index.YES, store = Store.YES),
            @Field(name = "KikaPayaName_for_sort", index = Index.YES,
                   analyzer = @Analyzer(definition = "myOwnAnalyzer"))
        })
        @Column(name = "NAME", length = 100)
        private String name;

Main:

    FullTextEntityManager ftem = Search.getFullTextEntityManager(factory.createEntityManager());
    QueryBuilder qb = ftem.getSearchFactory().buildQueryBuilder().forEntity(KikaPaya.class).get();
    // Match every indexed KikaPaya
    org.apache.lucene.search.Query query = qb.all().getQuery();
    FullTextQuery fullTextQuery = ftem.createFullTextQuery(query, KikaPaya.class);
    // Sort on the dedicated sort field; the third argument (true) requests reverse order
    fullTextQuery.setSort(new Sort(new SortField("KikaPayaName_for_sort", SortField.STRING, true)));
    fullTextQuery.setFirstResult(0).setMaxResults(150);
    int size = fullTextQuery.getResultSize();
    List<KikaPaya> result = fullTextQuery.getResultList();
    for (KikaPaya user : result) {
        logger.info("KikaPaya Name:" + user.getName());
    }

These are the Lucene versions in use (which I cannot change):

    <hibernate.version>4.2.8.Final</hibernate.version>
    <hibernate.search.version>4.3.0.Final</hibernate.search.version>

    <dependency>
        <groupId>org.hibernate</groupId>
        <artifactId>hibernate-entitymanager</artifactId>
        <version>4.2.8.Final</version>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-core</artifactId>
        <version>3.6.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-analyzers</artifactId>
        <version>3.6.2</version>
    </dependency>

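Can someone suggest a way to get the correct result?

Best answer

I must admit this is not a common requirement. As far as I know, there is a Lucene module that uses ICU for locale-dependent sorting.

See the lucene-icu artifact, in particular ICUCollationKeyFilter and ICUCollationKeyAnalyzer (the analyzer is a KeywordTokenizer combined with the filter). You will need to write the factories required to use it with Hibernate Search, but that should be fairly straightforward.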
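
The collation-key idea behind ICUCollationKeyFilter can be sketched without Lucene. The filter replaces each token with a collation key, so that ordinary term order follows the locale's order. The sketch below is a stand-in rather than the ICU API: it uses the JDK's `java.text.RuleBasedCollator` with hand-written rules, an assumption made purely for illustration, since ICU ships its own Norwegian rules. The class name `CollationKeySortDemo` is hypothetical.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Stand-in sketch: JDK collation keys instead of ICU, to show the sorting principle.
final class CollationKeySortDemo {

    // Builds hand-written rules: a..z, then ae ligature, o with stroke, a with ring.
    // '<' introduces the next letter; ',' marks a case-only (tertiary) difference.
    static RuleBasedCollator norwegianCollator() {
        StringBuilder rules = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            rules.append("< ").append(c).append(", ")
                 .append(Character.toUpperCase(c)).append(' ');
        }
        rules.append("< \u00e6, \u00c6 < \u00f8, \u00d8 < \u00e5, \u00c5");
        try {
            RuleBasedCollator collator = new RuleBasedCollator(rules.toString());
            // SECONDARY strength ignores tertiary (case) differences.
            collator.setStrength(Collator.SECONDARY);
            return collator;
        } catch (ParseException e) {
            throw new IllegalStateException("Invalid collation rules", e);
        }
    }

    // Sorts by collation key rather than by the raw strings.
    static List<String> sort(List<String> names) {
        RuleBasedCollator collator = norwegianCollator();
        List<CollationKey> keys = new ArrayList<>();
        for (String name : names) {
            keys.add(collator.getCollationKey(name));
        }
        Collections.sort(keys); // CollationKey is Comparable
        List<String> sorted = new ArrayList<>();
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sort(Arrays.asList("Zaalu", "\u00c5aalu", "Baalu", "Aaalu")));
    }
}
```

With Hibernate Search 4.x, the equivalent key would probably come from a small token filter factory that returns an ICUCollationKeyFilter, most likely by extending Solr's BaseTokenFilterFactory. ASCIIFoldingFilterFactory would presumably have to be left out of the sort analyzer, because once Å has been folded to A no collator can tell the two apart. ICU's real Norwegian rules may also differ from these simplified ones, for instance in how the digraph "aa" is treated, so the result should be checked against actual data.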

I cannot really guarantee that it will work, but it is probably your best option.

A similar question about sorting Norwegian characters (Æ, Ø and Å) case-insensitively with Hibernate Lucene Search can be found on Stack Overflow: https://stackoverflow.com/questions/39264308/
