php - Sphinx search: charset table difficulties

Reposted. Author: 行者123. Updated: 2023-12-02 06:21:51

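I have been struggling with this for two days now...

I want to use the Slovenian alphabet in Sphinx search: all English letters plus č, ž, š (and ć, just in case).

I looked all over the web for a suitable charset table, but found squat...

So I decided to build one myself, step by step...

Here is my index: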

index classifieds
{
source = classifieds_src
path = c:\Sphinx\data\classifieds
docinfo = extern

min_infix_len = 2
infix_fields = title,keywords,summary,text
expand_keywords = 1
enable_star = 1


charset_type = utf-8
charset_table = 0..9, a..z, _, A..Z->a..z,-, U+002C, \
U+010C->U+010D, U+0106->U+0107, U+0160->U+0161, U+017D->U+017E, \
U+010D->c,U+0107->c, U+0161->s, U+017E->z, \
U+010D, U+0107, U+0161, U+017E
}

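I mapped the uppercase Č, Ć, Š, Ž to their lowercase counterparts, added mappings from č to c, ć to c, š to s, and ž to z, and finally added those four characters to the table on their own.

These are my classified titles: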

t1: HP USB optična miška za prenosnik RH304
t2: Čiška PCplus MO-U033+F2 (optična, brezžična, PS/2)
t3: Miška Logitech optična Nano M235 siva

Database encoding: utf8_general_ci
Table encoding: utf8_general_ci
Title field encoding: utf8_general_ci

Test cases:

$testcase = array(
"miška",
"mi*ka",
"Čiška",
"čiška",
"miska",
"usb prenosnik",
"prenosnik miska",
"miška usb"
);

//api settings:

$this->sphinx->SetArrayResult(true);
$this->sphinx->setLimits(0, 100);
$this->sphinx->setMatchMode(SPH_MATCH_EXTENDED2);
$this->sphinx->SetSortMode(SPH_SORT_RELEVANCE, '@weight DESC');
$this->sphinx->SetRankingMode(SPH_RANK_PROXIMITY_BM25);
$this->sphinx->SetFieldWeights(array("title"=>100, "keywords"=>80, "summary"=>60,
"text"=>20, "slug"=>10));

And finally, the test results:

keyword (total/total_found) words

miška     (0/0)

Array
(
[*miška*] => Array
(
[docs] => 0
[hits] => 0
)

[miška] => Array
(
[docs] => 0
[hits] => 0
)

)

mi*ka (0/0)

Array
(
[*mi*] => Array
(
[docs] => 3
[hits] => 4
)

[mi] => Array
(
[docs] => 1
[hits] => 1
)

[*2aka*] => Array
(
[docs] => 0
[hits] => 0
)

[2aka] => Array
(
[docs] => 0
[hits] => 0
)

)

Čiška (0/0)

Array
(
[*čiška*] => Array
(
[docs] => 0
[hits] => 0
)

[čiška] => Array
(
[docs] => 0
[hits] => 0
)

)

čiška (0/0)

Array
(
[*čiška*] => Array
(
[docs] => 0
[hits] => 0
)

[čiška] => Array
(
[docs] => 0
[hits] => 0
)

)

miska (0/0)

Array
(
[*miska*] => Array
(
[docs] => 0
[hits] => 0
)

[miska] => Array
(
[docs] => 0
[hits] => 0
)

)

usb prenosnik (1/1)

Array
(
[*usb*] => Array
(
[docs] => 1
[hits] => 1
)

[usb] => Array
(
[docs] => 1
[hits] => 1
)

[*prenosnik*] => Array
(
[docs] => 1
[hits] => 1
)

[prenosnik] => Array
(
[docs] => 1
[hits] => 1
)

)

prenosnik miska (0/0)

Array
(
[*prenosnik*] => Array
(
[docs] => 1
[hits] => 1
)

[prenosnik] => Array
(
[docs] => 1
[hits] => 1
)

[*miska*] => Array
(
[docs] => 0
[hits] => 0
)

[miska] => Array
(
[docs] => 0
[hits] => 0
)

)

miška usb (0/0)

Array
(
[*miška*] => Array
(
[docs] => 0
[hits] => 0
)

[miška] => Array
(
[docs] => 0
[hits] => 0
)

[*usb*] => Array
(
[docs] => 1
[hits] => 1
)

[usb] => Array
(
[docs] => 1
[hits] => 1
)

)

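As you can clearly see, I only get positive results for queries that contain no special Slovenian characters.

Please help me, I am losing my mind over this.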

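Best answer

The problem is that the Sphinx indexer does not use the utf8 character set for its database connection by default. Fix it by adding the following to the source section of sphinx.conf: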

sql_query_pre = SET CHARACTER_SET_RESULTS=utf8
sql_query_pre = SET NAMES utf8
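
For orientation, here is a minimal sketch of where these two lines would sit. sql_query_pre is a source-level directive, so the lines belong in the classifieds_src block that the index in the question refers to. The connection settings, the table name and the sql_query below are placeholders that do not come from the question; only the source name, the field names and the two sql_query_pre lines do.

```
source classifieds_src
{
    type     = mysql

    # Placeholder connection settings (assumptions, not from the question)
    sql_host = localhost
    sql_user = sphinx_user
    sql_pass = secret
    sql_db   = classifieds_db

    # Ask MySQL to send its results to the indexer as UTF-8
    sql_query_pre = SET CHARACTER_SET_RESULTS=utf8
    sql_query_pre = SET NAMES utf8

    # Hypothetical main query; the first column must be the document ID
    sql_query = SELECT id, title, keywords, summary, text FROM classifieds
}
```

Text that was already indexed over the old connection charset stays as it is, so the index has to be rebuilt with the indexer before the test queries can return different results.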

References

Regarding php - Sphinx search: charset table difficulties, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/7637751/
