
elasticsearch - Replacing specific characters with words in Elasticsearch

Reposted. Author: 行者123. Updated: 2023-12-03 01:58:09

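My document field contains a large amount of plain text that includes some currency symbols. How can I change them to their corresponding names, for example $ to Dollar, and so on?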

Best answer

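You can achieve this by creating a custom analyzer with a mapping char filter, in which you specify which character should be replaced by which string. List one entry per currency in the "mappings" array of the char filter. Note that the example uses the syntax of older Elasticsearch releases (the "string" field type and a mapping type named my_type); newer releases use the "text" field type and no longer have mapping types.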

curl -XPUT localhost:9200/my_index -d '{
  "settings": {
    "analysis": {
      "char_filter": {
        "currencies": {
          "type": "mapping",
          "mappings": [
            "$=>USD"
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "currencies"
          ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "string",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}'
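
To make the effect of the char filter concrete, here is a minimal Python sketch of what a "key=>value" mapping does to the raw text before the tokenizer sees it. It is only an illustration, not Elasticsearch's implementation: the function names are hypothetical, the extra € and £ rules are assumptions added for the example, and the whitespace trimming around "=>" is a simplification of this sketch. The longest-key-first matching mirrors the greedy matching the mapping char filter is documented to use.

```python
def parse_rules(rules):
    """Turn ["$=>USD", ...] into a dict {"$": "USD", ...} (hypothetical helper)."""
    table = {}
    for rule in rules:
        key, _, value = rule.partition("=>")
        key = key.strip()
        if key:  # ignore rules with an empty key
            table[key] = value.strip()
    return table


def apply_mappings(text, rules):
    """Replace each mapped key in text, trying the longest key first at every position."""
    table = parse_rules(rules)
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No rule matches here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# "$=>USD" comes from the answer; the other two rules are assumed for illustration.
CURRENCY_RULES = ["$=>USD", "€=>EUR", "£=>GBP"]

print(apply_mappings("You owe me $ 100", CURRENCY_RULES))  # You owe me USD 100
```

The tokenizer then runs on the rewritten text, which is why USD shows up as an ordinary token.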

Then, if you index a sentence such as "You owe me $ 100", the following tokens are generated:
curl -XGET 'localhost:9200/my_index/_analyze?analyzer=my_analyzer&pretty' -d 'You owe me $ 100'

{
  "tokens" : [ {
    "token" : "You",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "owe",
    "start_offset" : 4,
    "end_offset" : 7,
    "type" : "<ALPHANUM>",
    "position" : 2
  }, {
    "token" : "me",
    "start_offset" : 8,
    "end_offset" : 10,
    "type" : "<ALPHANUM>",
    "position" : 3
  }, {
    "token" : "USD",
    "start_offset" : 11,
    "end_offset" : 12,
    "type" : "<ALPHANUM>",
    "position" : 4
  }, {
    "token" : "100",
    "start_offset" : 13,
    "end_offset" : 16,
    "type" : "<NUM>",
    "position" : 5
  } ]
}

As you can see, the $ symbol has been replaced with the string USD.
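
One detail in the output is worth a closer look: the USD token is three characters long, yet it reports start_offset 11 and end_offset 12, because the offsets point back into the original text, where the $ occupies a single character. The Python sketch below reproduces that behaviour under simplifying assumptions: the `analyze` function is hypothetical, the regex `\w+` is only a rough stand-in for the standard tokenizer, and the offset bookkeeping is an approximation of what the real analysis chain does.

```python
import re


def analyze(text, table):
    """Apply a char mapping, tokenize, and report offsets relative to the original text."""
    keys = sorted((k for k in table if k), key=len, reverse=True)
    filtered = []  # characters of the rewritten text
    origin = []    # origin[k] = (start, end) span in the original text for filtered[k]
    i = 0
    while i < len(text):
        key = next((k for k in keys if text.startswith(k, i)), None)
        if key is None:
            filtered.append(text[i])
            origin.append((i, i + 1))
            i += 1
        else:
            # Every character of the replacement maps back to the replaced span.
            for ch in table[key]:
                filtered.append(ch)
                origin.append((i, i + len(key)))
            i += len(key)

    tokens = []
    for match in re.finditer(r"\w+", "".join(filtered)):
        tokens.append({
            "token": match.group(),
            "start_offset": origin[match.start()][0],
            "end_offset": origin[match.end() - 1][1],
        })
    return tokens


for tok in analyze("You owe me $ 100", {"$": "USD"}):
    print(tok["token"], tok["start_offset"], tok["end_offset"])
```

For the sample sentence this yields the same tokens and offsets as the response above, including 11 and 12 for USD.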

Regarding "elasticsearch - Replacing specific characters with words in Elasticsearch", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35124881/
