elasticsearch - How to structure an Elasticsearch query to filter only URLs that have a subdomain?

Reposted. Author: 行者123. Updated: 2023-12-03 01:48:44

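I store URLs as a field in Elasticsearch, but I only want to match documents whose URL contains a subdomain.

For example, I want my search results to include

http://any-subdomain.example.com

but I do not want them to include

https://www.example.com

Is this possible with an Elasticsearch query?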

Best answer

Have you tried the query_string query? For example, I use it on Twitter data like this:

GET /twitter2/tweet/_search
{
  "query": {
    "query_string": {
      "default_field": "entities.media.url",
      "query": "https\\:\\/\\/t.co\\/* AND -https\\:\\/\\/t.co\\/6*"
    }
  },
  "_source": ["entities.media.url"]
}

For this search, my mapping is:
PUT /twitter2/tweet/_mapping
{
  "properties": {
    "entities": {
      "properties": {
        "media": {
          "properties": {
            "url": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}

For your case, you can use the following query:
GET /your-index/your-type/_search
{
  "query": {
    "query_string": {
      "default_field": "url",
      "query": "http\\:\\/\\/*.example.com AND -http\\:\\/\\/www.example.com"
    }
  }
}
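
If you need the same request body for several domains, it can be generated rather than written by hand. The following is a minimal Python sketch, not part of the original answer: the function name `subdomain_query` and its parameters are hypothetical, and it assumes the domain contains no characters that are reserved in query_string syntax. It only builds the JSON body and does not contact Elasticsearch.

```python
import json


def subdomain_query(domain, field="url", scheme="http", excluded_host="www"):
    """Build a query_string body matching <anything>.<domain> but not <excluded_host>.<domain>.

    Hypothetical helper for illustration; assumes `domain` has no
    query_string reserved characters that would need escaping.
    """
    # ':' and '/' are reserved in query_string syntax, so each is backslash-escaped.
    prefix = f"{scheme}\\:\\/\\/"
    query = f"{prefix}*.{domain} AND -{prefix}{excluded_host}.{domain}"
    return {"query": {"query_string": {"default_field": field, "query": query}}}


body = subdomain_query("example.com")
# json.dumps doubles each backslash, which gives the form used in the request above.
print(json.dumps(body, indent=2))
```

Note that the scheme is part of the pattern, so a body built for `http` says nothing about `https` URLs; you would build a second clause or body for those.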

Note: you will get results faster if you split the URL into separate fields, such as url and host, when indexing your data instead of relying on wildcard queries. With Elasticsearch 5.x, you can use an ingest node to transform your data this way. I will try to create a pipeline for this, but you can check the documentation for more information.
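
As an illustration of the index-time idea, here is a minimal Python sketch that enriches a document on the client side before it is indexed. It is a substitute for the ingest pipeline mentioned above, not an implementation of one. The function name `enrich` and the field names `host` and `has_subdomain` are assumptions made for this example, and the code naively treats the last two labels of the host as the registered domain, which is wrong for suffixes such as `co.uk`.

```python
from urllib.parse import urlsplit


def enrich(doc, url_field="url"):
    """Return a copy of `doc` with hypothetical `host` and `has_subdomain` fields added."""
    host = urlsplit(doc[url_field]).hostname or ""
    labels = host.split(".")
    # Naive assumption: the registered domain is the last two labels (e.g. example.com).
    subdomain = ".".join(labels[:-2])
    enriched = dict(doc)
    enriched["host"] = host
    # Treat a bare domain and a plain "www" prefix as "no subdomain".
    enriched["has_subdomain"] = subdomain not in ("", "www")
    return enriched


with_sub = enrich({"url": "http://any-subdomain.example.com"})
without_sub = enrich({"url": "https://www.example.com"})
print(with_sub["host"], with_sub["has_subdomain"])
print(without_sub["host"], without_sub["has_subdomain"])
```

With documents indexed this way, the search could use an exact-match filter on the boolean field rather than a wildcard pattern over the full URL, and the http/https distinction no longer matters.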

Regarding "elasticsearch - How to structure an Elasticsearch query to filter only URLs that have a subdomain?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42045470/
