ruby-on-rails - Fields not in the mapping are included in search results returned by ElasticSearch

Reposted. Author: 行者123. Updated: 2023-12-02 22:34:05

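I want to use the Tire gem as a client for ElasticSearch to index PDF attachments. In my mapping I exclude the attachment field from _source, so that the attachment is not stored in the index and is not returned in search results: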

mapping :_source => { :excludes => ['attachment_original'] } do
  indexes :id, :type => 'integer'
  indexes :folder_id, :type => 'integer'
  indexes :attachment_file_name
  indexes :attachment_updated_at, :type => 'date'
  indexes :attachment_original, :type => 'attachment'
end

When I run the following curl command, I can still see the attachment content included in the search results:

curl -X POST "http://localhost:9200/user_files/user_file/_search?pretty=true" -d '{
  "query": {
    "query_string": {
      "query": "rspec"
    }
  }
}'
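
For readers who prefer to reproduce the request from Ruby, the curl call above can be sketched with the standard library alone. The helper name build_search_request and the Content-Type header are assumptions added for illustration (the original curl command sets no header); the URL and query body are taken from the question.

```ruby
require 'json'
require 'net/http'
require 'uri'

# URL copied from the curl command in the question.
SEARCH_URL = 'http://localhost:9200/user_files/user_file/_search?pretty=true'

# Hypothetical helper: builds (but does not send) the POST request that
# corresponds to the curl call. Returns the parsed URI and the request.
def build_search_request(query_text, url = SEARCH_URL)
  uri = URI.parse(url)
  request = Net::HTTP::Post.new(uri.request_uri)
  # Assumption: the original curl call sends no explicit Content-Type.
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate({ query: { query_string: { query: query_text } } })
  [uri, request]
end

uri, request = build_search_request('rspec')
puts request.body

# Sending it requires a running Elasticsearch node, so it is left commented out:
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
# puts response.body
```

Building the request separately from sending it makes it easy to inspect the exact JSON body before it reaches the server.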

I have already posted this question in a separate thread.

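But I just noticed that not only is the attachment included in the search results, but all the other fields, including ones that are not mapped, are included as well, as you can see here: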

{
  "took": 20,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.025427073,
    "hits": [
      {
        "_index": "user_files",
        "_type": "user_file",
        "_id": "5",
        "_score": 0.025427073,
        "_source": {
          "user_file": {
            "id": 5,
            "folder_id": 1,
            "updated_at": "2012-08-16T11:32:41Z",
            "attachment_file_size": 179895,
            "attachment_updated_at": "2012-08-16T11:32:41Z",
            "attachment_file_name": "hw4.pdf",
            "attachment_content_type": "application/pdf",
            "created_at": "2012-08-16T11:32:41Z",
            "attachment_original": "JVBERi0xLjQKJeLjz9MKNyA"
          }
        }
      }
    ]
  }
}

attachment_file_size and attachment_content_type are not defined in the mapping, but they are returned in the search results:

{
  "id": 5,
  "folder_id": 1,
  "updated_at": "2012-08-16T11:32:41Z",
  "attachment_file_size": 179895, <---------------------
  "attachment_updated_at": "2012-08-16T11:32:41Z",
  "attachment_file_name": "hw4.pdf",
  "attachment_content_type": "application/pdf", <------------------
  "created_at": "2012-08-16T11:32:41Z",
  "attachment_original": "JVBERi0xLjQKJeLjz9MKNyA"
}
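
The comparison between the declared mapping and the returned _source can also be done mechanically. The following is a minimal sketch using only the Ruby standard library; the helper name unmapped_source_fields and the constant MAPPED_FIELDS are hypothetical, and the sample response is a trimmed copy of the one shown in the question.

```ruby
require 'json'
require 'set'

# Fields declared in the mapping block of the question.
MAPPED_FIELDS = Set.new(%w[
  id folder_id attachment_file_name attachment_updated_at attachment_original
]).freeze

# Hypothetical helper: returns the _source keys that the mapping block never
# declared. root_key is the wrapper key seen in the response ("user_file");
# if it is absent, the _source hash itself is inspected.
def unmapped_source_fields(response_json, root_key = 'user_file')
  hits = JSON.parse(response_json).fetch('hits').fetch('hits')
  hits.flat_map do |hit|
    source = hit.fetch('_source')
    document = source.fetch(root_key, source)
    document.keys.reject { |key| MAPPED_FIELDS.include?(key) }
  end.uniq.sort
end

# Trimmed copy of the search response from the question.
response = <<~RESPONSE
  {"hits": {"hits": [{"_source": {"user_file": {
    "id": 5,
    "folder_id": 1,
    "updated_at": "2012-08-16T11:32:41Z",
    "attachment_file_size": 179895,
    "attachment_updated_at": "2012-08-16T11:32:41Z",
    "attachment_file_name": "hw4.pdf",
    "attachment_content_type": "application/pdf",
    "created_at": "2012-08-16T11:32:41Z",
    "attachment_original": "JVBERi0xLjQKJeLjz9MKNyA"
  }}}]}}
RESPONSE

puts unmapped_source_fields(response)
```

For this response the helper also reports created_at and updated_at, which are likewise absent from the mapping block.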

Here is my complete implementation:

include Tire::Model::Search
include Tire::Model::Callbacks

def self.search(folder, params)
  tire.search() do
    query { string params[:query], default_operator: "AND" } if params[:query].present?
    #filter :term, folder_id: folder.id
    #highlight :attachment_original, :options => {:tag => "<em>"}
    # Debugging aid: aborts the search and shows the equivalent curl command.
    raise to_curl
  end
end

mapping :_source => { :excludes => ['attachment_original'] } do
  indexes :id, :type => 'integer'
  indexes :folder_id, :type => 'integer'
  indexes :attachment_file_name
  indexes :attachment_updated_at, :type => 'date'
  indexes :attachment_original, :type => 'attachment'
end

# Serializes every model attribute plus the attachment_original method.
def to_indexed_json
  to_json(:methods => [:attachment_original])
end

# Returns the attached file as a Base64 string, or nil if there is no file.
def attachment_original
  if attachment_file_name.present?
    path_to_original = attachment.path
    # Read in binary mode so the PDF bytes are not altered.
    Base64.encode64(File.open(path_to_original, 'rb') { |f| f.read })
  end
end

Can anyone help me figure out why all the fields are included in _source?

Edit: here is the output of running localhost:9200/user_files/_mapping:

{
  "user_files": {
    "user_file": {
      "_source": {
        "excludes": [
          "attachment_original"
        ]
      },
      "properties": {
        "attachment_content_type": {
          "type": "string"
        },
        "attachment_file_name": {
          "type": "string"
        },
        "attachment_file_size": {
          "type": "long"
        },
        "attachment_original": {
          "type": "attachment",
          "path": "full",
          "fields": {
            "attachment_original": {
              "type": "string"
            },
            "author": {
              "type": "string"
            },
            "title": {
              "type": "string"
            },
            "name": {
              "type": "string"
            },
            "date": {
              "type": "date",
              "format": "dateOptionalTime"
            },
            "keywords": {
              "type": "string"
            },
            "content_type": {
              "type": "string"
            }
          }
        },
        "attachment_updated_at": {
          "type": "date",
          "format": "dateOptionalTime"
        },
        "created_at": {
          "type": "date",
          "format": "dateOptionalTime"
        },
        "folder_id": {
          "type": "integer"
        },
        "id": {
          "type": "integer"
        },
        "updated_at": {
          "type": "date",
          "format": "dateOptionalTime"
        }
      }
    }
  }
}

As you can see, for some reason all the fields are included in the mapping!

Best answer

In your to_indexed_json you include the attachment_original method, so it is sent to elasticsearch. That is also why all the other attributes end up in the mapping, and therefore in the source.
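
One way to act on this is to serialize only the fields the mapping declares. The sketch below illustrates the idea in plain Ruby without Rails or Tire; the class UserFileDocument and the constant INDEXED_FIELDS are hypothetical stand-ins for the model in the question, not part of Tire's API.

```ruby
require 'json'

# Hypothetical stand-in for the model in the question. It holds a hash of
# attributes and serializes only those declared in the mapping block.
class UserFileDocument
  # Fields declared in the mapping block of the question.
  INDEXED_FIELDS = %i[
    id folder_id attachment_file_name attachment_updated_at attachment_original
  ].freeze

  def initialize(attributes)
    @attributes = attributes
  end

  # Builds the JSON sent to the index from the whitelisted fields only.
  def to_indexed_json
    @attributes.select { |name, _value| INDEXED_FIELDS.include?(name) }.to_json
  end
end

document = UserFileDocument.new(
  id: 5,
  folder_id: 1,
  attachment_file_name: 'hw4.pdf',
  attachment_file_size: 179_895,
  attachment_content_type: 'application/pdf',
  attachment_updated_at: '2012-08-16T11:32:41Z',
  attachment_original: 'JVBERi0xLjQKJeLjz9MKNyA'
)

puts document.to_indexed_json
```

In the Rails model itself, the same idea would likely be expressed with the :only and :methods options of to_json. Note that attachment_original still has to be sent for the attachment to be indexed at all; whether it is then kept out of _source depends on the excludes setting in the mapping, not on this method.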

See the question "ElasticSearch & Tire: Using Mapping and to_indexed_json" for more information on this topic.

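It seems that Tire does send the correct mapping JSON to elasticsearch. My advice is to use Tire.configure { logger STDERR, level: "debug" } to inspect what is happening, and to try to pinpoint the problem at the raw level.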

Regarding "ruby-on-rails - Fields not in the mapping are included in search results returned by ElasticSearch", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/12002069/
