
sorting - How to take the shortest distance per person (with multiple addresses) to an origin point and sort on that value

Reposted. Author: 行者123. Updated: 2023-12-02 22:30:55

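I have People documents in my Elasticsearch index. Each person has multiple addresses, and each address has an associated lat/lon point.

I'd like to geo-sort all the people by proximity to a specific origin location, but having multiple locations per person complicates matters. What has been decided is the following objective: take the shortest distance per person to the origin point and use that number as the sort value.

Here is an example of my people index, roughed out in "pseudo-JSON", showing a couple of person documents that each have multiple addresses: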

person {
    name: John Smith
    addresses [
        { lat: 43.5234, lon: 32.5432, 1 Main St. }
        { lat: 44.983, lon: 37.3432, 2 Queen St. W. }
        { ... more addresses ... }
    ]
}

person {
    name: Jane Doe
    addresses [
        ... she has a bunch of addresses too ...
    ]
}

... many more people docs each having multiple addresses like above ...

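Currently I'm using an Elasticsearch script-based sort with an inline Groovy script. The script calculates the distance in meters from the origin to each address, puts those distances into an array per person, and picks the minimum value from that array as the person's sort value.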
string groovyShortestDistanceMetersSortScript = string.Format(
    "[doc['geo1'].distance({0}, {1}), doc['geo2'].distance({0}, {1})].min()",
    origin.Latitude,
    origin.Longitude);

var shortestMetersSort = new SortDescriptor<Person>()
    .Script(sd => sd
        .Type("number")
        .Script(script => script
            .Inline(groovyShortestDistanceMetersSortScript)
        )
        .Order(SortOrder.Ascending)
    );

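While this works, I wonder whether running a script at query time is more expensive or more complex than it needs to be, and whether there is a better way to achieve the desired sort order by indexing the data differently and/or using aggregations, perhaps doing away with the script altogether.

Any thoughts and guidance are appreciated. I'm sure somebody else has run into this same requirement (or a similar one) and has found a different or better solution.

I'm using the NEST API in this code sample, but I will gladly accept answers in Elasticsearch JSON format because I can port them into NEST API code.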

Best answer

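When sorting by distance from a specified origin, and the field being sorted on contains a collection of values (in this case, geo_point types), we can specify how a single value should be chosen from the collection using the sort mode (the "mode" option in the request body). Here we can specify a sort mode of "min" so that the location nearest to the origin is used as the sort value. Here's an example: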

// Namespaces needed by the types used below
using System;
using System.Collections.Generic;
using System.Linq;
using Elasticsearch.Net;
using Nest;

public class Person
{
    public string Name { get; set; }
    public IList<Address> Addresses { get; set; }
}

public class Address
{
    public string Name { get; set; }
    public GeoLocation Location { get; set; }
}

void Main()
{
    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
    var indexName = "people";
    var connectionSettings = new ConnectionSettings(pool)
        .InferMappingFor<Person>(m => m.IndexName(indexName));

    var client = new ElasticClient(connectionSettings);

    // start from a clean index each time the example runs
    if (client.IndexExists(indexName).Exists)
        client.DeleteIndex(indexName);

    client.CreateIndex(indexName, c => c
        .Settings(s => s
            .NumberOfShards(1)
            .NumberOfReplicas(0)
        )
        .Mappings(m => m
            .Map<Person>(mm => mm
                .AutoMap()
                .Properties(p => p
                    .Nested<Address>(n => n
                        .Name(nn => nn.Addresses.First().Location)
                        .AutoMap()
                    )
                )
            )
        )
    );

    var people = new[] {
        new Person {
            Name = "John Smith",
            Addresses = new List<Address>
            {
                new Address
                {
                    Name = "Buckingham Palace",
                    Location = new GeoLocation(51.501476, -0.140634)
                },
                new Address
                {
                    Name = "Empire State Building",
                    Location = new GeoLocation(40.748817, -73.985428)
                }
            }
        },
        new Person {
            Name = "Jane Doe",
            Addresses = new List<Address>
            {
                new Address
                {
                    Name = "Eiffel Tower",
                    Location = new GeoLocation(48.858257, 2.294511)
                },
                new Address
                {
                    Name = "Uluru",
                    Location = new GeoLocation(-25.383333, 131.083333)
                }
            }
        }
    };

    client.IndexMany(people);

    // call refresh for testing (avoid in production)
    client.Refresh("people");

    var towerOfLondon = new GeoLocation(51.507313, -0.074308);

    client.Search<Person>(s => s
        .MatchAll()
        .Sort(so => so
            .GeoDistance(g => g
                .Field(f => f.Addresses.First().Location)
                .PinTo(towerOfLondon)
                .Ascending()
                .Unit(DistanceUnit.Meters)
                // Take the minimum address location distance from
                // our target location, The Tower of London
                .Mode(SortMode.Min)
            )
        )
    );
}

This creates the following search request:
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "_geo_distance": {
        "addresses.location": [
          {
            "lat": 51.507313,
            "lon": -0.074308
          }
        ],
        "order": "asc",
        "mode": "min",
        "unit": "m"
      }
    }
  ]
}

which returns:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : null,
    "hits" : [ {
      "_index" : "people",
      "_type" : "person",
      "_id" : "AVcxBKuPlWTRBymPa4yT",
      "_score" : null,
      "_source" : {
        "name" : "John Smith",
        "addresses" : [ {
          "name" : "Buckingham Palace",
          "location" : {
            "lat" : 51.501476,
            "lon" : -0.140634
          }
        }, {
          "name" : "Empire State Building",
          "location" : {
            "lat" : 40.748817,
            "lon" : -73.985428
          }
        } ]
      },
      "sort" : [ 4632.035195223564 ]
    }, {
      "_index" : "people",
      "_type" : "person",
      "_id" : "AVcxBKuPlWTRBymPa4yU",
      "_score" : null,
      "_source" : {
        "name" : "Jane Doe",
        "addresses" : [ {
          "name" : "Eiffel Tower",
          "location" : {
            "lat" : 48.858257,
            "lon" : 2.294511
          }
        }, {
          "name" : "Uluru",
          "location" : {
            "lat" : -25.383333,
            "lon" : 131.083333
          }
        } ]
      },
      "sort" : [ 339100.6843074794 ]
    } ]
  }
}
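The value returned in the sort array for each hit is the minimum distance, in the specified sort unit (meters in our case), between the specified point (the Tower of London) and that person's addresses.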

According to the guidelines in the Sorting By Distance documentation, it often makes more sense to score by distance rather than sort by it, which can be achieved by using a function_score query with a decay function.
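As a rough illustration of that suggestion, a request along the following lines might be used. This is only a sketch under assumptions that are not part of the original answer: the choice of a gauss decay, the "5km" scale, the "0km" offset, the 0.5 decay value and the "replace" boost mode are all illustrative, and it assumes addresses.location is queryable as a geo_point in the same way as in the sort example. Setting multi_value_mode to "min" asks the decay function to use the closest of a person's addresses when computing the score.

```json
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "gauss": {
            "addresses.location": {
              "origin": {
                "lat": 51.507313,
                "lon": -0.074308
              },
              "scale": "5km",
              "offset": "0km",
              "decay": 0.5
            },
            "multi_value_mode": "min"
          }
        }
      ],
      "boost_mode": "replace"
    }
  }
}
```

With a request shaped like this, hits come back in the default order of descending _score, so people whose nearest address is closest to the origin should rank first. Unlike the sort example, the value attached to each hit is a score between 0 and 1 rather than a distance in meters.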

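Regarding "sorting - How to take the shortest distance per person (with multiple addresses) to an origin point and sort on that value", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39519630/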
