
python - How to optimize performance when serializing many GeoDjango geometry fields?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 09:50:20

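I am developing a GeoDjango application that uses the WorldBorder model from the tutorial. I also created my own Region model that is related to WorldBorder, so one WorldBorder/Country can have many Regions, each of which also has a border (a MultiPolygon field).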

I built an API for it with DRF, but it is very slow: loading all WorldBorders and Regions as GeoJSON takes 16 seconds. The returned JSON is 10 MB, though. Is that reasonable?

I even switched the serializer to serpy, which is much faster than the DRF GIS serializer, but that only gave a 10% performance improvement.

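Profiling showed that most of the time is spent in GIS functions that convert the database data type into a list of coordinates rather than WKT. If I use WKT, serialization is much faster (1.7 s versus 11.7 s; WKT was used only for the WorldBorder MultiPolygon, everything else was still GeoJSON).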
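One direction this observation suggests is to keep the geometry as text all the way to the response, so Python never builds the nested coordinate lists. The sketch below assumes each row already carries its geometry as a GeoJSON string, for example as PostGIS `ST_AsGeoJSON` would return it; the function name `build_feature_collection` and the row layout are hypothetical, and this has not been measured against the setup described here.

```python
import json


def build_feature_collection(rows):
    """Assemble a GeoJSON FeatureCollection string.

    `rows` is an iterable of (properties_dict, geometry_json_str) pairs.
    The geometry is assumed to be valid GeoJSON text already, so it is
    spliced into the output as-is instead of being parsed and re-encoded.
    """
    features = []
    for properties, geometry_json in rows:
        features.append(
            '{"type": "Feature", "geometry": %s, "properties": %s}'
            % (geometry_json, json.dumps(properties))
        )
    return '{"type": "FeatureCollection", "features": [%s]}' % ", ".join(features)


# Hypothetical row, shaped like what a query with ST_AsGeoJSON might return.
rows = [
    (
        {"name": "Afghanistan", "iso2": "AF"},
        '{"type": "Point", "coordinates": [65.216, 33.677]}',
    ),
]
payload = build_feature_collection(rows)
```

Only the small properties dictionaries go through `json.dumps`; the large geometry strings are copied once, which is the part that was expensive in the profile.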

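I also tried compressing the MultiPolygon with ST_SimplifyVW at a low tolerance (0.005) to preserve accuracy, which brought the JSON size down to 1.7 MB and the total load time to 3.5 s. Of course, I can still look for the tolerance that best balances accuracy and speed.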
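To make the role of the tolerance concrete, here is a minimal pure-Python sketch of the Visvalingam-Whyatt idea that ST_SimplifyVW is named after: repeatedly drop the vertex whose triangle with its two neighbours has the smallest area, until every remaining triangle is at least as large as the tolerance. It is an illustration only, not PostGIS's implementation; the function names are made up, it handles a single line of points, and it always keeps the two end points.

```python
def triangle_area(a, b, c):
    """Area of the triangle spanned by three (x, y) points."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def simplify_vw(points, tolerance):
    """Drop vertices whose 'effective area' is below `tolerance`.

    Simple quadratic-time version: recompute all areas after each removal.
    The first and last points are never removed.
    """
    pts = list(points)
    while len(pts) > 2:
        areas = [
            triangle_area(pts[i - 1], pts[i], pts[i + 1])
            for i in range(1, len(pts) - 1)
        ]
        smallest = min(areas)
        if smallest >= tolerance:
            break
        # areas[k] belongs to pts[k + 1]
        del pts[areas.index(smallest) + 1]
    return pts


line = [(0, 0), (1, 0.01), (2, 0), (3, 2), (4, 0)]
simplified = simplify_vw(line, 0.05)
```

In this toy line the nearly collinear vertex `(1, 0.01)` is removed while the sharp peak at `(3, 2)` survives, which is why a small tolerance can cut the point count a lot without visibly changing a border.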

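Below is the profiling data (the sudden increase in queries for the simplified MultiPolygon is due to my incorrect use of the Django QuerySet API to get ST_SimplifyVW).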

[image]

Edit: I fixed the database queries, so the number of queries now stays constant at 75. As expected, this did not improve performance significantly.

Edit: I kept improving my database queries and have now reduced them to just 8. As expected, that did not improve performance much either.

[image]

Below is the profile of the function calls. I highlighted the parts that take most of the time. This uses the vanilla DRF GIS implementation. [image]

Below is the profile when I use WKT for one of the MultiPolygon fields, without ST_SimplifyVW. [image]

Here are the models, as requested by @Udi

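from django.contrib.gis.db import models
from django.contrib.gis.geos import Point
from django.core.exceptions import ValidationError

# TimeStampedModel, RegionManager and DefaultSelectOrPrefetchManager are
# defined elsewhere in the project and are not shown in the question.


class WorldBorderQueryset(models.QuerySet):
    def simplified(self, tolerance):
        # replace mpoly with a simplified version computed in the database
        sql = "SELECT ST_SimplifyVW(mpoly, %s) AS mpoly"
        return self.extra(
            select={'mpoly': sql},
            select_params=(tolerance,)
        )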


class WorldBorderManager(models.Manager):
    def get_by_natural_key(self, name, iso2):
        return self.get(name=name, iso2=iso2)

    def get_queryset(self, *args, **kwargs):
        qs = WorldBorderQueryset(self.model, using=self._db)
        qs = qs.prefetch_related('regions',)
        return qs

    def simplified(self, level):
        return self.get_queryset().simplified(level)


class WorldBorder(TimeStampedModel):
    name = models.CharField(max_length=50)
    area = models.IntegerField(null=True, blank=True)
    pop2005 = models.IntegerField('Population 2005', default=0)
    fips = models.CharField('FIPS Code', max_length=2, null=True, blank=True)
    iso2 = models.CharField('2 Digit ISO', max_length=2, null=True, blank=True)
    iso3 = models.CharField('3 Digit ISO', max_length=3, null=True, blank=True)
    un = models.IntegerField('United Nations Code', null=True, blank=True)
    region = models.IntegerField('Region Code', null=True, blank=True)
    subregion = models.IntegerField('Sub-Region Code', null=True, blank=True)
    lon = models.FloatField(null=True, blank=True)
    lat = models.FloatField(null=True, blank=True)

    # generated from lon lat to be one field so that it can be easily
    # edited in admin
    center_coordinates = models.PointField(blank=True, null=True)

    mpoly = models.MultiPolygonField(help_text='Borders')

    objects = WorldBorderManager()

    def save(self, *args, **kwargs):
        if not self.center_coordinates:
            self.center_coordinates = Point(x=self.lon, y=self.lat)
        super().save(*args, **kwargs)

    def natural_key(self):
        return self.name, self.iso2

    def __str__(self):
        return self.name

    class Meta:
        verbose_name = 'Country'
        verbose_name_plural = 'Countries'
        ordering = ('name',)


class Region(TimeStampedModel):
    name = models.CharField(max_length=100, unique=True)
    country = models.ForeignKey(WorldBorder, related_name='regions')
    mpoly = models.MultiPolygonField(help_text='Areas')
    center_coordinates = models.PointField()

    moment_category = models.ForeignKey('moment.MomentCategory',
                                        blank=True, null=True)

    objects = RegionManager()
    no_joins = models.Manager()

    def natural_key(self):
        return (self.name,)

    def __str__(self):
        return self.name


# TODO might want to have separate table for ActiveCity for performance
# improvement since we have like 50k cities
class City(TimeStampedModel):
    country = models.ForeignKey(WorldBorder, on_delete=models.PROTECT,
                                related_name='cities')
    region = models.ForeignKey(Region, blank=True, null=True,
                               related_name='cities',
                               on_delete=models.SET_NULL)

    name = models.CharField(max_length=255)
    accent_city = models.CharField(max_length=255)
    population = models.IntegerField(blank=True, null=True)
    is_capital = models.BooleanField(default=False)

    center_coordinates = models.PointField()

    # is active marks that this city is a destination
    # only cities with is_active True will be put up to the frontend
    is_active = models.BooleanField(default=False)

    objects = DefaultSelectOrPrefetchManager(
        prefetch_related=(
            'yes_moment_beacons__activity__verb',
            'social_beacons',
            'video_beacons'
        ),
        select_related=('region', 'country')
    )
    no_joins = models.Manager()

    def natural_key(self):
        return (self.name,)

    def __str__(self):
        return self.name

    class Meta:
        verbose_name_plural = 'Cities'

class Beacon(TimeStampedModel):
    # if null defaults to city center coordinates
    coordinates = models.PointField(blank=True, null=True)
    is_fake = models.BooleanField(default=False)

    # can use city here, but the %(class)s gives no space between words
    # and it looks ugly

    def validate_activity(self):
        # activities in the region
        activities = self.city.region.moment_category.activities.all()
        if self.activity not in activities:
            raise ValidationError('Activity is not in the Region')

    def clean(self):
        self.validate_activity()

    def save(self, *args, **kwargs):
        # doing a full clean here is needed to ensure code correctness
        # (not user correctness), because if someone uses objects.create,
        # clean() will never get called; the downside is that validation
        # is done twice if the object is created e.g. from admin
        self.full_clean()

        if not self.coordinates:
            self.coordinates = self.city.center_coordinates
        super().save(*args, **kwargs)

    class Meta:
        abstract = True


class YesMomentBeacon(Beacon):
    activity = models.ForeignKey('moment.Activity',
                                 on_delete=models.CASCADE,
                                 related_name='yes_moment_beacons')
    # ..........
    # other fields

    city = models.ForeignKey('world.City', related_name='yes_moment_beacons')

    objects = DefaultSelectOrPrefetchManager(
        select_related=('activity__verb',)
    )

    def __str__(self):
        return '{} - {}'.format(self.activity, self.coordinates)

# other beacon types.......

Here are my serializers, as requested by @Udi

from rest_framework import serializers
from rest_framework_gis.serializers import (
    GeoFeatureModelSerializer, GeometrySerializerMethodField
)


class RegionInWorldSerializer(GeoFeatureModelSerializer):
    yes_moment_beacons = serializers.SerializerMethodField()
    social_beacons = serializers.SerializerMethodField()
    video_beacons = serializers.SerializerMethodField()

    center_coordinates = GeometrySerializerMethodField()

    def get_center_coordinates(self, obj):
        return obj.center_coordinates

    def get_yes_moment_beacons(self, obj):
        count = 0

        # don't worry, it's already prefetched in the manager
        # (including the below methods) so len is used instead of count
        cities = obj.cities.all()

        for city in cities:
            beacons = city.yes_moment_beacons.all()
            count += len(beacons)
        return count

    def get_social_beacons(self, obj):
        count = 0

        cities = obj.cities.all()

        for city in cities:
            beacons = city.social_beacons.all()
            count += len(beacons)
        return count

    def get_video_beacons(self, obj):
        count = 0

        cities = obj.cities.all()

        for city in cities:
            beacons = city.video_beacons.all()
            count += len(beacons)
        return count

    class Meta:
        model = Region
        geo_field = 'center_coordinates'
        fields = ('name', 'yes_moment_beacons', 'video_beacons',
                  'social_beacons')


class WorldSerializer(GeoFeatureModelSerializer):
    center_coordinates = GeometrySerializerMethodField()

    regions = RegionInWorldSerializer(many=True, read_only=True)

    def get_center_coordinates(self, obj):
        return obj.center_coordinates

    class Meta:
        model = WorldBorder
        geo_field = 'mpoly'

        fields = ('name', 'iso2', 'center_coordinates', 'regions')

Here is the main query

def get_queryset(self):
    tolerance = self.request.GET.get('tolerance', None)
    if tolerance is not None:
        tolerance = float(tolerance)
        return WorldBorder.objects.simplified(tolerance)
    else:
        return WorldBorder.objects.all()

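Here is part of the API response (1 of 236 objects) using ST_SimplifyVW with a high tolerance. Without it, Firefox hangs, I think because it cannot handle 10 MB of JSON. This particular country's border data is small compared with other countries. With ST_SimplifyVW, the returned JSON shrinks from 10 MB to 750 KB. Even with only 750 KB of JSON, the request takes 4.5 seconds on my local machine.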

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "coordinates": [
          [
            [
              [74.915741, 37.237328],
              [74.400543, 37.138962],
              [74.038315, 36.814682],
              [73.668304, 36.909637],
              [72.556641, 36.821266],
              [71.581131, 36.346443],
              [71.18779, 36.039444],
              [71.647766, 35.419991],
              [71.496094, 34.959435],
              [70.978592, 34.504997],
              [71.077209, 34.052216],
              [70.472214, 33.944153],
              [70.002777, 34.052773],
              [70.323318, 33.327774],
              [69.561096, 33.08194],
              [69.287491, 32.526382],
              [69.328247, 31.940365],
              [69.013885, 31.648884],
              [68.161102, 31.830276],
              [67.575546, 31.53194],
              [67.778046, 31.332218],
              [66.727768, 31.214996],
              [66.395538, 30.94083],
              [66.256653, 29.85194],
              [65.034149, 29.541107],
              [64.059143, 29.41444],
              [63.587212, 29.503887],
              [62.484436, 29.406105],
              [60.868599, 29.863884],
              [61.758331, 30.790276],
              [61.713608, 31.383331],
              [60.85305, 31.494995],
              [60.858887, 32.217209],
              [60.582497, 33.066101],
              [60.886383, 33.557213],
              [60.533882, 33.635826],
              [60.508331, 34.140274],
              [60.878876, 34.319717],
              [61.289162, 35.626381],
              [62.029716, 35.448601],
              [62.309158, 35.141663],
              [63.091934, 35.432495],
              [63.131378, 35.865273],
              [63.986107, 36.038048],
              [64.473877, 36.255554],
              [64.823044, 37.138603],
              [65.517487, 37.247215],
              [65.771927, 37.537498],
              [66.302765, 37.323608],
              [67.004166, 37.38221],
              [67.229431, 37.191933],
              [67.765823, 37.215546],
              [68.001389, 36.936104],
              [68.664154, 37.274994],
              [69.246643, 37.094154],
              [69.515823, 37.580826],
              [70.134995, 37.529045],
              [70.165543, 37.871719],
              [70.71138, 38.409866],
              [70.97998, 38.470459],
              [71.591934, 37.902618],
              [71.429428, 37.075829],
              [71.842758, 36.692101],
              [72.658508, 37.021202],
              [73.307205, 37.462753],
              [73.819717, 37.228058],
              [74.247208, 37.409546],
              [74.915741, 37.237328]
            ]
          ]
        ],
        "type": "MultiPolygon"
      },
      "properties": {
        "name": "Afghanistan",
        "iso2": "AF",
        "center_coordinates": {
          "coordinates": [65.216, 33.677],
          "type": "Point"
        },
        "regions": {
          "type": "FeatureCollection",
          "features": [
            {
              "type": "Feature",
              "geometry": {
                "coordinates": [66.75292967820785, 34.52466146754814],
                "type": "Point"
              },
              "properties": {
                "name": "Central Afghanistan",
                "yes_moment_beacons": 0,
                "video_beacons": 0,
                "social_beacons": 0
              }
            },
            {
              "type": "Feature",
              "geometry": {
                "coordinates": [69.69726561529792, 35.96022296494905],
                "type": "Point"
              },
              "properties": {
                "name": "Northern Highlands",
                "yes_moment_beacons": 0,
                "video_beacons": 0,
                "social_beacons": 0
              }
            },
            {
              "type": "Feature",
              "geometry": {
                "coordinates": [63.89541422401191, 32.27442932956255],
                "type": "Point"
              },
              "properties": {
                "name": "Southwestern Afghanistan",
                "yes_moment_beacons": 0,
                "video_beacons": 0,
                "social_beacons": 0
              }
            }
          ]
        }
      }
    },
    ........
}

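So the point is: is GeoDjango just not as fast as I expected, or are these performance numbers expected? What can I do to improve performance while still outputting GeoJSON, i.e. not WKT? Is fine-tuning the tolerance the only way? I might also split off a separate endpoint for fetching the regions.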

Best answer

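Since your geographic data does not change often, try caching all region/country polygons as precomputed GeoJSON. That is, create a /country/123.geojson API call or static file containing the geographic data for all regions in that country, possibly simplified ahead of time.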
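A minimal sketch of the static-file variant, assuming the features for one country have already been produced once (for instance by running the existing serializers in a management command). The helper name `write_country_geojson` and the directory layout are illustrative only; the web server would then serve the generated files directly.

```python
import json
from pathlib import Path


def write_country_geojson(output_dir, country_id, features):
    """Write one precomputed FeatureCollection to <output_dir>/country/<id>.geojson.

    `features` is a list of already-serialized GeoJSON Feature dicts.
    Returns the path of the file that was written.
    """
    collection = {"type": "FeatureCollection", "features": features}
    path = Path(output_dir) / "country" / f"{country_id}.geojson"
    path.parent.mkdir(parents=True, exist_ok=True)
    # compact separators keep the file as small as plain JSON allows
    path.write_text(json.dumps(collection, separators=(",", ":")), encoding="utf-8")
    return path


# Hypothetical feature, shaped like the region features in the response above.
region_feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [66.75292967820785, 34.52466146754814]},
    "properties": {"name": "Central Afghanistan"},
}
```

Regenerating a file only when a border is edited moves the expensive geometry serialization out of the request path entirely.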

Your other API calls should return only the numeric data, without geographic polygons, leaving the task of combining them to the client.
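The combining step is small. The sketch below shows the idea in Python for consistency with the rest of the code, although a browser client would do the same in JavaScript. It assumes the numeric endpoint returns counts keyed by region name; that response shape and the function name `attach_counts` are assumptions, not something the original API provides.

```python
import copy


def attach_counts(collection, counts_by_name):
    """Return a copy of a cached FeatureCollection with counts merged in.

    `counts_by_name` maps a region name to a dict of numeric fields, e.g.
    {"Central Afghanistan": {"yes_moment_beacons": 3}}. Features without
    an entry are left unchanged.
    """
    merged = copy.deepcopy(collection)  # leave the cached geometry untouched
    for feature in merged["features"]:
        name = feature["properties"]["name"]
        feature["properties"].update(counts_by_name.get(name, {}))
    return merged


cached = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [69.69726561529792, 35.96022296494905]},
            "properties": {"name": "Northern Highlands"},
        }
    ],
}
counts = {"Northern Highlands": {"yes_moment_beacons": 2, "video_beacons": 0, "social_beacons": 1}}
combined = attach_counts(cached, counts)
```

Because the geometry never changes between requests, only the small counts payload needs to be fetched fresh.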

Regarding python - How to optimize performance when serializing many GeoDjango geometry fields?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48040545/
