
Python: using the multiprocessing module as a possible solution to improve the speed of my function


I have written a function in Python 2.7 (on Windows, 64-bit) to calculate the mean value of the intersection areas between a reference polygon (Ref) and one or more segmented (Seg) polygons in ESRI shapefile format. The code is quite slow because I have more than 2000 reference polygons, and for each Ref polygon the function runs over all Seg polygons (more than 7000) every time. I apologize that the function is only a prototype.

I would like to know whether multiprocessing can help me speed up the loop, or whether there is a more performant solution. If multiprocessing is a viable option, I would like to know the best way to optimize the following function:

import os
import numpy as np
from osgeo import ogr
from shapely.geometry import Polygon

def AreaInter(reference, segmented, outFile):
    # open the shapefiles
    ref = ogr.Open(reference)
    if ref is None:
        raise SystemExit('Unable to open %s' % reference)
    seg = ogr.Open(segmented)
    if seg is None:
        raise SystemExit('Unable to open %s' % segmented)
    ref_layer = ref.GetLayer()
    seg_layer = seg.GetLayer()
    # create the output file
    if not os.path.split(outFile)[0]:
        file_path, file_name_ext = os.path.split(os.path.abspath(reference))
        outFile_filename = os.path.splitext(os.path.basename(outFile))[0]
        file_out = open(os.path.abspath("{0}\\{1}.txt".format(file_path, outFile_filename)), "w")
    else:
        file_path_name, file_ext = os.path.splitext(outFile)
        file_out = open(os.path.abspath("{0}.txt".format(file_path_name)), "w")
    # for each reference object i
    for index in xrange(ref_layer.GetFeatureCount()):
        ref_feature = ref_layer.GetFeature(index)
        # get the FID (= Feature ID)
        FID = str(ref_feature.GetFID())
        ref_geometry = ref_feature.GetGeometryRef()
        pts = ref_geometry.GetGeometryRef(0)
        points = []
        for p in xrange(pts.GetPointCount()):
            points.append((pts.GetX(p), pts.GetY(p)))
        # convert to a shapely polygon
        ref_polygon = Polygon(points)
        # get the area
        ref_Area = ref_polygon.area
        # create two empty lists
        Area_seg, Area_intersect = ([] for _ in range(2))
        # for each segmented object j
        for segment in xrange(seg_layer.GetFeatureCount()):
            seg_feature = seg_layer.GetFeature(segment)
            seg_geometry = seg_feature.GetGeometryRef()
            pts = seg_geometry.GetGeometryRef(0)
            points = []
            for p in xrange(pts.GetPointCount()):
                points.append((pts.GetX(p), pts.GetY(p)))
            seg_polygon = Polygon(points)
            Area_seg.append(seg_polygon.area)
            # intersection (overlap) of the reference object with the segmented object
            intersect_polygon = ref_polygon.intersection(seg_polygon)
            # area of the intersection (0 = no intersection)
            Area_intersect.append(intersect_polygon.area)
        # average over all segmented objects (because one or more segmented
        # polygons can intersect with a reference polygon)
        seg_Area_average = np.average(Area_seg)
        intersect_Area_average = np.average(Area_intersect)
        file_out.write(" ".join(["%s" % i for i in [FID, ref_Area, seg_Area_average, intersect_Area_average]]) + "\n")
    file_out.close()

Best Answer

You can use the multiprocessing package, and especially the Pool class. First, create a function that does everything you want to do inside the for loop and that takes only the index as its argument:

def process_reference_object(index):
    ref_feature = ref_layer.GetFeature(index)
    # all your code goes here
    return (" ".join(["%s" % i for i in [FID, ref_Area, seg_Area_average, intersect_Area_average]]) + "\n")

Note that this does not write to a file itself - that would be messy, since you would have multiple processes trying to write to the same file at the same time. Instead, it returns the string that needs to be written. Also note that this function uses objects such as ref_layer and ref_geometry that need to reach it somehow - how you do that is up to you (you could make process_reference_object a method of a class initialized with them, or it could be as ugly as simply defining them globally).
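As a minimal sketch of the "global" approach, assuming the shapefile paths reference and segmented from the question: each worker process reopens the shapefiles once via a Pool initializer, since OGR datasource objects generally cannot be pickled and shipped between processes. The names init_worker, _ref, and _seg are illustrative, not part of any API.

from multiprocessing import Pool
from osgeo import ogr

# module-level holders filled in by each worker process; keep the
# datasource objects alive, otherwise the layers become invalid
_ref = _seg = ref_layer = seg_layer = None

def init_worker(reference, segmented):
    global _ref, _seg, ref_layer, seg_layer
    _ref = ogr.Open(reference)
    _seg = ogr.Open(segmented)
    ref_layer = _ref.GetLayer()
    seg_layer = _seg.GetLayer()

# each worker opens its own copies exactly once; after this,
# process_reference_object can read ref_layer and seg_layer as globals
p = Pool(initializer=init_worker, initargs=(reference, segmented))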

Then, create a pool of worker processes and run all the indices through Pool.imap_unordered (which will itself allocate each index to a different process as needed):

from multiprocessing import Pool

p = Pool()  # run multiple processes
for l in p.imap_unordered(process_reference_object, range(ref_layer.GetFeatureCount())):
    file_out.write(l)

This processes your reference objects independently and in parallel across multiple processes, and writes the results to the file (in an arbitrary order, note).
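If the order of the lines in the output file matters, Pool.imap is a drop-in replacement that returns results in input order, at the cost of occasionally waiting on a slow item:

# same call, but results come back in the order the indices were submitted
for l in p.imap(process_reference_object, range(ref_layer.GetFeatureCount())):
    file_out.write(l)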

Regarding Python: using the multiprocessing module as a possible solution to improve the speed of my function, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/14202285/
