gpt4 book ai didi

python - 带有 pandas 数据框的矢量化半正矢公式

转载 作者:行者123 更新时间:2023-12-02 01:54:53 27 4
gpt4 key购买 nike

我知道要找到两个纬度、经度点之间的距离,我需要使用半正矢函数:

def haversine(lon1, lat1, lon2, lat2):
lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
dlon = lon2 - lon1
dlat = lat2 - lat1
a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
c = 2 * asin(sqrt(a))
km = 6367 * c
return km

我有一个 DataFrame,其中一列是纬度,另一列是经度。我想找出这些点与设定点 -56.7213600、37.2175900 的距离。如何从 DataFrame 中获取值并将其放入函数中?

示例数据框:

     SEAZ     LAT          LON
1 296.40, 58.7312210, 28.3774110
2 274.72, 56.8148320, 31.2923240
3 192.25, 52.0649880, 35.8018640
4 34.34, 68.8188750, 67.1933670
5 271.05, 56.6699880, 31.6880620
6 131.88, 48.5546220, 49.7827730
7 350.71, 64.7742720, 31.3953780
8 214.44, 53.5192920, 33.8458560
9 1.46, 67.9433740, 38.4842520
10 273.55, 53.3437310, 4.4716664

最佳答案

我无法确认计算是否正确,但以下方法有效:

In [11]:

from numpy import cos, sin, arcsin, sqrt
from math import radians

def haversine(row):
lon1 = -56.7213600
lat1 = 37.2175900
lon2 = row['LON']
lat2 = row['LAT']
lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
dlon = lon2 - lon1
dlat = lat2 - lat1
a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
c = 2 * arcsin(sqrt(a))
km = 6367 * c
return km

df['distance'] = df.apply(lambda row: haversine(row), axis=1)
df
Out[11]:
SEAZ LAT LON distance
index
1 296.40 58.731221 28.377411 6275.791920
2 274.72 56.814832 31.292324 6509.727368
3 192.25 52.064988 35.801864 6990.144378
4 34.34 68.818875 67.193367 7357.221846
5 271.05 56.669988 31.688062 6538.047542
6 131.88 48.554622 49.782773 8036.968198
7 350.71 64.774272 31.395378 6229.733699
8 214.44 53.519292 33.845856 6801.670843
9 1.46 67.943374 38.484252 6418.754323
10 273.55 53.343731 4.471666 4935.394528

以下代码在如此小的数据帧上实际上速度较慢,但​​我将其应用于 100,000 行 df:

In [35]:

%%timeit
df['LAT_rad'], df['LON_rad'] = np.radians(df['LAT']), np.radians(df['LON'])
df['dLON'] = df['LON_rad'] - math.radians(-56.7213600)
df['dLAT'] = df['LAT_rad'] - math.radians(37.2175900)
df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin(df['dLAT']/2)**2 + math.cos(math.radians(37.2175900)) * np.cos(df['LAT_rad']) * np.sin(df['dLON']/2)**2))

1 loops, best of 3: 17.2 ms per loop

与需要 4.3 秒的 apply 函数相比,快了近 250 倍,以后需要注意

如果我们将以上所有内容压缩为一行:

In [39]:

%timeit df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin((np.radians(df['LAT']) - math.radians(37.2175900))/2)**2 + math.cos(math.radians(37.2175900)) * np.cos(np.radians(df['LAT'])) * np.sin((np.radians(df['LON']) - math.radians(-56.7213600))/2)**2))
100 loops, best of 3: 12.6 ms per loop

我们观察到速度进一步加快,速度提高了约 341 倍。

关于python - 带有 pandas 数据框的矢量化半正矢公式,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/25767596/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com