
python - Fast Haversine Approximation (Python/Pandas)

Repost. Author: IT老高. Updated: 2023-10-28 22:09:50

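Each row in a Pandas DataFrame contains the lat/lng coordinates of 2 points. With the Python code below, calculating the distance between these two points for many (millions of) rows takes a very long time!

Given that the 2 points are less than 50 miles apart and accuracy is not very important, is it possible to make the calculation faster?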

from math import radians, cos, sin, asin, sqrt
def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    km = 6367 * c
    return km


for index, row in df.iterrows():
    df.loc[index, 'distance'] = haversine(row['a_longitude'], row['a_latitude'], row['b_longitude'], row['b_latitude'])

Best answer

Here is a vectorized numpy version of the same function:

import numpy as np

def haversine_np(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)

    All args must be of equal length.

    """
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2

    c = 2 * np.arcsin(np.sqrt(a))
    km = 6367 * c
    return km

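The inputs are all arrays of values, and it should be able to process millions of points almost instantly. The requirement is that the inputs are ndarrays, but the columns of your pandas table will work too.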
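As a quick sanity check, the sketch below compares the vectorized function with the scalar version from the question on a few random point pairs, and shows that pandas columns and plain ndarrays give the same result. The sample size, the seed and the coordinate ranges (a small area, in line with the "less than 50 miles" condition) are assumptions chosen for illustration only.

```python
from math import radians, cos, sin, asin, sqrt

import numpy as np
import pandas as pd


def haversine(lon1, lat1, lon2, lat2):
    """Scalar haversine distance in km (same formula as in the question)."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6367 * 2 * asin(sqrt(a))


def haversine_np(lon1, lat1, lon2, lat2):
    """Vectorized haversine distance in km (same formula as in the answer)."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    a = np.sin((lat2 - lat1) / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2
    return 6367 * 2 * np.arcsin(np.sqrt(a))


# Assumed sample data: 5 point pairs inside a small lon/lat box.
rng = np.random.default_rng(0)
cols = ["lon1", "lat1", "lon2", "lat2"]
df = pd.DataFrame({
    "lon1": rng.uniform(-74.1, -73.9, 5),
    "lat1": rng.uniform(40.6, 40.9, 5),
    "lon2": rng.uniform(-74.1, -73.9, 5),
    "lat2": rng.uniform(40.6, 40.9, 5),
})

# pandas columns (Series) passed directly
from_series = haversine_np(df["lon1"], df["lat1"], df["lon2"], df["lat2"])
# the same data as plain ndarrays
from_arrays = haversine_np(*(df[c].to_numpy() for c in cols))
# scalar version, one row at a time
from_scalar = [haversine(*row) for row in df[cols].itertuples(index=False)]

print(np.allclose(from_series, from_arrays), np.allclose(from_arrays, from_scalar))
```

Passing Series keeps the DataFrame index on the result, which is why it can be assigned straight back as a new column.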

For example, with randomly generated values:

>>> import numpy as np
>>> import pandas
>>> lon1, lon2, lat1, lat2 = np.random.randn(4, 1000000)
>>> df = pandas.DataFrame(data={'lon1':lon1,'lon2':lon2,'lat1':lat1,'lat2':lat2})
>>> km = haversine_np(df['lon1'],df['lat1'],df['lon2'],df['lat2'])

Or if you want to create another column:

>>> df['distance'] = haversine_np(df['lon1'],df['lat1'],df['lon2'],df['lat2'])

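Looping through arrays of data is very slow in Python. Numpy provides functions that operate on entire arrays of data, which lets you avoid looping and drastically improve performance.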
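A rough way to see the difference on your own machine is to time both approaches. The sketch below is only illustrative: the array size, the seed and the coordinate ranges are assumptions, and the loop here is a plain Python loop over arrays (the `iterrows` loop from the question adds further pandas overhead on top of this). Actual timings depend on hardware and library versions, so none are quoted.

```python
import math
import timeit

import numpy as np


def haversine_loop(lon1, lat1, lon2, lat2):
    """Row-by-row haversine in pure Python; returns a list of distances in km."""
    out = []
    for x1, y1, x2, y2 in zip(lon1, lat1, lon2, lat2):
        x1, y1, x2, y2 = map(math.radians, (x1, y1, x2, y2))
        a = math.sin((y2 - y1) / 2) ** 2 + math.cos(y1) * math.cos(y2) * math.sin((x2 - x1) / 2) ** 2
        out.append(6367 * 2 * math.asin(math.sqrt(a)))
    return out


def haversine_np(lon1, lat1, lon2, lat2):
    """Vectorized haversine in km (same formula as in the answer)."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    a = np.sin((lat2 - lat1) / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2
    return 6367 * 2 * np.arcsin(np.sqrt(a))


# Assumed test data: 100,000 point pairs inside a small lon/lat box.
n = 100_000
rng = np.random.default_rng(0)
lon1 = rng.uniform(-74.1, -73.9, n)
lat1 = rng.uniform(40.6, 40.9, n)
lon2 = rng.uniform(-74.1, -73.9, n)
lat2 = rng.uniform(40.6, 40.9, n)

# Time a single run of each approach.
t_loop = timeit.timeit(lambda: haversine_loop(lon1, lat1, lon2, lat2), number=1)
t_np = timeit.timeit(lambda: haversine_np(lon1, lat1, lon2, lat2), number=1)
print(f"python loop: {t_loop:.4f} s, numpy: {t_np:.4f} s")
```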

This is an example of vectorization.
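The question also mentions that the points are less than 50 miles apart and that accuracy is not critical. Under those conditions, one option that is not part of the answer above is to swap the haversine formula for an equirectangular (flat-earth) approximation, which needs only one cosine and one square root per pair. The sketch below is an assumption-laden illustration: the function name `equirect_np` is hypothetical, the test coordinates are made up, and whether it is meaningfully faster than `haversine_np` should be measured rather than assumed.

```python
import numpy as np

EARTH_RADIUS_KM = 6367  # same radius as used in the question and answer


def haversine_np(lon1, lat1, lon2, lat2):
    """Vectorized haversine in km (same formula as in the answer)."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    a = np.sin((lat2 - lat1) / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2
    return EARTH_RADIUS_KM * 2 * np.arcsin(np.sqrt(a))


def equirect_np(lon1, lat1, lon2, lat2):
    """Equirectangular approximation in km (hypothetical helper).

    Treats the small patch of the sphere as flat: the longitude difference
    is scaled by the cosine of the mean latitude, then Pythagoras is applied.
    Only reasonable for short distances away from the poles and the
    antimeridian.
    """
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    x = (lon2 - lon1) * np.cos((lat1 + lat2) / 2.0)
    y = lat2 - lat1
    return EARTH_RADIUS_KM * np.sqrt(x * x + y * y)


# Assumed test data: 1,000 point pairs inside a small lon/lat box.
rng = np.random.default_rng(0)
lon1 = rng.uniform(-74.1, -73.9, 1000)
lat1 = rng.uniform(40.6, 40.9, 1000)
lon2 = rng.uniform(-74.1, -73.9, 1000)
lat2 = rng.uniform(40.6, 40.9, 1000)

exact = haversine_np(lon1, lat1, lon2, lat2)
approx = equirect_np(lon1, lat1, lon2, lat2)
print("largest absolute difference (km):", np.max(np.abs(exact - approx)))
```

Comparing the two results on a sample of your own data, as in the last lines, is a simple way to decide whether the approximation error is acceptable for your use case.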

Regarding python - Fast Haversine Approximation (Python/Pandas), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/29545704/
