
python - Get the nearest points to each other in a Pandas DataFrame

Reposted. Author: 行者123. Updated: 2023-12-04 16:28:29

I have a DataFrame:

  routeId  latitude_value  longitude_value
r1 28.210216 22.813209
r2 28.216103 22.496735
r3 28.161786 22.842318
r4 28.093110 22.807081
r5 28.220370 22.503500
r6 28.220370 22.503500
r7 28.220370 22.503500

From this I want to generate a DataFrame df2, something like this:
routeId    nearest
r1 r3 (for example)
r2 ... similarly for all the routes.

The logic I am trying to implement is:

For each route, find the Euclidean distance to every other route,
iterating over routeId.

There is a function for computing the Euclidean distance:
dist = math.hypot(x2 - x1, y2 - y1)

But I am confused about how to build a function that takes the DataFrame, or that uses .apply():
def get_nearest_route():
    .....
    return df2

Best answer

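We can use scipy.spatial.distance.cdist (or nested for loops) to build the distance matrix, then replace each minimum with the route name to find the closest route(s).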

import numpy as np
import pandas as pd
import scipy.spatial.distance

# Pairwise Euclidean distances between all routes (an n x n matrix).
mat = scipy.spatial.distance.cdist(df[['latitude_value', 'longitude_value']],
                                   df[['latitude_value', 'longitude_value']], metric='euclidean')

# If you don't want scipy, you can use plain Python like
# import math
# mat = []
# for i, j in zip(df['latitude_value'], df['longitude_value']):
#     k = []
#     for l, m in zip(df['latitude_value'], df['longitude_value']):
#         k.append(math.hypot(i - l, j - m))
#     mat.append(k)
# mat = np.array(mat)

new_df = pd.DataFrame(mat, index=df['routeId'], columns=df['routeId'])
Output of new_df:
routeId        r1        r2        r3        r4        r5        r6        r7
routeId
r1 0.000000 0.316529 0.056505 0.117266 0.309875 0.309875 0.309875
r2 0.316529 0.000000 0.349826 0.333829 0.007998 0.007998 0.007998
r3 0.056505 0.349826 0.000000 0.077188 0.343845 0.343845 0.343845
r4 0.117266 0.333829 0.077188 0.000000 0.329176 0.329176 0.329176
r5 0.309875 0.007998 0.343845 0.329176 0.000000 0.000000 0.000000
r6 0.309875 0.007998 0.343845 0.329176 0.000000 0.000000 0.000000
r7 0.309875 0.007998 0.343845 0.329176 0.000000 0.000000 0.000000
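The same matrix can also be built with NumPy broadcasting, which sits between the scipy call and the plain-Python loops. The following is only a sketch: the sample data is copied from the question, and the names `coords`, `diff`, `mat_np` and `dist_df` are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "routeId": ["r1", "r2", "r3", "r4", "r5", "r6", "r7"],
    "latitude_value": [28.210216, 28.216103, 28.161786, 28.093110,
                       28.220370, 28.220370, 28.220370],
    "longitude_value": [22.813209, 22.496735, 22.842318, 22.807081,
                        22.503500, 22.503500, 22.503500],
})

# (n, 2) array of coordinates.
coords = df[["latitude_value", "longitude_value"]].to_numpy()

# diff[i, j] holds the coordinate difference between route i and route j.
diff = coords[:, None, :] - coords[None, :, :]

# Euclidean norm over the last axis gives the (n, n) distance matrix.
mat_np = np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical name; plays the same role as new_df in the answer.
dist_df = pd.DataFrame(mat_np, index=df["routeId"], columns=df["routeId"])
print(dist_df.round(6))
```

Broadcasting allocates an intermediate (n, n, 2) array, so for a large number of routes cdist is likely the more economical choice.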

# Replace the minimum distance with the column name, and everything that is not the minimum with `False`.
# new_df[new_df != 0].min() is the smallest non-zero distance for each route;
# new_df.eq(..., 0) gives a mask matching that minimum.
closest = np.where(new_df.eq(new_df[new_df != 0].min(), 0), new_df.columns, False)

# Remove False from the array and get the column names as a list.
df['close'] = [i[i.astype(bool)].tolist() for i in closest]


routeId latitude_value longitude_value close
0 r1 28.210216 22.813209 [r3]
1 r2 28.216103 22.496735 [r5, r6, r7]
2 r3 28.161786 22.842318 [r1]
3 r4 28.093110 22.807081 [r3]
4 r5 28.220370 22.503500 [r2]
5 r6 28.220370 22.503500 [r2]
6 r7 28.220370 22.503500 [r2]

If you don't want to ignore zeros, then:
# Store the array values in a variable (copied, so new_df itself is not modified)
arr = new_df.to_numpy(copy=True)
# We don't want the minimum to be the point itself, so replace the diagonal with NaN
arr[np.diag_indices_from(arr)] = np.nan

# Replace the non-NaN minimum with the column name and everything else with False
new_close = np.where(arr == np.nanmin(arr, axis=1)[:, None], new_df.columns, False)

# Get the column names, ignoring False.
df['close'] = [i[i.astype(bool)].tolist() for i in new_close]

routeId latitude_value longitude_value close
0 r1 28.210216 22.813209 [r3]
1 r2 28.216103 22.496735 [r5, r6, r7]
2 r3 28.161786 22.842318 [r1]
3 r4 28.093110 22.807081 [r3]
4 r5 28.220370 22.503500 [r6, r7]
5 r6 28.220370 22.503500 [r5, r7]
6 r7 28.220370 22.503500 [r5, r6]
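If a single nearest route per row is enough, as in the df2 layout the question asks for, a sketch along the following lines may do. It assumes ties can be broken by taking the first matching column, and it rebuilds the distance matrix with NumPy only so that the snippet stands alone; the matrix from cdist would work the same way. The names `dist`, `dist_df` and the `distance` column are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "routeId": ["r1", "r2", "r3", "r4", "r5", "r6", "r7"],
    "latitude_value": [28.210216, 28.216103, 28.161786, 28.093110,
                       28.220370, 28.220370, 28.220370],
    "longitude_value": [22.813209, 22.496735, 22.842318, 22.807081,
                        22.503500, 22.503500, 22.503500],
})

lat = df["latitude_value"].to_numpy()
lon = df["longitude_value"].to_numpy()

# Pairwise Euclidean distances: hypot of the pairwise coordinate differences.
dist = np.hypot(np.subtract.outer(lat, lat), np.subtract.outer(lon, lon))

# A route must not be reported as its own nearest neighbour.
np.fill_diagonal(dist, np.nan)

labels = df["routeId"].to_numpy()
dist_df = pd.DataFrame(dist, index=labels, columns=labels)

# idxmin skips NaN and returns the first column holding the row minimum.
df2 = pd.DataFrame({
    "routeId": labels,
    "nearest": dist_df.idxmin(axis=1).to_numpy(),
    "distance": dist_df.min(axis=1).to_numpy(),
})
print(df2)
```

With this tie-breaking, r2 is paired with r5 only, whereas the list-based version above keeps r5, r6 and r7; which behaviour is preferable depends on how duplicate coordinates should be treated.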

Regarding "python - Get the nearest points to each other in a Pandas DataFrame", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47534715/
