
python - Is there a faster way than for loops and if statements to find the nearest point to another point in Python?

Reposted. Author: 行者123. Updated: 2023-12-04 09:01:08

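Is there a faster way (in Python, on the CPU) to do the same thing as the function below? I have used for loops and if statements and am wondering whether there is a quicker approach. At the moment the function takes about 1 minute per 100 postcodes, and I have roughly 70,000 to get through.
The two DataFrames used are as follows. postcode_df contains 71,092 rows and the columns:

  • Postcode, e.g. "BL4 7PD"
  • Latitude, e.g. 53.577653
  • Longitude, e.g. -2.434136

For example: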
    postcode_df = pd.DataFrame({"Postcode":["SK12 2LH", "SK7 6LQ"],
    "Latitude":[53.362549, 53.373812],
    "Longitude":[-2.061329, -2.120956]})

air contains 421 rows and the columns:

  • TubeRef, e.g. "ABC01"
  • Latitude, e.g. 53.55108
  • Longitude, e.g. -2.396236

For example:
    air = pd.DataFrame({"TubeRef":["Stkprt35", "Stkprt07", "Stkprt33"],
    "Latitude":[53.365085, 53.379502, 53.407510],
    "Longitude":[-2.0763, -2.120777, -2.145632]})
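The function loops over every postcode in postcode_df and, for each postcode, loops over every TubeRef, computes the distance between them (using geopy), and keeps the TubeRef closest to the postcode.
The output DataFrame, postcode_nearest_tube_refs, holds the nearest tube for each postcode and contains the columns:

  • Postcode, e.g. "BL4 7PD"
  • Nearest Air Tube, e.g. "ABC01"
  • Distance to Air Tube KM, e.g. 1.035848
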
    # define function to get nearest air quality monitoring tube per postcode
    def get_nearest_tubes(constituency_list):

        postcodes = []
        nearest_tubes = []
        distances_to_tubes = []

        for postcode in postcode_df["Postcode"]:
            closest_tube = ""
            shortest_dist = 500

            postcode_lat = postcode_df.loc[postcode_df["Postcode"]==postcode, "Latitude"]
            postcode_long = postcode_df.loc[postcode_df["Postcode"]==postcode, "Longitude"]
            postcode_coord = (float(postcode_lat), float(postcode_long))


            for tuberef in air["TubeRef"]:
                tube_lat = air.loc[air["TubeRef"]==tuberef, "Latitude"]
                tube_long = air.loc[air["TubeRef"]==tuberef, "Longitude"]
                tube_coord = (float(tube_lat), float(tube_long))

                # calculate distance between postcode and tube
                dist_to_tube = geopy.distance.distance(postcode_coord, tube_coord).km
                if dist_to_tube < shortest_dist:
                    shortest_dist = dist_to_tube
                    closest_tube = str(tuberef)

            # save postcode's tuberef with shortest distance
            postcodes.append(str(postcode))
            nearest_tubes.append(str(closest_tube))
            distances_to_tubes.append(shortest_dist)

        # create dataframe of the postcodes, nearest tuberefs and distance
        postcode_nearest_tube_refs = pd.DataFrame({"Postcode":postcodes,
                                                   "Nearest Air Tube":nearest_tubes,
                                                   "Distance to Air Tube KM": distances_to_tubes})

        return postcode_nearest_tube_refs

The libraries I am using are:
    import numpy as np
    import pandas as pd
    # !pip install geopy
    import geopy.distance

Best answer

Here is a working example that runs in a few seconds (under 10).
Import the libraries:

    import pandas as pd
    import numpy as np
    from sklearn.neighbors import BallTree
    import uuid
I generated some random data. This also takes about a second, but at least it gives us a realistic volume to work with.
    np_rand_post = 5 * np.random.random((72000,2))
    np_rand_post = np_rand_post + np.array((53.577653, -2.434136))
UUIDs stand in for the postcodes:
    postcode_df = pd.DataFrame( np_rand_post , columns=['lat', 'long'])
    postcode_df['postcode'] = [uuid.uuid4().hex[:6] for _ in range(72000)]
    postcode_df.head()
We do the same for the air tubes:
    np_rand = 5 * np.random.random((500,2))
    np_rand = np_rand + np.array((53.55108, -2.396236))
and again use UUIDs for the fake references:
    tube_df = pd.DataFrame( np_rand , columns=['lat', 'long'])
    tube_df['ref'] = [uuid.uuid4().hex[:5] for _ in range(500)]
    tube_df.head()
Extract the GPS values as NumPy arrays:
    postcode_gps = postcode_df[["lat", "long"]].values
    air_gps = tube_df[["lat", "long"]].values
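Create a ball tree. The haversine metric expects the coordinates as [latitude, longitude] in radians, so convert them first: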
    postal_radians =  np.radians(postcode_gps)
    air_radians = np.radians(air_gps)

    tree = BallTree(air_radians, leaf_size=15, metric='haversine')
Then query the single nearest tube (k=1) for every postcode:
    distance, index = tree.query(postal_radians, k=1)
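Note that the distances are not in kilometres. With the haversine metric, BallTree returns them in radians, so they must be converted first by multiplying by the Earth's radius (in metres here; use 6371 for km).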
    earth_radius = 6371000
    distance_in_meters = distance * earth_radius
    distance_in_meters
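The radians-to-distance conversion can be sanity-checked without scikit-learn. The following is a minimal sketch, not part of the original answer: `haversine_km` is a hypothetical helper, 6371 km is an assumed mean Earth radius, and the coordinates are the sample rows from the question. It builds the full postcode-by-tube distance matrix with NumPy broadcasting and takes the minimum per row.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in degrees, NumPy-broadcastable."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Sample coordinates taken from the question's example DataFrames
post_lat = np.array([53.362549, 53.373812])
post_lon = np.array([-2.061329, -2.120956])
tube_lat = np.array([53.365085, 53.379502, 53.407510])
tube_lon = np.array([-2.0763, -2.120777, -2.145632])

# Broadcasting (n_postcodes, 1) against (1, n_tubes) gives an
# (n_postcodes, n_tubes) matrix of distances in km
dist_km = haversine_km(post_lat[:, None], post_lon[:, None],
                       tube_lat[None, :], tube_lon[None, :])

nearest_idx = dist_km.argmin(axis=1)  # column index of the closest tube per postcode
nearest_km = dist_km.min(axis=1)      # distance to that tube in km
print(nearest_idx, nearest_km)
```

At the question's scale (71,092 postcodes by 421 tubes) the matrix has roughly 30 million entries, so processing the postcodes in chunks may be preferable if memory is tight. Haversine also treats the Earth as a sphere, so its results can differ slightly from geopy's ellipsoidal distances.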
To get the matching refs, use, for example, tube_df.ref[ index[:,0] ].
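Putting the steps together for the DataFrames described in the question might look like the sketch below. It assumes the question's column names ("Postcode", "TubeRef", "Latitude", "Longitude") and an Earth radius of 6371 km; `nearest_tubes` is a hypothetical helper name, not something from the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def nearest_tubes(postcode_df, air):
    """Return the nearest tube and its distance in km for every postcode."""
    # BallTree's haversine metric expects [lat, lon] in radians
    tube_rad = np.radians(air[["Latitude", "Longitude"]].to_numpy())
    post_rad = np.radians(postcode_df[["Latitude", "Longitude"]].to_numpy())

    tree = BallTree(tube_rad, metric="haversine")
    dist_rad, idx = tree.query(post_rad, k=1)  # both have shape (n_postcodes, 1)

    return pd.DataFrame({
        "Postcode": postcode_df["Postcode"].to_numpy(),
        # positional lookup of the matched tube reference
        "Nearest Air Tube": air["TubeRef"].to_numpy()[idx[:, 0]],
        # radians on the unit sphere -> kilometres
        "Distance to Air Tube KM": dist_rad[:, 0] * EARTH_RADIUS_KM,
    })

# Sample data from the question
postcode_df = pd.DataFrame({"Postcode": ["SK12 2LH", "SK7 6LQ"],
                            "Latitude": [53.362549, 53.373812],
                            "Longitude": [-2.061329, -2.120956]})
air = pd.DataFrame({"TubeRef": ["Stkprt35", "Stkprt07", "Stkprt33"],
                    "Latitude": [53.365085, 53.379502, 53.407510],
                    "Longitude": [-2.0763, -2.120777, -2.145632]})

result = nearest_tubes(postcode_df, air)
print(result)
```

The lookup goes through `to_numpy()` so that it is positional and still works if `air` does not have a default 0..n-1 index.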

Regarding "python - Is there a faster way than for loops and if statements to find the nearest point to another point in Python?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63557801/
