
python - Combining a GeoPandas DataFrame and a Pandas DataFrame based on concentration

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:31:28

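I have two spatial datasets that I want to compare. One of them contains coordinates in the polygons' coordinate system (the x_sp and y_sp columns below), but they have not been converted to geometry points. This dataset has about 700,000 points. The other has only latitude and longitude, with no GIS point data. That dataset has 6 million data points. I also have a shapefile, which I converted to a GeoPandas dataframe. It contains all the neighborhoods of the city I am studying. I am trying to study the concentration of points in each neighborhood. To do this, I added an extra column to the geodataframe and set its values to 0. I then loop over all the points in the first dataset, use GeoPandas' point-in-polygon test to find which polygon contains each point, and increment the new column's value for that row. However, this took a whole day (so this approach is not feasible for the 6 million point dataset) and it did not work.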

(Code below.)

How would you suggest I speed up the code, or what is the most efficient way to do this?

import pandas as pd
import geopandas as gpd
from shapely.geometry import Point, Polygon
import shapely.speedups

map_df = gpd.read_file("city.shp")
first_dataset = pd.read_csv(r'first_dataset.csv', header=0) # I have both datasets as csvs
map_df["data_count"] = 0
shapely.speedups.enable()
for index, rows in first_dataset.iterrows():
    for index_m, rows_m in map_df.iterrows():

        if rows_m['geometry'].contains(Point(float(rows['x_sp'].replace(',', '')), float(rows['y_sp'].replace(',', '')))):
            rows_m["data_count"] += 1
            print(rows_m)
    if index % 10000 == 0:
        print(index)
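
One likely reason the loop above leaves data_count at 0, independent of its speed: when a frame has columns of mixed types, iterrows() yields a copy of each row, so incrementing rows_m changes the copy and not map_df. The sketch below reproduces this on a small made-up frame (the names toy and name are illustrative, not from the question) and shows that a write through .at does reach the frame.

```python
import pandas as pd

# Toy stand-in for map_df: a text column plus the counter column (mixed dtypes).
toy = pd.DataFrame({"name": ["A", "B"], "data_count": [0, 0]})

# Incrementing the row yielded by iterrows() only changes a temporary copy.
for idx, row in toy.iterrows():
    row["data_count"] += 1
count_after_iterrows = int(toy["data_count"].sum())  # the frame is unchanged

# Writing through .at addresses the frame itself, so the update sticks.
for idx in toy.index:
    toy.at[idx, "data_count"] += 1
count_after_at = int(toy["data_count"].sum())

print(count_after_iterrows, count_after_at)
```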

Example rows from dataset 1:

id       created_at       latitude       longitude       x_sp            y_sp
0 8/27/2015 40.723092 -73.844215 1,027,431.148 202,756.7687
1 09/03/2015 40.794111 -73.818679 1,034,455.701 228,644.8374
2 09/05/2015 40.717581 -73.936608 1,001,822.831 200,716.8913
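
The x_sp and y_sp values above carry thousands separators, which is why the loop calls replace(',', '') on every row. A minimal sketch of doing that conversion once for a whole column instead, assuming the CSV quotes those fields (the inline sample text stands in for first_dataset.csv):

```python
import io
import pandas as pd

# Inline sample standing in for first_dataset.csv; the quoting of x_sp/y_sp is an assumption.
sample = io.StringIO(
    'id,created_at,latitude,longitude,x_sp,y_sp\n'
    '0,8/27/2015,40.723092,-73.844215,"1,027,431.148","202,756.7687"\n'
    '1,09/03/2015,40.794111,-73.818679,"1,034,455.701","228,644.8374"\n'
)
first_dataset = pd.read_csv(sample)

# Strip the separators and convert each column to float in one vectorized pass.
for col in ["x_sp", "y_sp"]:
    first_dataset[col] = (
        first_dataset[col].str.replace(",", "", regex=False).astype(float)
    )

print(first_dataset[["x_sp", "y_sp"]])
```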

Example rows from dataset 2:

id       created_at       latitude       longitude
0 8/27/2015 40.723092 -73.844215
1 09/03/2015 40.794111 -73.818679
2 09/05/2015 40.717581 -73.936608

Example rows from the geodataframe:

0    POLYGON ((990897.9000244141 169268.1207885742, 990588.2515869141 169075.217590332, 990634.5867919922 168756.3862304688, 990675.9777832031 168471.1604003906, 990718.684387207 168203.7496337891, 990751.5944213867 167919.0817871094, 990769.1470336914 167817.549987793, 990787.1948242188 167713.2717895508, 990847.0239868164 167337.3375854492, 990561.9622192383 167292.2391967773, 990580.0333862305 167018.7095947266, 990604.2161865234 166962.7120361328, 990650.3184204102 166636.4661865234, 990680.6791992188 166428.1857910156, 990703.641784668 166257.7855834961, 990755.4462280273 165886.7659912109, 990648.6395874023 165753.7963867188, 990620.1052246094 165718.2723999023, 991080.934387207 165440.5859985352, 990684.0062255859 164677.5101928711, 990490.1318359375 164313.4031982422, 990399.3641967773 164171.6812133789, 991015.9805908203 163682.8403930664, 991087.1251831055 163613.3948364258, 991067.8176269531 163744.3165893555, 991057.1912231445 163816.3900146484, 991345.2709960938 163856.4456176758, 991646.3432006836 163900.2806396484, 991773.0786132813 163030.0364379883, 991470.4874267578 162985.1055908203, 991143.1979980469 163237.9592285156, 991198.5916137695 162853.2189941406, 991220.766784668 162677.7142333984, 991253.1356201172 162478.1116333008, 991133.2471923828 162567.9869995117, 990522.3430175781 163053.4155883789, 989903.0823974609 163543.8049926758, 989791.7080078125 163632.0856323242, 989558.5064086914 163816.9310302734, 989283.8712158203 164034.6274414063, 988673.0786132813 164518.5571899414, 988060.2022094727 165001.5054321289, 988221.6488037109 165207.3384399414, 987622.6669921875 165682.1539916992, 987000.0469970703 166175.3447875977, 986318.1618041992 166716.4169921875, 985920.5576171875 167032.3049926758, 985825.3380126953 167105.7139892578, 985667.7103881836 167149.0173950195, 985711.5521850586 167204.7424316406, 985735.1807861328 167234.7784423828, 985616.2537841797 167274.6865844727, 985141.4716186523 167646.0275878906, 985116.6848144531 
167538.4523925781, 985078.2302246094 167367.5209960938, 985015.9666137695 167085.9205932617, 984772.9385986328 167277.9213867188, 984162.9234008789 167762.0607910156, 983552.0159912109 168246.4891967773, 983631.5764160156 168347.3909912109, 983714.2465820313 168449.9827880859, 983400.1986083984 168698.6110229492, 983226.8787841797 168835.8251953125, 983101.6134033203 168934.9949951172, 982490.5974121094 169419.4625854492, 982300.666809082 169571.1713867188, 982360.6928100586 169622.3165893555, 982412.7750244141 169667.8779907227, 982499.6558227539 169743.8810424805, 982705.4100341797 169925.6982421875, 982208.206237793 170319.3721923828, 982376.1818237305 170531.090637207, 982468.7208251953 170647.6915893555, 982537.7200317383 170734.6337890625, 982699.583190918 170939.6395874023, 982861.3120117188 171143.2908325195, 983023.1260375977 171347.2145996094, 983109.014831543 171455.7706298828, 983184.8184204102 171551.5745849609, 983347.2426147461 171754.9827880859, 983508.3842163086 171960.0018310547, 983592.399597168 172065.8973999023, 983669.7454223633 172163.3807983398, 983831.5626220703 172367.2941894531, 983993.1697998047 172571.6575927734, 984066.6146240234 172663.9800415039, 984155.3021850586 172775.4661865234, 984317.4638061523 172980.1575927734, 984478.7760009766 173183.559387207, 985089.9163818359 172698.5303955078, 985496.0212402344 172376.3232421875, 985694.4296264648 172550.7814331055, 985823.2385864258 172664.3868408203, 985893.815612793 172726.3685913086, 986092.0632324219 172900.3115844727, 986291.3049926758 173075.3837890625, 986490.1168212891 173249.0444335938, 986688.6240234375 173424.299987793, 986887.4290161133 173598.6697998047, 987086.5100097656 173773.731628418, 987286.3489990234 173946.5759887695, 987483.0447998047 174107.9993896484, 987719.0842285156 173919.7717895508, 987932.4072265625 173748.016784668, 988163.9849853516 173563.5582275391, 988386.3762207031 173383.4208374023, 988527.3765869141 173270.1207885742, 988606.2772216797 
173205.4880371094, 988880.1351928711 172984.799987793, 988969.5344238281 172928.8251953125, 989122.0604248047 173006.6860351563, 989233.424987793 173068.2640380859, 989458.4180297852 173191.2103881836, 989683.6900024414 173315.2449951172, 989779.9464111328 172642.8909912109, 989798.666809082 172522.3532104492, 989825.658203125 172332.0057983398, 989835.7268066406 172270.7457885742, 989890.3657836914 171886.9739990234, 989924.4572143555 171651.7941894531, 989946.274597168 171510.4783935547, 989971.1500244141 171328.6119995117, 989999.0960083008 171139.9083862305, 990047.4818115234 170785.1354370117, 990350.0830078125 170826.1293945313, 990664.5280151367 170870.7161865234, 990734.6127929688 170389.4440307617, 990758.5037841797 170225.3895874023, 990791.0989990234 170001.5607910156, 990897.9000244141 169268.1207885742))
1 POLYGON ((1038593.459228516 221913.3550415039, 1039369.281188965 221834.5889892578, 1040016.937194824 221767.3710327148, 1040050.687194824 221763.8671875, 1040133.272399902 221639.3005981445, 1040238.59362793 221481.9191894531, 1040275.303039551 221429.7963867188, 1040316.041015625 221380.5115966797, 1040360.440612793 221334.5405883789, 1040360.463439941 221334.5176391602, 1040360.48638916 221334.4979858398, 1040392.428588867 221306.1614379883, 1040408.153625488 221292.2112426758, 1040551.463806152 221188.8290405273, 1040607.487182617 221150.3577880859, 1040831.604187012 220992.8909912109, 1040848.265991211 220980.7042236328, 1040970.862792969 220849.400390625, 1041010.750793457 220801.116394043, 1041048.40838623 220751.07421875, 1041084.792785645 220697.3634033203, 1041264.418395996 220427.8726196289, 1041322.846984863 220337.4133911133, 1041520.730224609 220046.5382080078, 1041536.760437012 220023.0607910156, 1041527.849609375 219998.7427978516, 1041471.048583984 219365.9125976563, 1041419.457397461 218864.266784668, 1041325.684631348 217942.9957885742, 1041584.375 217916.5983886719, 1041530.431640625 217377.5538330078, 1041497.744628906 217069.0079956055, 1041473.059631348 216814.458190918, 1041472.780822754 216784.1069946289, 1041471.678405762 216664.5338134766, 1041472.728210449 216532.3522338867, 1041472.895629883 216511.0759887695, 1041718.423400879 216539.4061889648, 1041967.330383301 216566.1448364258, 1042215.098815918 216593.5889892578, 1042278.983215332 216006.1691894531, 1042341.099182129 215441.750793457, 1042090.027038574 215432.1575927734, 1041871.126586914 215424.8546142578, 1041840.965820313 215423.8474121094, 1041588.99798584 215415.458190918, 1041313.476623535 215407.4401855469, 1041061.948242188 215429.4609985352, 1040820.14440918 215535.7139892578, 1040705.075622559 215590.2645874023, 1040570.450012207 215654.0862426758, 1040321.418212891 215772.4523925781, 1040072.088012695 215890.9625854492, 1039822.682434082 216009.2297973633, 
1039573.237426758 216119.8107910156, 1039310.012817383 216165.0956420898, 1039059.793640137 216222.4512329102, 1038804.708984375 216272.5297851563, 1038547.944213867 216324.5051879883, 1038293.778015137 216385.8469848633, 1038037.837036133 216449.0289916992, 1037782.71282959 216511.2695922852, 1037527.778991699 216573.4415893555, 1037274.272216797 216635.7349853516, 1037019.584411621 216696.1871948242, 1036761.95703125 216723.8743896484, 1036594.0078125 216737.932434082, 1036256.774414063 216632.6439819336, 1036017.460632324 216547.7261962891, 1035777.588989258 216462.2112426758, 1035476.270629883 216354.6691894531, 1034867.013427734 216136.2247924805, 1033940.263183594 215805.2145996094, 1033892.861816406 215830.4443969727, 1033793.351013184 216101.4118041992, 1033683.557800293 216410.233215332, 1033625.257385254 216572.4212036133, 1033613.161010742 216606.0693969727, 1033539.942199707 216807.2940063477, 1033392.729797363 217220.2963867188, 1033274.84161377 217550.5029907227, 1033147.43359375 217909.2789916992, 1033078.946411133 218336.7681884766, 1033016.190612793 218723.4501953125, 1032942.762023926 219190.3881835938, 1032877.883605957 219596.958984375, 1032814.523986816 220005.6489868164, 1032566.499816895 219959.740234375, 1032432.372619629 219933.1885986328, 1032291.398620605 219906.0463867188, 1032288.993591309 220089.8088378906, 1032285.696594238 220450.083984375, 1032285.26361084 220710.5983886719, 1032311.743225098 220991.5687866211, 1032372.694396973 221503.2936401367, 1032488.727600098 222011.6292114258, 1032754.957397461 222240.6279907227, 1032903.536621094 222314.0530395508, 1032916.489196777 222404.0297851563, 1032933.516784668 222493.4227905273, 1032954.572998047 222581.9495849609, 1032979.589416504 222669.3442382813, 1033043.582214355 222903.4877929688, 1033059.717224121 222956.1351928711, 1033119.047790527 223149.7109985352, 1033183.440795898 223347.7189941406, 1033480.638183594 223316.1047973633, 1033734.718994141 223287.1185913086, 
1034012.103637695 223260.5145874023, 1034153.652038574 223247.0729980469, 1034292.312988281 223233.9039916992, 1034854.191833496 223178.7626342773, 1035616.388427734 223101.1807861328, 1035580.289428711 222742.7860107422, 1035554.305236816 222487.2645874023, 1035546.32623291 222411.5527954102, 1035527.251586914 222225.7689819336, 1036129.015441895 222163.9714355469, 1037024.84362793 222073.1019897461, 1037817.207641602 221992.4197998047, 1038593.459228516 221913.3550415039))
2 POLYGON ((1022728.275024414 217530.8082275391, 1023052.64440918 216997.8765869141, 1023125.596984863 216889.2808227539, 1023273.037597656 216713.3721923828, 1023276.361022949 216661.2990112305, 1023320.054843883 216618.8505990705, 1023365.11109836 216577.851189134, 1023411.481757777 216538.3444855517, 1023459.117392412 216500.3726012749, 1023507.967224121 216463.9760131836, 1023557.979221938 216429.193600211, 1023609.100029511 216396.0623579916, 1023661.275153702 216364.6176033644, 1023714.448977505 216334.8928554269, 1023768.564819336 216306.9197998047, 1023946.881408691 216213.6618041992, 1023987.648620605 216191.0289916992, 1024036.671203613 216163.8128051758, 1024131.621826172 216098.0781860352, 1024206.487182617 216029.908996582, 1024272.221984863 215958.6986083984, 1024350.384399414 215887.7606201172, 1024401.58581543 215813.4290161133, 1024417.641784668 215790.1193847656, 1024520.187194824 215639.7949829102, 1024536.922790527 215611.208984375, 1024560.20098877 215571.5419921875, 1024523.26739502 215581.7974243164, 1024484.575622559 215589.4478149414, 1024445.806396484 215592.4146118164, 1024405.450195313 215590.1334228516, 1024336.880615234 215576.3468017578, 1024308.688415527 215593.5944213867, 1024252.044799805 215580.7860107422, 1023879.900024414 215517.4067993164, 1023657.876220703 215478.4797973633, 1023434.129821777 215436.2781982422, 1023343.286987305 215431.4096069336, 1023170.426391602 215423.5223999023, 1023055.702026367 215418.0599975586, 1022907.730224609 215410.4024047852, 1022758.914794922 215403.3388061523, 1022507.560424805 215391.9639892578, 1022258.003601074 215378.4998168945, 1022006.481994629 215365.9700317383, 1021755.898193359 215353.9393920898, 1021463.710632324 215330.1235961914, 1021215.643432617 215295.4747924805, 1020968.189819336 215260.8588256836, 1020685.559387207 215221.5936279297, 1020403.20098877 215182.217590332, 1020150.986816406 215147.0106201172, 1019898.86138916 215111.6365966797, 1019651.066589355 215077.348815918, 
1019403.996826172 215043.3461914063, 1019252.57019043 215025.8591918945, 1019100.007995605 214999.4357910156, 1018842.210021973 214963.4545898438, 1018738.072998047 215708.6661987305, 1018655.300842285 216301.7557983398, 1018597.322021484 216715.5181884766, 1018742.200622559 216682.1550292969, 1018879.417785645 216650.5541992188, 1018789.959411621 217262.7443847656, 1018788.440429688 217285.70703125, 1018786.898620605 217303.4104003906, 1018728.470031738 217731.0209960938, 1018636.131225586 218394.3923950195, 1018559.18560791 218993.1937866211, 1018533.155395508 219135.4047851563, 1018530.182983398 219150.8411865234, 1018528.234191895 219164.5944213867, 1018489.773010254 219468.2059936523, 1018592.882995605 219766.1644287109, 1018590.461791992 219946.5151977539, 1018593.864013672 220008.2933959961, 1018602.453186035 220164.2609863281, 1018603.476806641 220212.774597168, 1018600.593017578 220261.3671875, 1018595.871398926 220294.948425293, 1018593.814819336 220309.5756225586, 1018582.728820801 220358.7814331055, 1018571.127807617 220392.7609863281, 1018591.321411133 220397.9415893555, 1018824.15222168 220444.0570068359, 1018939.43762207 220459.9885864258, 1019076.838806152 220472.3541870117, 1019203.705200195 220477.8526000977, 1019349.351623535 220477.8526000977, 1019502.75982666 220466.8685913086, 1019638.458435059 220452.1343994141, 1019777.109619141 220425.7139892578, 1019860.718383789 220409.033996582, 1020059.99621582 220341.6423950195, 1020213.319396973 220275.6680297852, 1020360.668212891 220202.0657958984, 1020414.034423828 220172.3121948242, 1020564.690185547 220072.1777954102, 1020785.18182373 219898.716796875, 1021011.713623047 219708.8746337891, 1021348.228393555 219398.3112182617, 1021504.392822266 219234.4495849609, 1021618.96282959 219114.2304077148, 1021689.125793457 219032.9976196289, 1021717.223022461 219000.4672241211, 1021759.950012207 218950.9990234375, 1021908.460205078 218768.0629882813, 1022052.534790039 218590.5928344727, 1022088.194213867 
218546.6691894531, 1022139.306213379 218472.1713867188, 1022238.390808105 218324.9241943359, 1022530.632385254 217851.1929931641, 1022728.275024414 217530.8082275391))

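Best answer

If I understand your question correctly, you have two .csv files containing points and a .shp file containing polygons, and you want to count how many points from each file fall in each polygon. If so, what you need is a spatial join. It checks the geometric relationship (e.g., within) between each point and polygon, and then returns the ID of the polygon each point falls in. After the join, your points dataframe will have a new column in which each row holds a polygon ID. So before doing this, make sure all your dataframes and geodataframes have an ID variable, each with a different name. Sample code could be written as follows:

First, you need to convert the pandas dataframe to a geopandas geodataframe: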

import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# loading polygons geodataframe
gdf_polygons = gpd.read_file('shape_file.shp')
# loading points dataframe
df = pd.read_csv('points_file.csv')

# converting longitude & latitude to geometry
df['coordinates'] = list(zip(df.longitude, df.latitude))
df.coordinates = df.coordinates.apply(Point)

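# converting dataframe to geodataframe
gdf_points = gpd.GeoDataFrame(df, geometry='coordinates')
# this only labels the points with the polygons' CRS; it does not reproject them,
# so the coordinates must already be expressed in that CRS
gdf_points.crs = gdf_polygons.crs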

# spatial join
sjoin = gpd.sjoin(gdf_points, gdf_polygons, how='left')

# converting geodataframe to dataframe
df_sjoin = pd.DataFrame(sjoin)
# checking missing values
df_sjoin[df_sjoin.geoid.isnull()].shape

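Note that the order in zip(df.longitude, df.latitude) must not be reversed. Before the join, the CRS of the two geodataframes must be the same. It is a good idea to check for missing values after the join; their count tells you how many points do not fall in any polygon. (The code above assumes that your original polygon geodataframe has a geoid column.)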
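
If the polygons are in a projected CRS while the points are longitude/latitude, as the coordinate magnitudes in the question suggest, labeling the points with the polygons' CRS is not enough; they have to be reprojected. The following is a sketch under stated assumptions, not a definitive recipe: the longitude/latitude values are taken to be WGS84 (EPSG:4326), the polygons are taken to be in EPSG:2263 (a guess that is consistent with the x_sp/y_sp sample values; the real CRS should be read from gdf_polygons.crs), and the two rectangles and their geoid values are made up for illustration.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import box

# The three sample rows from dataset 2 (longitude/latitude, assumed WGS84).
df = pd.DataFrame({
    "id": [0, 1, 2],
    "latitude": [40.723092, 40.794111, 40.717581],
    "longitude": [-73.844215, -73.818679, -73.936608],
})

# Made-up rectangular "neighborhoods" in EPSG:2263 feet (an assumed CRS).
gdf_polygons = gpd.GeoDataFrame(
    {"geoid": ["A", "B"]},
    geometry=[box(1_000_000, 195_000, 1_030_000, 210_000),
              box(1_030_000, 220_000, 1_040_000, 230_000)],
    crs="EPSG:2263",
)

# Declare the CRS the coordinates are actually in, then reproject to match.
gdf_points = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df.longitude, df.latitude),  # x = longitude, y = latitude
    crs="EPSG:4326",
).to_crs(gdf_polygons.crs)

# A left join keeps every point; geoid is missing for points outside all polygons.
sjoin = gpd.sjoin(gdf_points, gdf_polygons, how="left")
print(sjoin[["id", "geoid"]])
```

The join is a single vectorized call that uses a spatial index, rather than a Python-level test of every point against every polygon, which is where the speedup over the nested loop comes from.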

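I can think of two options for counting the points in each polygon, and both use the .groupby() method.

Option 1: create a new column and set every row to 1, then .groupby() the polygon ID (i.e., geoid) while summing the new column.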

df_sjoin['obs'] = 1
counts = df_sjoin.groupby('geoid')['obs'].sum()
df = pd.DataFrame(counts).reset_index()

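Option 2: .groupby() the polygon ID (geoid) while counting the entries in the original points dataframe's ID column (assuming it is named id). Note that .count() counts non-null entries, which equals the number of unique points as long as every point has its own id.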

counts = df_sjoin.groupby('geoid')['id'].count()
df = pd.DataFrame(counts).reset_index()
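
Either option yields one row per polygon that contains at least one point; polygons with no points are absent from the result. Below is a sketch of attaching the counts back to the polygon table, which is what the question's data_count column was meant to hold. It uses toy data: the polygons frame and the ID C are invented here, and groupby(...).size() is swapped in as a third way of counting rows per group.

```python
import pandas as pd

# Toy stand-ins: a joined points table and a polygon table with three IDs.
df_sjoin = pd.DataFrame({"id": [0, 1, 2, 3], "geoid": ["A", "B", "A", None]})
polygons = pd.DataFrame({"geoid": ["A", "B", "C"]})

# size() counts rows per group; rows with a missing geoid are skipped by groupby.
counts = df_sjoin.groupby("geoid").size().rename("data_count").reset_index()

# Left-merge onto the polygons so that empty polygons get an explicit 0.
polygons = polygons.merge(counts, on="geoid", how="left")
polygons["data_count"] = polygons["data_count"].fillna(0).astype(int)

print(polygons)
```

The same merge can be applied to the polygon geodataframe itself, since it has the geoid column to merge on.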

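You should also check that the following code returns True. (This assumes that you dropped the rows with missing values in the geoid column after the join.)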

len(df_sjoin) == df.obs.sum() # if you use option 1
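
A minimal sketch of that check end to end, on a toy join result in which one point matched no polygon (the data are invented for illustration):

```python
import pandas as pd

# Toy join result: the last point fell outside every polygon (geoid missing).
df_sjoin = pd.DataFrame({"id": [0, 1, 2, 3], "geoid": ["A", "B", "A", None]})
n_unmatched = int(df_sjoin["geoid"].isna().sum())

# Drop the unmatched points first, as the check assumes.
df_sjoin = df_sjoin.dropna(subset=["geoid"]).copy()

# Option 1 from the answer: a column of ones summed per polygon.
df_sjoin["obs"] = 1
df = df_sjoin.groupby("geoid")["obs"].sum().reset_index()

# Every remaining point should be counted in exactly one polygon.
totals_match = len(df_sjoin) == df["obs"].sum()
print(n_unmatched, totals_match)
```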

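If you have experience doing spatial joins with other GIS software (e.g., QGIS, ArcGIS), you will be surprised by the speed of geopandas.

Regarding python - Combining a GeoPandas DataFrame and a Pandas DataFrame based on concentration, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53491520/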
