
python - DataFrames are parsed as tuples when read from a pickle

Reposted. Author: 行者123. Updated: 2023-12-01 07:06:41

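I have a pickle file that contains a dictionary of DataFrames. As part of a data-cleaning script, I load this pickle, do additional processing on some (but not all) of the DataFrames, and then overwrite the pickle, which is later picked up and loaded by a simulation program.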

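When I read the pickle after processing, all but two of the values are unpacked and parsed correctly as DataFrames, but those two are read as tuples. Since these two don't actually need any changes in this particular data-cleaning script, the script does nothing with them other than the following: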

# Start of script: read in the pickle and assign the DataFrames for later use.
import pickle
import sys

input_file = sys.argv[1]
with open(input_file, 'rb') as handle:
    data = pickle.load(handle)


trips = data['trips'] # this sees additional processing, is correctly written out as a DF.
stops = data['stops'] # this sees additional processing, is correctly written out as a DF.
stop_times = data['stop_times'], # NO additional processing, is INCORRECTLY written out as a tuple.
road_segs = data['road_segs'], # NO additional processing, is INCORRECTLY written out as a tuple.
seg_props = data['seg_props'] # NO additional processing, is correctly written out as a df.


... # do additional processing on trips and stops


# Output the updated DataFrames and carry the unaltered ones through to overwrite the original pickle.

data = {
    "trips": trips,
    "stops": stops,
    "stop_times": stop_times,
    "road_segs": road_segs,
    "seg_props": seg_props
}

with open(input_file, 'wb') as handle:
    pickle.dump(data, handle, protocol=4)

If I read the pickle before running this script, I get the following.

[type(val) for val in gtfs.values()]                                                                                                                                                    
#output
[pandas.core.frame.DataFrame,
geopandas.geodataframe.GeoDataFrame,
pandas.core.frame.DataFrame,
pandas.core.frame.DataFrame,
pandas.core.frame.DataFrame]

And afterwards:

[type(val) for val in gtfs.values()]                                                                                                                                                    
Out[17]:
[pandas.core.frame.DataFrame,
pandas.core.frame.DataFrame,
tuple,
tuple,
pandas.core.frame.DataFrame]

These tuples are also deeply nested:

(((                                   trip_id stop_id  stop_duation
0 15243854-AUG19-MVS-BUS-Weekday-01 17894 0.0
1 15243854-AUG19-MVS-BUS-Weekday-01 17897 0.0
2 15243854-AUG19-MVS-BUS-Weekday-01 17900 0.0

[2812369 rows x 3 columns],),),)

Best answer

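I had two trailing commas

stop_times = data['stop_times'],
road_segs = data['road_segs'],

in my assignments at the top of the script, and that was the cause. I can't understand how I failed to notice this after staring at it over and over.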
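
As a minimal sketch of why this happens: in Python, a trailing comma after an expression makes the right-hand side a one-element tuple, so every run of the script wraps the value in one more layer before it is pickled again. The triple nesting shown in the question is consistent with the script having been run three times on the same file, though that is an inference. In the sketch below, a plain list stands in for the DataFrame to avoid a pandas dependency, and `round_trip` and `unwrap` are hypothetical helper names, not part of the original script.

```python
import pickle


def round_trip(data):
    """Mimic one run of the script: load, assign with a trailing comma, dump again."""
    loaded = pickle.loads(pickle.dumps(data, protocol=4))
    stop_times = loaded["stop_times"],  # trailing comma: this builds a 1-tuple
    return {"stop_times": stop_times}


def unwrap(value):
    """Peel off single-element tuples until the underlying object is reached."""
    while isinstance(value, tuple) and len(value) == 1:
        value = value[0]
    return value


# Assumption: a plain list stands in for the real DataFrame.
original = {"stop_times": [1, 2, 3]}

once = round_trip(original)
thrice = round_trip(round_trip(once))

print(type(once["stop_times"]))      # <class 'tuple'>
print(thrice["stop_times"])          # ((([1, 2, 3],),),)
print(unwrap(thrice["stop_times"]))  # [1, 2, 3]
```

Removing the two trailing commas stops further wrapping. Something like `unwrap` could be used once to recover an already-damaged pickle, assuming the wrapped values are never meant to be one-element tuples themselves.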

Regarding "python - DataFrames are parsed as tuples when read from a pickle", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58418221/
