python - How to optimize pandas memory usage

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:25:22

I am trying to merge three files of about 3 GB, 200 KB, and 200 KB with pandas. My machine has 32 GB of RAM, but I still end up with a MemoryError. Is there any way to avoid this? My merge code is below:

import pandas as pd

product = pd.read_csv("../data/process_product.csv", header=0)
product["bandID"] = pd.factorize(product.Band)[0]
# drop() no longer accepts axis as a positional argument in pandas 2.x
product = product.drop(columns=['Band', 'Info'])

town_state = pd.read_csv("../data/town_state.csv", header=0)
dumies = pd.get_dummies(town_state.State)
town_state = pd.concat([town_state, dumies], axis=1)
town_state["townID"] = pd.factorize(town_state.Town)[0]
town_state = town_state.drop(columns=['State', 'Town'])
train = pd.read_csv("../data/train.csv", header=0)

result = pd.merge(train, town_state, on="Agencia_ID", how='left')
result = pd.merge(result, product, on="Producto_ID", how='left')
result.to_csv("../data/train_data.csv")

Best answer

Here is my attempt at "micro" optimization:

You don't use (and don't need) the Info column from process_product.csv, so there is no need to read it at all:

cols = [<list of columns, EXCEPT Info column>]
product = pd.read_csv("../data/process_product.csv", usecols=cols)
product['Band'] = pd.factorize(product.Band)[0]
product.rename(columns={'Band':'bandID'}, inplace=True)
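
If listing every column by hand is inconvenient, usecols also accepts a callable that is evaluated against each column name. The following is a minimal sketch, assuming a tiny in-memory CSV in place of process_product.csv; the sample rows are made up for illustration, and only the column names Producto_ID, Band and Info come from the question.

```python
import io

import pandas as pd

# Hypothetical stand-in for process_product.csv (sample rows are invented).
csv_text = (
    "Producto_ID,Band,Info\n"
    "1,A,some long text\n"
    "2,B,more long text\n"
    "3,A,even more text\n"
)

# The callable is evaluated per column name; columns for which it
# returns False are skipped and never end up in the DataFrame.
product = pd.read_csv(io.StringIO(csv_text), usecols=lambda name: name != "Info")

# Replace the Band strings with integer codes, as in the answer.
product["bandID"] = pd.factorize(product["Band"])[0]
product = product.drop(columns="Band")

print(product.columns.tolist())
```

With a real file, the io.StringIO object would simply be replaced by the file path.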

We can try to save some memory on the dumies variable by calling get_dummies() inline, without keeping a separate variable, and passing the sparse=True parameter:

town_state = pd.concat([town_state, pd.get_dummies(town_state.State, sparse=True)], axis=1)
del town_state['State']
town_state['Town'] = pd.factorize(town_state.Town)[0]
town_state.rename(columns={'Town':'townID'}, inplace=True)
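
To check whether sparse=True actually helps for a given column, the two results can be compared with memory_usage(). This is only a sketch on made-up data (3,000 rows and 30 hypothetical state names); the real savings depend on how many distinct values the State column has, and whether they survive later operations such as a merge is worth checking on your own data.

```python
import numpy as np
import pandas as pd

# Made-up data: 3,000 rows spread over 30 hypothetical state names.
rng = np.random.default_rng(0)
states = pd.Series(rng.integers(0, 30, size=3000)).map(lambda i: f"state_{i}")

# Same one-hot encoding, stored densely and sparsely.
dense = pd.get_dummies(states)
sparse = pd.get_dummies(states, sparse=True)

# Compare only the column data, ignoring the shared index.
dense_bytes = int(dense.memory_usage(index=False).sum())
sparse_bytes = int(sparse.memory_usage(index=False).sum())

print(f"dense:  {dense_bytes} bytes")
print(f"sparse: {sparse_bytes} bytes")
```

The sparse version stores only the non-zero entries of each column plus their positions, so it tends to win when there are many categories and each row sets just one of them.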

Try to store the merged result back in the existing DF and delete town_state (and then product) from memory as soon as possible:

train = pd.merge(train, town_state, on="Agencia_ID", how='left')
del town_state
train = pd.merge(train, product, on="Producto_ID", how='left')
del product

PS I don't know which file/DF is the biggest one (3 GB), so I assumed it is the train DF. If it is the product DF, then I would do it this way:

product = pd.merge(train, product, on="Producto_ID", how='left')
del train
product = pd.merge(product, town_state, on="Agencia_ID", how='left')
del town_state
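
Which DF is the largest can be measured rather than guessed. The sketch below uses a hypothetical helper, report_memory (not part of pandas), that sums DataFrame.memory_usage(deep=True) for each frame; the two small frames are made-up stand-ins for the real data.

```python
import pandas as pd


def report_memory(**frames):
    """Return the in-memory size of each DataFrame in bytes, largest first."""
    sizes = {
        name: int(df.memory_usage(index=True, deep=True).sum())
        for name, df in frames.items()
    }
    return dict(sorted(sizes.items(), key=lambda item: item[1], reverse=True))


# Tiny made-up frames standing in for the real ones.
train = pd.DataFrame({"Agencia_ID": range(1000), "Producto_ID": range(1000)})
product = pd.DataFrame({"Producto_ID": range(10), "bandID": range(10)})

sizes = report_memory(train=train, product=product)
for name, n_bytes in sizes.items():
    print(f"{name}: {n_bytes} bytes")
```

deep=True matters for object (string) columns, whose real footprint is otherwise underreported.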

Regarding python - How to optimize pandas memory usage, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38036084/
