
python - Pandas.concat on sparse DataFrames... a mystery?

Reposted. Author: 太空狗. Updated: 2023-10-30 01:33:15

Why is the result of concatenating two sparse DataFrames sparse, but in a strange way? And how can I measure the memory used by the concatenated DataFrame?

I wrote a code sample to illustrate the problem:

import pandas as pd

df1 = pd.DataFrame({'A': [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    'B': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    'C': [0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0],
                    'D': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    'E': [0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0]},
                   index=['a','b','c','d','e','f','g','h','i','j','k','l']).to_sparse(fill_value=0)

df2 = pd.DataFrame({'F': [0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0],
                    'G': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0],
                    'H': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    'I': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    'J': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6]},
                   index=['a','b','c','d','e','f','g','h','i','j','k','l']).to_sparse(fill_value=0)

print("df1 sparse size =", df1.memory_usage().sum(),"Bytes, density =", df1.density)
print(type(df1))
print('default_fill_value =', df1.default_fill_value)
print(df1.values)

print("df2 sparse size =", df2.memory_usage().sum(),"Bytes, density =", df2.density)
print(type(df2))
print('default_fill_value =', df2.default_fill_value)
print(df2.values)

result = pd.concat([df1,df2], axis=1)

print(type(result))  # Seems all right
print('default_fill_value =', result.default_fill_value)  # The default fill value is not 0?
print(result.values)  # What are those "nan" blocks?
# result.density  # Raises an error
# result.memory_usage  # Raises an error

More generally: does anyone know what is going on here?

Best answer

This is a known issue, and an issue has been filed for it.
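For readers on a recent pandas release, here is a minimal sketch of the same experiment. It assumes pandas 1.0 or later, where `to_sparse` and `SparseDataFrame` were removed in favour of ordinary DataFrames whose columns carry a `SparseDtype`. The helper `column` and the variable names are made up for this illustration and are not part of the original question. Under those assumptions, the concatenated frame should keep its fill value of 0, and both the density and the memory usage can be read off directly.

```python
import pandas as pd

index = list("abcdefghijkl")


def column(position=None, value=0, length=12):
    """Hypothetical helper: a list of zeros with at most one non-zero entry."""
    data = [0] * length
    if position is not None:
        data[position] = value
    return data


# Same data as in the question, built as ordinary dense frames first.
dense1 = pd.DataFrame(
    {"A": column(1, 1), "B": column(), "C": column(4, 2),
     "D": column(), "E": column(9, 3)},
    index=index,
)
dense2 = pd.DataFrame(
    {"F": column(4, 4), "G": column(10, 5), "H": column(),
     "I": column(), "J": column(11, 6)},
    index=index,
)

# Convert every column to a sparse dtype with an explicit fill value of 0.
sparse_dtype = pd.SparseDtype("int64", fill_value=0)
df1 = dense1.astype(sparse_dtype)
df2 = dense2.astype(sparse_dtype)

# Column-wise concatenation of two frames that share the same index.
result = pd.concat([df1, df2], axis=1)

# The .sparse accessor is only available when every column is sparse.
density = result.sparse.density

# Compare the sparse footprint with the equivalent dense frame (index excluded).
sparse_bytes = result.memory_usage(index=False).sum()
dense_bytes = result.sparse.to_dense().memory_usage(index=False).sum()

print(result.dtypes)
print("density =", density)
print("sparse bytes =", sparse_bytes, "dense bytes =", dense_bytes)
```

This does not explain the legacy `SparseDataFrame` behaviour shown in the question; it only shows how the memory question might be answered with the newer sparse-dtype API.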

Regarding python - Pandas.concat on sparse DataFrames... a mystery?, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/35083277/
