
python - How to find out that `DataFrame.to_numpy` did not create a copy

Reposted. Author: 行者123. Updated: 2023-12-03 09:27:20

The pandas.DataFrame.to_numpy method has a copy parameter with the following documentation:

copy : bool, default False

Whether to ensure that the returned value is not a view on another array. Note that copy=False does not ensure that to_numpy() is no-copy. Rather, copy=True ensure that a copy is made, even if not strictly necessary.



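After playing around a bit, it seems that calling to_numpy on data that is contiguous in memory and not of mixed types keeps a view. But how can I check whether the resulting numpy array shares memory with the data frame it was created from, without changing the data?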

Example where memory is shared:

import pandas as pd
import numpy as np

# some data frame that I expect not to be copied
frame = pd.DataFrame(np.arange(144).reshape(12,12))
array = frame.to_numpy()
array[:] = 0
print(frame)
# Prints:
# 0 1 2 3 4 5 6 7 8 9 10 11
# 0 0 0 0 0 0 0 0 0 0 0 0 0
# 1 0 0 0 0 0 0 0 0 0 0 0 0
# 2 0 0 0 0 0 0 0 0 0 0 0 0
# 3 0 0 0 0 0 0 0 0 0 0 0 0
# 4 0 0 0 0 0 0 0 0 0 0 0 0
# 5 0 0 0 0 0 0 0 0 0 0 0 0
# 6 0 0 0 0 0 0 0 0 0 0 0 0
# 7 0 0 0 0 0 0 0 0 0 0 0 0
# 8 0 0 0 0 0 0 0 0 0 0 0 0
# 9 0 0 0 0 0 0 0 0 0 0 0 0
# 10 0 0 0 0 0 0 0 0 0 0 0 0
# 11 0 0 0 0 0 0 0 0 0 0 0 0

Example where memory is not shared:

import pandas as pd
import numpy as np

# some data frame that I expect to be copied
types = [int, str, float]
frame = pd.DataFrame({
i: [types[i%len(types)](value) for value in col]
for i, col in enumerate(np.arange(144).reshape(12,12).T)
})
array = frame.to_numpy()
array[:] = 0
print(frame)
# Prints:
# 0 1 2 3 4 5 6 7 8 9 10 11
# 0 0 12 24.0 36 48 60.0 72 84 96.0 108 120 132.0
# 1 1 13 25.0 37 49 61.0 73 85 97.0 109 121 133.0
# 2 2 14 26.0 38 50 62.0 74 86 98.0 110 122 134.0
# 3 3 15 27.0 39 51 63.0 75 87 99.0 111 123 135.0
# 4 4 16 28.0 40 52 64.0 76 88 100.0 112 124 136.0
# 5 5 17 29.0 41 53 65.0 77 89 101.0 113 125 137.0
# 6 6 18 30.0 42 54 66.0 78 90 102.0 114 126 138.0
# 7 7 19 31.0 43 55 67.0 79 91 103.0 115 127 139.0
# 8 8 20 32.0 44 56 68.0 80 92 104.0 116 128 140.0
# 9 9 21 33.0 45 57 69.0 81 93 105.0 117 129 141.0
# 10 10 22 34.0 46 58 70.0 82 94 106.0 118 130 142.0
# 11 11 23 35.0 47 59 71.0 83 95 107.0 119 131 143.0

Best answer

You can use numpy.shares_memory:

# Your first example
print(np.shares_memory(array, frame))  # True, they are sharing memory

# Your second example (array2 and frame2 stand for the array and frame created there)
print(np.shares_memory(array2, frame2))  # False, they are not sharing memory

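There is also numpy.may_share_memory, which is faster but can only be used to make sure that things do not share memory (because it only checks whether the memory bounds overlap), so strictly speaking it does not answer the question.
Read this to understand the difference.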
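
The difference between the two functions can be seen with plain NumPy arrays. The following is a minimal sketch, not part of the original answer; the names base, evens and odds are made up for illustration. Two interleaved slices of the same array have overlapping memory bounds, yet no element in common:

```python
import numpy as np

base = np.arange(10)
evens = base[::2]   # elements at positions 0, 2, 4, 6, 8
odds = base[1::2]   # elements at positions 1, 3, 5, 7, 9

# Bounds check only: the address ranges of the two slices overlap.
bounds_overlap = np.may_share_memory(evens, odds)
# Exact check: no single element belongs to both slices.
really_shared = np.shares_memory(evens, odds)

print(bounds_overlap)  # True
print(really_shared)   # False
```

So a False from may_share_memory is conclusive, while a True only means "possibly".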

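Be careful when using these numpy functions with pandas data structures: np.shares_memory(frame, frame) returns True for the first example but False for the second, probably because the data frame's __array__ method creates a copy in the second scenario.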
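
One way to sidestep the __array__ conversion is to compare the array against each column's own data instead of passing the whole data frame to numpy. The following is only a sketch under assumptions: the helper name shares_memory_with_frame is hypothetical, and it relies on Series.to_numpy() returning a view for each column, which should hold for plain NumPy dtypes but may not hold for extension dtypes.

```python
import numpy as np
import pandas as pd

def shares_memory_with_frame(array, frame):
    """Return True if `array` overlaps the data of any column of `frame`.

    Hypothetical helper: assumes each column's to_numpy() is itself a view.
    """
    return any(
        np.shares_memory(array, frame.iloc[:, i].to_numpy())
        for i in range(frame.shape[1])
    )

# Single dtype, like the first example in the question.
homogeneous = pd.DataFrame(np.arange(144).reshape(12, 12))
# Mixed dtypes, so to_numpy() has to build a new object array.
mixed = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"], "c": [1.0, 2.0, 3.0]})

view_result = shares_memory_with_frame(homogeneous.to_numpy(), homogeneous)
copy_result = shares_memory_with_frame(mixed.to_numpy(), mixed)
forced_copy = shares_memory_with_frame(homogeneous.to_numpy(copy=True), homogeneous)

print(view_result, copy_result, forced_copy)
```

With the behaviour described in the question, this should report shared memory only for the first call. None of the calls modify the data frames.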

Regarding python - How to find out that `DataFrame.to_numpy` did not create a copy, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62304176/
