
python - Rebuilding a Pandas object vs. copy()

Repost. Author: 太空宇宙. Updated: 2023-11-03 10:58:12

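Does anyone know why the copy() method of pandas objects appears to be so much slower than rebuilding the object with its constructor? Is there any reason to use copy() rather than the standard constructor?

Here are some quick timings: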

In [41]: import numpy as np

In [42]: import pandas as pd

In [43]: df = pd.DataFrame(np.random.rand(300000).reshape(100000,3), columns=list('ABC'))

In [44]: %timeit pd.DataFrame(df)
The slowest run took 5.61 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.95 µs per loop

In [45]: %timeit df.copy()
The slowest run took 5.44 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 390 µs per loop

The same gap between the two operations shows up for pandas Series. Interestingly, numpy arrays do not behave this way. For example:

In [48]: import numpy as np

In [49]: myarray = np.random.rand(300000)

In [50]: %timeit myarray.copy()
10000 loops, best of 3: 162 µs per loop

In [52]: %timeit np.array(myarray)
10000 loops, best of 3: 168 µs per loop

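Best answer

This is because copy() actually creates a new internal representation of the DataFrame, whereas the constructor just points at the existing one: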

In [11]: df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

In [12]: id(df._data) # internal attribute, don't futz with it!
Out[12]: 4472136472

In [13]: df1 = df.copy()

In [14]: id(df1._data) # different object
Out[14]: 4472572448

In [15]: df2 = pd.DataFrame(df)

In [16]: id(df2._data) # same as df._data
Out[16]: 4472136472
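
The same check can be sketched without touching the private `_data` attribute, by asking numpy whether the column buffers overlap. This is a minimal sketch, not the answer author's method: `shares_column` is a hypothetical helper introduced here for illustration, and the result assumes a pandas version in which the constructor reuses the existing data when handed a DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])
df1 = df.copy()          # deep copy by default: new buffers
df2 = pd.DataFrame(df)   # constructor called on an existing DataFrame


def shares_column(left, right, col="A"):
    """Hypothetical helper: True if the two frames' column buffers overlap in memory."""
    return np.shares_memory(left[col].to_numpy(), right[col].to_numpy())


copy_shares = shares_column(df, df1)
ctor_shares = shares_column(df, df2)
print("copy() shares memory with df:     ", copy_shares)
print("constructor shares memory with df:", ctor_shares)
```

Skipping the allocation and the element-by-element copy is what makes the constructor call so much cheaper in the timings from the question.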

A corollary is that if you mutate the original DataFrame, the change shows up in df2 but not in df1 (the copy):

In [21]: df.iloc[0, 0] = 99

In [22]: df
Out[22]:
    A  B
0  99  2
1   3  4

In [23]: df1
Out[23]:
   A  B
0  1  2
1  3  4

In [24]: df2
Out[24]:
    A  B
0  99  2
1   3  4

That is why you would use copy()!
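
If independence from the original is what you need, the constructor can also be told to copy explicitly through its `copy` parameter. The sketch below assumes nothing beyond `DataFrame.copy()` and `pd.DataFrame(..., copy=True)`; the name `df3` is introduced here for illustration. Whether a plain `pd.DataFrame(df)` sees later writes to `df` appears to depend on the pandas version (newer releases with Copy-on-Write semantics may isolate it), so the sketch only checks the two explicit copies.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

df1 = df.copy()                    # independent copy
df3 = pd.DataFrame(df, copy=True)  # constructor, explicitly asked to copy

df.iloc[0, 0] = 99                 # mutate the original in place

# Both explicit copies keep the value they had before the mutation.
value_in_copy = int(df1.iloc[0, 0])
value_in_ctor_copy = int(df3.iloc[0, 0])
print(value_in_copy, value_in_ctor_copy)
```

Either form pays the copying cost up front, so the timing advantage of the bare constructor goes away.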


In numpy, both copy() and the constructor make a copy:

In [31]: a = np.array([1, 2])

In [32]: a1 = a.copy()

In [33]: a2 = np.array(a)

In [34]: a[0] = 99

In [35]: a1
Out[35]: array([1, 2])

In [36]: a2
Out[36]: array([1, 2])
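
As a small sketch of the numpy side: `np.array` copies its input by default, which is why it costs about the same as `copy()` in the question's timings. The no-copy counterpart is `np.asarray`, which hands back the same array object when the input is already an ndarray of the requested dtype. The variable names below are for illustration only.

```python
import numpy as np

a = np.array([1, 2])

a_copy = a.copy()       # explicit copy
a_ctor = np.array(a)    # np.array copies by default
a_same = np.asarray(a)  # no copy for an existing ndarray of matching dtype

a[0] = 99               # mutate the original

ctor_shares = np.shares_memory(a, a_ctor)
print("np.array shares memory:", ctor_shares)
print("np.asarray returns the same object:", a_same is a)
print(a_copy, a_ctor, a_same)
```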

Regarding "python - Rebuilding a Pandas object vs. copy()", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/37825058/
