python - Why is pandas df.diff(2) different from df.diff().diff()? - 6ren

Reposted by 太空宇宙 · updated 2023-11-04 06:54:11

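According to Enders' Applied Econometric Time Series, the second difference of a variable y is defined as:

$\Delta^2 y_t = \Delta(\Delta y_t) = \Delta y_t - \Delta y_{t-1} = y_t - 2y_{t-1} + y_{t-2}$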

Pandas provides a diff function that takes "periods" as an argument. Nevertheless, df.diff(2) gives a different result from df.diff().diff().
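For orientation, here is a minimal sketch of what the two calls compute on a toy Series. The values are made up for illustration (they are not the question's data), and `nth_order_diff` is a hypothetical helper, not a pandas API. The key point it illustrates: in pandas, `periods` sets the lag of a single difference, whereas `numpy.diff` uses `n` for the number of times differencing is applied.

```python
import numpy as np
import pandas as pd

# Toy series (illustrative values, not the question's data): y_t = t**2 for t = 1..5
y = pd.Series([1.0, 4.0, 9.0, 16.0, 25.0])

# periods=2 sets the LAG of one difference: y_t - y_{t-2}
lag2 = y.diff(periods=2)
assert lag2.equals(y - y.shift(2))
print(lag2.tolist())    # [nan, nan, 8.0, 12.0, 16.0]

# Differencing twice gives the second-ORDER difference: y_t - 2*y_{t-1} + y_{t-2}
second = y.diff().diff()
assert second.equals(y - 2 * y.shift(1) + y.shift(2))
print(second.tolist())  # [nan, nan, 2.0, 2.0, 2.0]


def nth_order_diff(s: pd.Series, order: int) -> pd.Series:
    """Hypothetical helper: apply Series.diff() `order` times (Delta**order)."""
    for _ in range(order):
        s = s.diff()
    return s


# NumPy's `n` is the order, so it agrees with repeated diff() minus the NaN padding
assert np.array_equal(np.diff(y.to_numpy(), n=2), nth_order_diff(y, 2).dropna().to_numpy())
```

For a quadratic sequence the second-order difference is constant, which makes the contrast with the lag-2 difference easy to see.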

A code excerpt showing this:

In [8]: df
Out[8]:
       C.1   C.2    C.3     C.4     C.5   C.6
C.0
1990  16.0   6.0  256.0   216.0   65536  4352
1991  17.0   7.0  289.0   343.0  131072  5202
1992   6.0  -4.0   36.0   -64.0      64   252
1993   7.0  -3.0   49.0   -27.0     128   392
1994   8.0  -2.0   64.0    -8.0     256   576
1995  13.0   3.0  169.0    27.0    8192  2366
1996  10.0   0.5  100.0     0.5    1024  1100
1997  11.0   1.0  121.0     1.0    2048  1452
1998   4.0  -6.0   16.0  -216.0      16    80
1999   5.0  -5.0   25.0  -125.0      32   150
2000  18.0   8.0  324.0   512.0  262144  6156
2001   3.0  -7.0    9.0  -343.0       8    36
2002   0.5 -10.0    0.5 -1000.0      48    20
2003   1.0  -9.0    1.0  -729.0       2     2
2004  14.0   4.0  196.0    64.0   16384  2940
2005  15.0   5.0  225.0   125.0   32768  3600
2006  12.0   2.0  144.0     8.0    4096  1872
2007   9.0  -1.0   81.0    -1.0     512   810
2008   2.0  -8.0    4.0  -512.0       4    12
2009  19.0   9.0  361.0   729.0  524288  7220

In [9]: df.diff(2)
Out[9]:
       C.1   C.2    C.3     C.4       C.5     C.6
C.0
1990   NaN   NaN    NaN     NaN       NaN     NaN
1991   NaN   NaN    NaN     NaN       NaN     NaN
1992 -10.0 -10.0 -220.0  -280.0  -65472.0 -4100.0
1993 -10.0 -10.0 -240.0  -370.0 -130944.0 -4810.0
1994   2.0   2.0   28.0    56.0     192.0   324.0
1995   6.0   6.0  120.0    54.0    8064.0  1974.0
1996   2.0   2.5   36.0     8.5     768.0   524.0
1997  -2.0  -2.0  -48.0   -26.0   -6144.0  -914.0
1998  -6.0  -6.5  -84.0  -216.5   -1008.0 -1020.0
1999  -6.0  -6.0  -96.0  -126.0   -2016.0 -1302.0
2000  14.0  14.0  308.0   728.0  262128.0  6076.0
2001  -2.0  -2.0  -16.0  -218.0     -24.0  -114.0
2002 -17.5 -18.0 -323.5 -1512.0 -262096.0 -6136.0
2003  -2.0  -2.0   -8.0  -386.0      -6.0   -34.0
2004  13.5  14.0  195.5  1064.0   16336.0  2920.0
2005  14.0  14.0  224.0   854.0   32766.0  3598.0
2006  -2.0  -2.0  -52.0   -56.0  -12288.0 -1068.0
2007  -6.0  -6.0 -144.0  -126.0  -32256.0 -2790.0
2008 -10.0 -10.0 -140.0  -520.0   -4092.0 -1860.0
2009  10.0  10.0  280.0   730.0  523776.0  6410.0

In [10]: df.diff().diff()
Out[10]:
       C.1   C.2    C.3     C.4       C.5      C.6
C.0
1990   NaN   NaN    NaN     NaN       NaN      NaN
1991   NaN   NaN    NaN     NaN       NaN      NaN
1992 -12.0 -12.0 -286.0  -534.0 -196544.0  -5800.0
1993  12.0  12.0  266.0   444.0  131072.0   5090.0
1994   0.0   0.0    2.0   -18.0      64.0     44.0
1995   4.0   4.0   90.0    16.0    7808.0   1606.0
1996  -8.0  -7.5 -174.0   -61.5  -15104.0  -3056.0
1997   4.0   3.0   90.0    27.0    8192.0   1618.0
1998  -8.0  -7.5 -126.0  -217.5   -3056.0  -1724.0
1999   8.0   8.0  114.0   308.0    2048.0   1442.0
2000  12.0  12.0  290.0   546.0  262096.0   5936.0
2001 -28.0 -28.0 -614.0 -1492.0 -524248.0 -12126.0
2002  12.5  12.0  306.5   198.0  262176.0   6104.0
2003   3.0   4.0    9.0   928.0     -86.0     -2.0
2004  12.5  12.0  194.5   522.0   16428.0   2956.0
2005 -12.0 -12.0 -166.0  -732.0       2.0  -2278.0
2006  -4.0  -4.0 -110.0  -178.0  -45056.0  -2388.0
2007   0.0   0.0   18.0   108.0   25088.0    666.0
2008  -4.0  -4.0  -14.0  -502.0    3076.0    264.0
2009  24.0  24.0  434.0  1752.0  524792.0   8006.0

In [11]: df.diff(2) - df.diff().diff()
Out[11]:
       C.1   C.2    C.3     C.4       C.5      C.6
C.0
1990   NaN   NaN    NaN     NaN       NaN      NaN
1991   NaN   NaN    NaN     NaN       NaN      NaN
1992   2.0   2.0   66.0   254.0  131072.0   1700.0
1993 -22.0 -22.0 -506.0  -814.0 -262016.0  -9900.0
1994   2.0   2.0   26.0    74.0     128.0    280.0
1995   2.0   2.0   30.0    38.0     256.0    368.0
1996  10.0  10.0  210.0    70.0   15872.0   3580.0
1997  -6.0  -5.0 -138.0   -53.0  -14336.0  -2532.0
1998   2.0   1.0   42.0     1.0    2048.0    704.0
1999 -14.0 -14.0 -210.0  -434.0   -4064.0  -2744.0
2000   2.0   2.0   18.0   182.0      32.0    140.0
2001  26.0  26.0  598.0  1274.0  524224.0  12012.0
2002 -30.0 -30.0 -630.0 -1710.0 -524272.0 -12240.0
2003  -5.0  -6.0  -17.0 -1314.0      80.0    -32.0
2004   1.0   2.0    1.0   542.0     -92.0    -36.0
2005  26.0  26.0  390.0  1586.0   32764.0   5876.0
2006   2.0   2.0   58.0   122.0   32768.0   1320.0
2007  -6.0  -6.0 -162.0  -234.0  -57344.0  -3456.0
2008  -6.0  -6.0 -126.0   -18.0   -7168.0  -2124.0
2009 -14.0 -14.0 -154.0 -1022.0   -1016.0  -1596.0
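The rows of Out[11] follow a fixed pattern: each entry equals twice the previous row's first difference, $2\,\Delta y_{t-1}$. A minimal sketch checking this for column C.1 only (values copied from Out[8]); the same check should hold column-wise for the full frame, i.e. df.diff(2) - df.diff().diff() against 2 * df.diff().shift(1).

```python
import pandas as pd

# Column C.1 of the question's DataFrame (Out[8]), indexed by year
c1 = pd.Series(
    [16.0, 17.0, 6.0, 7.0, 8.0, 13.0, 10.0, 11.0, 4.0, 5.0,
     18.0, 3.0, 0.5, 1.0, 14.0, 15.0, 12.0, 9.0, 2.0, 19.0],
    index=range(1990, 2010),
)

# What Out[11] computes, restricted to C.1
gap = c1.diff(2) - c1.diff().diff()

# (y_t - y_{t-2}) - (y_t - 2*y_{t-1} + y_{t-2}) = 2 * (y_{t-1} - y_{t-2})
expected = 2 * c1.diff().shift(1)

assert gap.equals(expected)
print(gap.loc[1992:1995].tolist())  # [2.0, -22.0, 2.0, 2.0], as in Out[11]
```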

Why are they different? Which one corresponds to the definition in Enders' book?

Best answer

Precisely because

$\Delta^2 y_t = y_t - 2y_{t-1} + y_{t-2} \neq y_t - y_{t-2}$.

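The left-hand side is df.diff().diff(), while the right-hand side is df.diff(2). The periods argument sets the lag of a single difference (y_t - y_{t-2}), not the number of times the series is differenced. For a difference of differences, which is the second difference in Enders' sense, you want the left-hand side.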
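Writing both quantities in terms of the first difference $\Delta y_t = y_t - y_{t-1}$ makes the gap explicit:

```latex
\begin{aligned}
\Delta^2 y_t &= \Delta(\Delta y_t) = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = y_t - 2y_{t-1} + y_{t-2}, \\
y_t - y_{t-2} &= (y_t - y_{t-1}) + (y_{t-1} - y_{t-2}) = \Delta y_t + \Delta y_{t-1}, \\
(y_t - y_{t-2}) - \Delta^2 y_t &= 2y_{t-1} - 2y_{t-2} = 2\,\Delta y_{t-1}.
\end{aligned}
```

So df.diff(2) is the sum of the last two first differences, while df.diff().diff() is their difference. The last line is exactly the pattern in Out[11]: for C.1 in 1993, $2\,\Delta y_{1992} = 2(6 - 17) = -22$.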

A similar question, "python - Why is pandas df.diff(2) different from df.diff().diff()?", can be found on Stack Overflow: https://stackoverflow.com/questions/50162212/
