
python - Group by, then apply a function, then flatten back into a DataFrame in Pandas Python

Reposted. Author: 行者123. Updated: 2023-11-28 17:34:28

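I have some data in which some of the column values are running totals for the year (year-to-date sums). I want to change these columns so that they reflect the per-period differences instead of the cumulative sums. The data looks like this: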

                                ReportNumber    NIY  OANCFY  FQTR  FYEARQ
Reference  Published
2007-12-31 2008-02-21 00:00:00             1   3131    3073     4    2007
2008-03-31 2008-05-08 00:00:00             1   1189    1482     1    2008
           2009-05-07 16:00:00             2   1149     NaN     1    2008
2008-06-30 2008-08-07 00:00:00             1   2296    2493     2    2008
           2009-08-18 00:00:00             3   2214     NaN     2    2008
2008-09-30 2008-11-06 00:00:00             1   3402    3763     3    2008
           2009-11-07 00:00:00             3   3277     NaN     3    2008
2008-12-31 2009-02-17 16:00:00             1    NaN    4959     4    2008
           2009-02-18 00:00:00             3   4202     NaN     4    2008
           2010-03-21 00:00:00             5   4031     NaN     4    2008
2009-03-31 2009-05-07 16:00:00             1    942    1441     1    2009
2009-06-30 2009-08-06 00:00:00             1   1748    3017     2    2009
2009-09-30 2009-11-07 00:00:00             1   2458    4423     3    2009
2009-12-31 2010-02-24 16:00:00             1   3181    5598     4    2009
2010-03-31 2010-05-07 16:00:00             1    677    1172     1    2010
2010-06-30 2010-08-06 08:00:00             1   1392    2441     2    2010
2010-09-30 2010-11-08 16:00:00             1   1760    3150     3    2010
2010-12-31 2011-02-24 08:00:00             1    961    3946     4    2010
2011-03-31 2011-05-06 16:00:00             1    310     390     1    2011
           2012-08-16 16:00:00             2    319     NaN     1    2011
2011-06-30 2011-08-09 08:00:00             1    465     730     2    2011
           2012-08-16 16:00:00             2    443     NaN     2    2011
2011-09-30 2011-11-09 00:00:00             1    394    1222     3    2011
           2012-11-06 16:00:00             2    411     NaN     3    2011
2011-12-31 2012-03-06 00:00:00             1  -5725    1785     4    2011
           2013-03-05 00:00:00             2  -5754     NaN     4    2011
2012-03-31 2012-05-05 16:00:00             1     42     540     1    2012
           2012-08-16 16:00:00             2     10     NaN     1    2012
2012-06-30 2012-08-02 16:00:00             1   -294     999     2    2012
2012-09-30 2012-11-06 16:00:00             1   -675    1785     3    2012
2012-12-31 2013-03-05 00:00:00             1   -219    2708     4    2012
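
To make the target transformation concrete, here is a minimal sketch for a single fiscal year, using the year-to-date NIY values of fiscal year 2009 from the table above. The names `ytd` and `per_quarter` are illustrative only, and the sketch ignores grouping and restated reports entirely.

```python
import pandas as pd

# Year-to-date NIY for fiscal year 2009, indexed by quarter (values from the table).
ytd = pd.Series([942, 1748, 2458, 3181], index=[1, 2, 3, 4], name="NIY")

# Each quarter's own value is the change in the running total. The first
# quarter has no predecessor, so diff() yields NaN there and the
# year-to-date value is kept as is.
per_quarter = ytd.diff().fillna(ytd).astype(int)

print(per_quarter.tolist())  # [942, 806, 710, 723]
```

These four numbers are the per-quarter NIY values for fiscal year 2009.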

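So, working from 'FQTR' and the 'Published' index, I need to take the differences between consecutive FQTR values within a given FYEARQ and get the result back as a single frame. My attempt works as long as the items ('NIY', 'OANCFY') do not have more than one value for a given Reference date.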

import pandas as pd

cfgtmp = cftmp.groupby('FYEARQ')
ft = dict()
for group_name, subdf in cftmp.dropna().drop_duplicates().groupby('FYEARQ'):
    # keep the first row as is, then append the row-to-row differences
    tmp = pd.concat([subdf.head(1), subdf.diff()]).dropna()
    # restore the columns that should not have been differenced
    tmp['FQTR'] = subdf['FQTR']
    tmp['FYEARQ'] = subdf['FYEARQ']
    tmp['ReportNumber'] = subdf['ReportNumber']
    ft.update({group_name : tmp})
    print(group_name)
    print('differences')
    print(tmp)
    print(' ')

The pd.concat call is my attempt to handle the differences between quarters ('FQTR'). It returns:

2007
differences
ReportNumber NIY OANCFY FQTR FYEARQ
Reference Published
2007-12-31 2008-02-21 1 3131 3073 4 2007

2008
differences
ReportNumber NIY OANCFY FQTR FYEARQ
Reference Published
2008-03-31 2008-05-08 1 1189 1482 1 2008
2008-06-30 2008-08-07 1 1107 1011 2 2008
2008-09-30 2008-11-06 1 1106 1270 3 2008

2009
differences
ReportNumber NIY OANCFY FQTR FYEARQ
Reference Published
2009-03-31 2009-05-07 16:00:00 1 942 1441 1 2009
2009-06-30 2009-08-06 00:00:00 1 806 1576 2 2009
2009-09-30 2009-11-07 00:00:00 1 710 1406 3 2009
2009-12-31 2010-02-24 16:00:00 1 723 1175 4 2009

2010
differences
ReportNumber NIY OANCFY FQTR FYEARQ
Reference Published
2010-03-31 2010-05-07 16:00:00 1 677 1172 1 2010
2010-06-30 2010-08-06 08:00:00 1 715 1269 2 2010
2010-09-30 2010-11-08 16:00:00 1 368 709 3 2010
2010-12-31 2011-02-24 08:00:00 1 -799 796 4 2010

2011
differences
ReportNumber NIY OANCFY FQTR FYEARQ
Reference Published
2011-03-31 2011-05-06 16:00:00 1 310 390 1 2011
2011-06-30 2011-08-09 08:00:00 1 155 340 2 2011
2011-09-30 2011-11-09 00:00:00 1 -71 492 3 2011
2011-12-31 2012-03-06 00:00:00 1 -6119 563 4 2011

2012
differences
ReportNumber NIY OANCFY FQTR FYEARQ
Reference Published
2012-03-31 2012-05-05 16:00:00 1 42 540 1 2012
2012-06-30 2012-08-02 16:00:00 1 -336 459 2 2012
2012-09-30 2012-11-06 16:00:00 1 -381 786 3 2012
2012-12-31 2013-03-05 00:00:00 1 456 923 4 2012

The problem with this solution is that it only works for 'ReportNumber' == 1.

I then use pd.concat to flatten it into one frame:

pd.concat([ft[f] for f in ft])

Any suggestions?

Best answer

Your cftmp.dropna() call discards the data associated with every ReportNumber other than 1. In your example dataframe, those rows happen to have NaN for OANCFY.
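
A small sketch of that effect, using three rows copied from the question's table. The frame and variable names here are made up for illustration, and `dropna(subset=...)` is shown only as one possible way to keep the restated rows, not as something the question used.

```python
import numpy as np
import pandas as pd

# Fiscal 2008: the first-quarter report, its restatement (ReportNumber 2),
# and the second-quarter report.
sample = pd.DataFrame({
    "ReportNumber": [1, 2, 1],
    "NIY": [1189, 1149, 2296],
    "OANCFY": [1482, np.nan, 2493],
})

# dropna() with no arguments removes every row that has a NaN in any column,
# so the ReportNumber == 2 row disappears.
kept_all_columns = sample.dropna()["ReportNumber"].tolist()

# Checking only NIY for missing values keeps the restated row.
kept_niy_only = sample.dropna(subset=["NIY"])["ReportNumber"].tolist()

print(kept_all_columns)  # [1, 1]
print(kept_niy_only)     # [1, 2, 1]
```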

However, to avoid the loop, you can do this: use head to get the first observation of each group, use diff() to get the deltas, and then concat the two together.

In [71]:

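# first row of each (FYEARQ, ReportNumber) group plus the within-group deltas
newdf = pd.concat((df.groupby(['FYEARQ',
                               'ReportNumber']).head(1),
                   df.groupby(['FYEARQ',
                               'ReportNumber']).diff().dropna())).reset_index()\
          .sort_values('Reference')\
          .dropna(subset=['OANCFY'])\
          .reset_index(drop=True)\
          .ffill()
# sort_values replaces the removed DataFrame.sort; ffill() replaces fillna(method='pad')

# after diff(), FQTR holds quarter-to-quarter steps; a cumulative sum per
# fiscal year restores the quarter number
newdf['FQTR'] = newdf.FQTR.groupby(newdf.FYEARQ).cumsum()

print(newdf)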

Reference Published FQTR FYEARQ NIY OANCFY ReportNumber
0 2007-12-31 2008-02-21 00:00:00 4 2007 3131 3073 1
1 2008-03-31 2008-05-08 00:00:00 1 2008 1189 1482 1
2 2008-06-30 2008-08-07 00:00:00 2 2008 1107 1011 1
3 2008-09-30 2008-11-06 00:00:00 3 2008 1106 1270 1
4 2009-03-31 2009-05-07 16:00:00 1 2009 942 1441 1
5 2009-06-30 2009-08-06 00:00:00 2 2009 806 1576 1
6 2009-09-30 2009-11-07 00:00:00 3 2009 710 1406 1
7 2009-12-31 2010-02-24 16:00:00 4 2009 723 1175 1
8 2010-03-31 2010-05-07 16:00:00 1 2010 677 1172 1
9 2010-06-30 2010-08-06 08:00:00 2 2010 715 1269 1
10 2010-09-30 2010-11-08 16:00:00 3 2010 368 709 1
11 2010-12-31 2011-02-24 08:00:00 4 2010 -799 796 1
12 2011-03-31 2011-05-06 16:00:00 1 2011 310 390 1
13 2011-06-30 2011-08-09 08:00:00 2 2011 155 340 1
14 2011-09-30 2011-11-09 00:00:00 3 2011 -71 492 1
15 2011-12-31 2012-03-06 00:00:00 4 2011 -6119 563 1
16 2012-03-31 2012-05-05 16:00:00 1 2012 42 540 1
17 2012-06-30 2012-08-02 16:00:00 2 2012 -336 459 1
18 2012-09-30 2012-11-06 16:00:00 3 2012 -381 786 1
19 2012-12-31 2013-03-05 00:00:00 4 2012 456 923 1
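
The output above contains only ReportNumber 1 rows. If deltas are wanted for restated reports as well, one possible variation is sketched below. It assumes that a restatement should be differenced against the previous quarter of the same ReportNumber, which the question does not confirm. The data are the fiscal 2011 rows for quarters 1 to 3 from the question, and the names `value_cols`, `keys`, `ordered`, and `result` are illustrative.

```python
import numpy as np
import pandas as pd

# Fiscal 2011, quarters 1-3: original reports (1) and restatements (2).
df = pd.DataFrame({
    "FYEARQ": [2011] * 6,
    "FQTR": [1, 1, 2, 2, 3, 3],
    "ReportNumber": [1, 2, 1, 2, 1, 2],
    "NIY": [310, 319, 465, 443, 394, 411],
    "OANCFY": [390, np.nan, 730, np.nan, 1222, np.nan],
})

value_cols = ["NIY", "OANCFY"]      # year-to-date columns to convert
keys = ["FYEARQ", "ReportNumber"]   # assumed grouping for the differences

# Order rows so that diff() compares consecutive quarters within a group.
ordered = df.sort_values(keys + ["FQTR"]).reset_index(drop=True)
deltas = ordered.groupby(keys)[value_cols].diff()

# The first row of each group has no predecessor: keep its year-to-date value.
is_first = ~ordered.duplicated(subset=keys)
result = ordered.copy()
for col in value_cols:
    result[col] = np.where(is_first, ordered[col], deltas[col])

print(result)
```

Using `np.where` with an explicit first-row mask, rather than `fillna`, avoids overwriting a NaN delta that comes from a genuinely missing earlier value with the year-to-date figure.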

Regarding python - Group by, then apply a function, then flatten back into a DataFrame in Pandas Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31916255/
