
python - How to slice, rank, and wrangle like a Pandas boss

Reposted. Author: 太空宇宙. Updated: 2023-11-04 05:40:16

Suppose there is a lookup table summarizing the busy lives of a few people on this planet...

import pandas as pd
import numpy as np

# shorthand for building timestamps
t = pd.Timestamp

lu = pd.DataFrame({ 'name' : ['Bill','Elon','Larry','Jeff','Marissa'],
'feels' : ['charitable','Alcoa envy','Elon envy','like the number 7','sassy'],
'last ate' : [t('20151209'),t('20151201'),t('20151208'),t('20151208'),t('20151209')],
'boxers' : [True,True,True,False,True]})

Suppose it is also known where these people live and when they did certain things...

af = pd.DataFrame({ 'name' : ['Bill','Elon','Larry','Elon','Jeff','Larry','Larry'],
'address' : ['in my computer','moon','internet','mars','cardboard box','autonomous car','every where'],
'sq_ft' : [2,2135,69,84535, 1.32, 54,168],
'forks' : [7,1,2,1,0,np.nan,1]})

rand_dates=[t('20141202'),t('20130804'),t('20120508'),t('20150411'),
t('20141209'),t('20091023'),t('20130921'),t('20110102'),
t('20130728'),t('20141119'),t('20151024'),t('20130824')]

df = pd.DataFrame({ 'name' : ['Elon','Bill','Larry','Elon','Jeff','Larry','Larry','Bill','Larry','Elon','Marissa','Jeff'],
'activity' : ['slept','tripped','spoke','swam','spooked','liked','whistled','up dog','smiled','donated','grant men paternity leave','fondled'],
'date' : rand_dates})

These people can be ranked by the number of addresses they live at, like so:

af.name.value_counts()

Larry    3
Elon     2
Jeff     1
Bill     1
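
The ranking above is a Series indexed by name, which is awkward to combine with lu as-is. A minimal sketch of turning it into a two-column frame with explicit labels follows; the variable names `names`, `counts`, and `addresses` are chosen here for illustration and are not part of the original question.

```python
import pandas as pd

# Same 'name' values as the af frame above
names = pd.Series(['Bill', 'Elon', 'Larry', 'Elon', 'Jeff', 'Larry', 'Larry'],
                  name='name')

# value_counts() returns a Series indexed by the distinct names,
# sorted by count in descending order
counts = names.value_counts()

# Label the index and the values explicitly, so the result does not
# depend on the default names pandas assigns
addresses = counts.rename_axis('name').reset_index(name='addresses')
print(addresses)
```

With the counts in an ordinary frame, they can be joined or merged onto lu by the name column.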

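Requirement 1: Using the ranking above, how can one create a new "ranked" DataFrame made up of the information in the lookup table lu? Simply put, how does one produce Exhibit A?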

# Exhibit A
   boxers  feels              last ate    name   addresses
0  True    Elon envy          2015-12-08  Larry  3
1  True    Alcoa envy         2015-12-01  Elon   2
2  False   like the number 7  2015-12-08  Jeff   1
3  True    charitable         2015-12-09  Bill   1

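Requirement 2: Look at the output of the groupby operation below. How can one determine the time difference between the oldest and newest dates, so that the members of lu can be ranked by that time delta? Simply put, how does one get from the groupby to Exhibit D?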

df.groupby(['name','date']).size()

name     date
Bill     2011-01-02    1
         2013-08-04    1
Elon     2014-11-19    1
         2014-12-02    1
         2015-04-11    1
Jeff     2013-08-24    1
         2014-12-09    1
Larry    2009-10-23    1
         2012-05-08    1
         2013-07-28    1
         2013-09-21    1
Marissa  2015-10-24    1

# Exhibit B - Calculate time deltas
name     time_delta
Bill     Timedelta('945 days 00:00:00')
Elon     Timedelta('143 days 00:00:00')
Jeff     Timedelta('472 days 00:00:00')
Larry    Timedelta('1429 days 00:00:00')
Marissa  Timedelta('0 days 00:00:00')

# Exhibit C - Rank time deltas (this is easy)
name     time_delta
Larry    Timedelta('1429 days 00:00:00')
Bill     Timedelta('945 days 00:00:00')
Jeff     Timedelta('472 days 00:00:00')
Elon     Timedelta('143 days 00:00:00')
Marissa  Timedelta('0 days 00:00:00')

# Exhibit D - Add to and re-rank the table built in Exhibit A according to time_delta
   boxers  feels              last ate    name     addresses  time_delta
0  True    Elon envy          2015-12-08  Larry    3          1429 days 00:00:00
1  True    charitable         2015-12-09  Bill     1          945 days 00:00:00
2  False   like the number 7  2015-12-08  Jeff     1          472 days 00:00:00
3  True    Alcoa envy         2015-12-01  Elon     2          143 days 00:00:00
4  True    sassy              2015-12-09  Marissa  NaN        0 days 00:00:00

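Prior research: This SO post on getting max values using groupby and transform, and this other SO post on finding and selecting the most frequent data, are informative, but they either don't work on a Series (the result of value_counts()) or simply elude me... I have actually got the first part working, but the code is buggy and probably inefficient.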

Easy code sharing: take a look at this IPython Notebook, which says it all. Otherwise, see the Python 2.7 code here.

Best answer

I think you can use join and sort_values. Aggregation is covered in the docs.
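
To make the join step concrete, here is a small sketch of how join aligns on the index and what happens to rows with no match. It uses two rows borrowed from lu; the names `left`, `right`, `joined`, and `ranked` are illustrative only and do not come from the answer.

```python
import pandas as pd

# Two rows borrowed from lu, indexed by name
left = pd.DataFrame({'feels': ['charitable', 'sassy']},
                    index=pd.Index(['Bill', 'Marissa'], name='name'))

# Address counts; Marissa has no row in af, so she is absent here
right = pd.Series({'Bill': 1}, name='addresses')

# join() is a left join on the index by default: unmatched rows get NaN
joined = left.join(right)

# sort_values() places NaN rows last by default
ranked = joined.sort_values('addresses', ascending=False)
print(ranked)
```

This is why Marissa ends up with NaN in the addresses column and has to be dropped for Exhibit A but kept for Exhibit D.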

# join the value counts to the lu dataframe, then sort
# naming the counts 'addresses' up front avoids relying on the Series' default name,
# which differs between pandas versions
Exhibit_A = lu.set_index('name').join(af['name'].value_counts().rename('addresses')).sort_values('addresses', ascending=False)
# drop rows with NaN, reset index
print(Exhibit_A.dropna().reset_index())

   name   boxers  feels              last ate    addresses
0  Larry  True    Elon envy          2015-12-08  3
1  Elon   True    Alcoa envy         2015-12-01  2
2  Bill   True    charitable         2015-12-09  1
3  Jeff   False   like the number 7  2015-12-08  1
# aggregate to the latest and earliest date per name;
# selecting the 'date' column first keeps the result columns flat ('max', 'min')
g = df.groupby('name')['date'].agg(['max', 'min'])

g['time_delta'] = g['max'] - g['min']

# drop the helper columns
g = g.drop(['max', 'min'], axis=1)

# join to Exhibit_A, sort, reset index
Exhibit_D = Exhibit_A.join(g).sort_values('time_delta', ascending=False).reset_index()
# reorder columns
Exhibit_D = Exhibit_D[['boxers', 'feels', 'last ate', 'name', 'addresses', 'time_delta']]
print(Exhibit_D)

   boxers  feels              last ate    name     addresses  time_delta
0  True    Elon envy          2015-12-08  Larry    3          1429 days
1  True    charitable         2015-12-09  Bill     1          945 days
2  False   like the number 7  2015-12-08  Jeff     1          472 days
3  True    Alcoa envy         2015-12-01  Elon     2          143 days
4  True    sassy              2015-12-09  Marissa  NaN        0 days
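
As a cross-check on the time_delta column, the same spans can be computed without a multi-function aggregation by subtracting two groupby reductions. This is only a sketch of an alternative, not the answer's method; it rebuilds the name and date columns of df from the question, and the names `dates`, `time_delta`, and `ranked` are chosen here for illustration.

```python
import pandas as pd

t = pd.Timestamp

# The 'name' and 'date' columns of df from the question
df = pd.DataFrame({
    'name': ['Elon', 'Bill', 'Larry', 'Elon', 'Jeff', 'Larry',
             'Larry', 'Bill', 'Larry', 'Elon', 'Marissa', 'Jeff'],
    'date': [t('20141202'), t('20130804'), t('20120508'), t('20150411'),
             t('20141209'), t('20091023'), t('20130921'), t('20110102'),
             t('20130728'), t('20141119'), t('20151024'), t('20130824')],
})

# Group the dates by name once, then take the newest and oldest per group
dates = df.groupby('name')['date']

# Both reductions are indexed by name, so the subtraction aligns row by row
time_delta = dates.max() - dates.min()

# Exhibit C: longest span first
ranked = time_delta.sort_values(ascending=False)
print(ranked)
```

Because `ranked` is indexed by name, it can be joined onto Exhibit_A in the same way as g above.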

Regarding "python - How to slice, rank, and wrangle like a Pandas boss", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/34191746/
