gpt4 book ai didi

python - pandas groupby apply 如何提速

转载 作者:太空宇宙 更新时间:2023-11-04 03:44:31 25 4
gpt4 key购买 nike

有一个数据框,我想将其操作为以下格式。

projectid    vendor_name    project_resource_type    item_quantity
12345 amazon tech 5
12345 best buy supplies 2
abcde amazon tech 1

总而言之,我想将数据框操作成这样

projectid    amazon best_buy tech supplies total_quantity
12345 1. 1. 1. 1. 7
abcde 1. 0. 0. 1. 1.

所以我做了以下

has_vendors = pd.get_dummies(resources.vendor_name, prefix='has_vendor')
resources.drop('vendor_name', 1, inplace=True)
resources = pd.merge(resources, has_vendors, left_index=True, right_index=True, how='outer')

print 'merging resource type dummies'
resources_types = pd.get_dummies(resources.project_resource_type, prefix='has_resource_type')
resources.drop('project_resource_type', 1, inplace=True)
resources = pd.merge(resources, resources_types, left_index=True, right_index=True, how='outer')

gb = resources.groupby('projectid')

columns = [x for x in resources.columns.values if 'has_vendor' in x or 'has_resource_type' in x]
all_cols = [x for x in resources.columns.values if 'has_vendor' in x or 'has_resource_type' in x]
all_cols.append('total_quantity')

def group(x):
vals = []
for i,col in enumerate(columns):
v = np.any(x[col]) + 0.
vals.append(v)

su = np.sum(x['item_quantity'])
vals.append(su)

return pd.Series(vals, index=all_cols)

resources_agg = gb.apply(group)

问题是,我发现 gb.apply(group) 函数太慢了,大约有 650000 个唯一的项目 ID。还有其他方法可以加快速度吗?

最佳答案

你可以试试 pivot_table 会不会更快:

>>> aggrfn = lambda ts: 1 if 0 < ts.sum() else 0
>>> df.pivot_table('item_quantity', 'projectid', 'vendor_name', aggrfn, 0)
vendor_name amazon best buy
projectid
12345 1 1
abcde 1 0

>>> df.pivot_table('item_quantity', 'projectid', 'project_resource_type', aggrfn, 0)
project_resource_type supplies tech
projectid
12345 1 1
abcde 0 1

>>> df.groupby('projectid')['item_quantity'].aggregate({'total_quantity':'sum'})
total_quantity
projectid
12345 7
abcde 1

如果 objs 是包含上述结果的列表,您可以将它们加入:

>>> pd.concat(objs, axis=1)
amazon best buy supplies tech total_quantity
projectid
12345 1 1 1 1 7
abcde 1 0 0 1 1

关于python - pandas groupby apply 如何提速,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/24552058/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com