
python - Create all possible permutations of a column, partitioned by another column, in a Pandas DataFrame

Reposted. Author: 行者123. Updated: 2023-11-28 18:25:10

I have a DataFrame like this:

[Image: Current State]

My goal is this:

[Image: Final State]

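Explanation:

  1. Each customer placed 3 orders.
  2. A customer can buy as many categories as they like in each order.
  3. Desired state: get all possible permutations of the categories a customer bought, following the order sequence. The second image makes this clearer.
  4. In the desired state, Category1 is the category bought in the first order, Category2 the category bought in the second order, and so on.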
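Strictly speaking, what the question calls "permutations" is the Cartesian product of the per-order category lists: one row per way of picking one category from order 1, one from order 2 and one from order 3. A minimal sketch with the standard library's `itertools.product`, using the same categories as customer 1 in the answer's sample data further down (the original images are not available, so this data is illustrative):

```python
from itertools import product

# Illustrative customer: categories bought in orders 1, 2 and 3
orders = [
    ["Food"],                          # order 1
    ["Food", "Clothes", "Furniture"],  # order 2
    ["Clothes", "Food", "Toys"],       # order 3
]

# Each tuple is one output row: (Category1, Category2, Category3)
combos = list(product(*orders))
print(len(combos))  # 1 * 3 * 3 combinations
for combo in combos:
    print(combo)
```

The number of rows per customer is the product of the number of categories in each order, so the output can grow quickly for customers with large orders.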

The code I am using:

import time

import pandas as pd

start_time = time.time()

frames = []
for CustomerName in base_df.CustomerName.unique():
    df1 = base_df[base_df['CustomerName'] == CustomerName][['CustomerName', 'order_seq', 'Category']]
    # Cartesian product of the categories bought in each order of this customer
    df2 = pd.DataFrame(index=pd.MultiIndex.from_product(
        [subdf['Category'] for p, subdf in df1.groupby('order_seq')],
        names=df1.order_seq.unique())).reset_index()
    df2['CustomerName'] = CustomerName
    frames.append(df2)
# DataFrame.append was removed in pandas 2.0, so collect the pieces and concatenate once
df = pd.concat(frames)

print("--- %s seconds ---" % (time.time() - start_time))

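This takes about 10 minutes to run on my dataset, so I am looking for a faster approach.

I'm working in pandas right now, but pointers in R or SQL are also welcome. Thanks!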
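Building a separate MultiIndex and DataFrame for every customer is plausibly a large part of that cost (an assumption; nothing here is timed). A sketch that keeps the same per-customer Cartesian product idea but produces plain Python tuples with `itertools.product` and creates the DataFrame only once. The column names `CustomerName`, `order_seq` and `Category` follow the question's code, the sample `base_df` is made up, and the sketch assumes every customer has the same number of orders, as the question states:

```python
from itertools import product

import pandas as pd

# Illustrative input using the column names from the question's code
base_df = pd.DataFrame({
    "CustomerName": [1, 1, 1, 1, 2, 2, 2],
    "order_seq":    [1, 2, 2, 3, 1, 2, 3],
    "Category":     ["Food", "Food", "Toys", "Clothes", "Clothes", "Toys", "Food"],
})

rows = []
for customer, cust_df in base_df.groupby("CustomerName", sort=False):
    # One list of categories per order, in ascending order_seq
    per_order = [grp["Category"].tolist() for _, grp in cust_df.groupby("order_seq")]
    # One output row per combination across the orders
    for combo in product(*per_order):
        rows.append((customer, *combo))

# Create the result once instead of concatenating many small DataFrames
n_orders = base_df["order_seq"].nunique()
columns = ["CustomerName"] + [f"Category{i}" for i in range(1, n_orders + 1)]
result = pd.DataFrame(rows, columns=columns)
print(result)
```

Whether this beats a merge-based approach depends on the data; the work is still proportional to the number of output rows.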

Best answer

Consider merging three DataFrames, one per OrderSequence value, each inner-joined onto the set of distinct CustomerName values:

import pandas as pd

df = pd.DataFrame({'CustomerName': [1,1,1,1,1,1,1,2,2,2,3,3,3,3],
                   'OrderSequence': [1,2,2,2,3,3,3,1,2,3,1,1,2,3],
                   'Category': ['Food','Food','Clothes','Furniture','Clothes','Food','Toys',
                                'Clothes','Toys','Food','Furniture','Toys','Food','Food']})

# Start from the distinct customers
finaldf = pd.DataFrame(df['CustomerName'].drop_duplicates())

for i in range(1, 4):
    # Categories bought in order i, renamed to Category1, Category2, ...
    seqdf = df[df['OrderSequence'] == i][['CustomerName', 'Category']].\
        rename(columns={'Category': 'Category' + str(i)})
    # Inner join on CustomerName pairs every existing row with every category of order i
    finaldf = pd.merge(finaldf, seqdf, on=['CustomerName'])

print(finaldf)

# CustomerName Category1 Category2 Category3
# 0 1 Food Food Clothes
# 1 1 Food Food Food
# 2 1 Food Food Toys
# 3 1 Food Clothes Clothes
# 4 1 Food Clothes Food
# 5 1 Food Clothes Toys
# 6 1 Food Furniture Clothes
# 7 1 Food Furniture Food
# 8 1 Food Furniture Toys
# 9 2 Clothes Toys Food
# 10 3 Furniture Food Food
# 11 3 Toys Food Food
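The loop above hardcodes three orders with `range(1, 4)`. A sketch that derives the order numbers from the data instead and folds the same inner merge over them with `functools.reduce`; it reuses the answer's sample data and assumes the `OrderSequence` values are the integers used to name the `Category<i>` columns:

```python
from functools import reduce

import pandas as pd

df = pd.DataFrame({'CustomerName': [1,1,1,1,1,1,1,2,2,2,3,3,3,3],
                   'OrderSequence': [1,2,2,2,3,3,3,1,2,3,1,1,2,3],
                   'Category': ['Food','Food','Clothes','Furniture','Clothes','Food','Toys',
                                'Clothes','Toys','Food','Furniture','Toys','Food','Food']})

# One frame per order number, with Category renamed to Category<i>
seq_frames = [
    df.loc[df['OrderSequence'] == i, ['CustomerName', 'Category']]
      .rename(columns={'Category': f'Category{i}'})
    for i in sorted(df['OrderSequence'].unique())
]

# Inner-join the per-order frames one after another, starting from the distinct customers
finaldf = reduce(lambda left, right: left.merge(right, on='CustomerName'),
                 seq_frames,
                 df[['CustomerName']].drop_duplicates())
print(finaldf)
```

With three orders this performs the same sequence of joins as the loop, so it should give the same 12 rows.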

Admittedly, I first worked out the setup above in SQL using self-joins and then translated it to pandas:

SELECT t1.CustomerName, t2.Category AS Category1, 
t3.Category AS Category2, t4.Category AS Category3

FROM (SELECT DISTINCT CustomerName FROM DataFrame) AS t1
INNER JOIN DataFrame AS t2
ON t1.CustomerName = t2.CustomerName
INNER JOIN DataFrame AS t3
ON t1.CustomerName = t3.CustomerName
INNER JOIN DataFrame AS t4
ON t1.CustomerName = t4.CustomerName

WHERE (t2.OrderSequence=1) AND (t3.OrderSequence=2) AND (t4.OrderSequence=3);
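The SQL version can be tried with Python's built-in `sqlite3` module. A sketch that loads the answer's sample rows into an in-memory table named `DataFrame` (the table name the query uses) and runs the query as written; because SQL guarantees no row order without `ORDER BY`, the rows are sorted in Python afterwards:

```python
import sqlite3

# The answer's sample data as (CustomerName, OrderSequence, Category)
rows = [
    (1, 1, 'Food'), (1, 2, 'Food'), (1, 2, 'Clothes'), (1, 2, 'Furniture'),
    (1, 3, 'Clothes'), (1, 3, 'Food'), (1, 3, 'Toys'),
    (2, 1, 'Clothes'), (2, 2, 'Toys'), (2, 3, 'Food'),
    (3, 1, 'Furniture'), (3, 1, 'Toys'), (3, 2, 'Food'), (3, 3, 'Food'),
]

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE DataFrame (CustomerName INTEGER, OrderSequence INTEGER, Category TEXT)')
conn.executemany('INSERT INTO DataFrame VALUES (?, ?, ?)', rows)

query = """
SELECT t1.CustomerName, t2.Category AS Category1,
       t3.Category AS Category2, t4.Category AS Category3
FROM (SELECT DISTINCT CustomerName FROM DataFrame) AS t1
INNER JOIN DataFrame AS t2 ON t1.CustomerName = t2.CustomerName
INNER JOIN DataFrame AS t3 ON t1.CustomerName = t3.CustomerName
INNER JOIN DataFrame AS t4 ON t1.CustomerName = t4.CustomerName
WHERE (t2.OrderSequence = 1) AND (t3.OrderSequence = 2) AND (t4.OrderSequence = 3)
"""
# Sort in Python because the query itself has no ORDER BY
result = sorted(conn.execute(query).fetchall())
conn.close()

for row in result:
    print(row)
```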

A similar question, "python - Create all possible permutations of a column, partitioned by another column, in a Pandas DataFrame", can be found on Stack Overflow: https://stackoverflow.com/questions/41885627/
