
Python Pandas - Creating a function to replace repetitive DataFrames

Reposted. Author: 行者123. Updated: 2023-12-01 00:16:01

I'm new to Python and have managed to build the following code, which produces the desired results in four separate DataFrames:

import pandas as pd
x2019 = df.Date.between('2015-06-28','2015-07-04') #Transaction Dates we want to analyze
y2019 = df.First_Purchase_Date.between('2014-01-01','2015-07-04') #customer first purchase dates we want to include in the dataset

TABLE_2019_USA_XX = df.loc[x2019 & y2019 & (df['Region'] == 'USA')].groupby(df['FPYear'])[['New Customer', 'Existing Customer', 'revenue']].sum() #with date filters for table
TABLE_2019_USA_XX['TotalCusts'] = TABLE_2019_USA_XX['New Customer'] + TABLE_2019_USA_XX['Existing Customer']

TABLE_2019_CANADA_XX = df.loc[x2019 & y2019 & (df['Region'] == 'Canada')].groupby(df['FPYear'])[['New Customer', 'Existing Customer', 'revenue']].sum() #with date filters for table
TABLE_2019_CANADA_XX['TotalCusts'] = TABLE_2019_CANADA_XX['New Customer'] + TABLE_2019_CANADA_XX['Existing Customer']

x2018 = df.Date.between('2014-07-23','2014-07-28') #Transaction Dates we want to analyze
y2018 = df.First_Purchase_Date.between('2014-01-01','2014-07-30') #customer first purchase dates we want to include in the dataset

TABLE_2018_USA_XX = df.loc[x2018 & y2018 & (df['Region'] == 'USA')].groupby(df['FPYear'])[['New Customer', 'Existing Customer', 'revenue']].sum() #with date filters for table
TABLE_2018_USA_XX['TotalCusts'] = TABLE_2018_USA_XX['New Customer'] + TABLE_2018_USA_XX['Existing Customer']
TABLE_2018_CANADA_XX = df.loc[x2018 & y2018 & (df['Region'] == 'Canada')].groupby(df['FPYear'])[['New Customer', 'Existing Customer', 'revenue']].sum() #with date filters for table
TABLE_2018_CANADA_XX['TotalCusts'] = TABLE_2018_CANADA_XX['New Customer'] + TABLE_2018_CANADA_XX['Existing Customer']

print(TABLE_2018_USA_XX)
print(TABLE_2019_USA_XX)
print(TABLE_2018_CANADA_XX)
print(TABLE_2019_CANADA_XX)

Output

FPYear  New Customer  Existing Customer  revenue  TotalCusts
2014               0                 23      134          23
2015              12                 32      432          44

FPYear  New Customer  Existing Customer  revenue  TotalCusts
2014             432                421     4315         853
2015            3415                452     2341        3867

FPYear  New Customer  Existing Customer  revenue  TotalCusts
2014              22                432     4312         454
2015              33                345     3415         378

FPYear  New Customer  Existing Customer  revenue  TotalCusts
2014               5                 35     4312          40
2015             432                 32     6131         464

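From what I have read and the feedback I received while building this script, I understand that I should be able to build the above using a function, but I can't quite figure out how to do it. Could someone offer advice to get me started? I'm essentially trying to shorten my script and make it more efficient.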

Best answer

Simply define a function and pass in, as parameters, the dates and the region used as filters:

import pandas as pd

def process(df, start_dt, end_dt, purch_start, purch_end, region):
    # Boolean masks for the transaction dates, first-purchase dates, and region
    mask_date = df['Date'].between(start_dt, end_dt)
    mask_purch_date = df['First_Purchase_Date'].between(purch_start, purch_end)
    mask_region = df['Region'] == region

    # Keep the matching rows, then sum the customer and revenue columns per first-purchase year
    temp_df = df[mask_date & mask_purch_date & mask_region].groupby(df['FPYear'])[['New Customer', 'Existing Customer', 'revenue']].sum()

    temp_df['TotalCusts'] = temp_df['New Customer'] + temp_df['Existing Customer']

    return temp_df


TABLE_2019_USA_XX = process(df,'2015-06-28','2015-07-04', '2014-01-01','2015-07-04', 'USA')

TABLE_2019_CANADA_XX = process(df,'2015-06-28','2015-07-04', '2014-01-01','2015-07-04', 'Canada')

TABLE_2018_USA_XX = process(df,'2014-07-23','2014-07-28', '2014-01-01','2014-07-30', 'USA')

TABLE_2018_CANADA_XX = process(df,'2014-07-23','2014-07-28','2014-01-01','2014-07-30', 'Canada')
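
If the four calls above still feel repetitive, one possible next step is to keep the date windows in a dictionary and build all the tables in a loop. The following is only a sketch under assumptions: the sample `df` is made up (the question never shows the real data), and the names `periods`, `regions`, and `tables` are hypothetical. The function is repeated here so the block runs on its own; it groups by the column name 'FPYear', which should be equivalent to `groupby(df['FPYear'])` in the answer.

```python
import pandas as pd

# Made-up sample data for illustration only; the real df is not shown in the post.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2015-06-29', '2015-07-01', '2015-07-02',
                            '2014-07-24', '2014-07-25']),
    'First_Purchase_Date': pd.to_datetime(['2014-03-01', '2015-06-30', '2015-01-15',
                                           '2014-02-10', '2014-07-24']),
    'Region': ['USA', 'USA', 'Canada', 'USA', 'Canada'],
    'FPYear': [2014, 2015, 2015, 2014, 2014],
    'New Customer': [0, 1, 0, 0, 1],
    'Existing Customer': [1, 0, 1, 1, 0],
    'revenue': [100, 50, 70, 30, 20],
})

def process(df, start_dt, end_dt, purch_start, purch_end, region):
    # Same logic as the answer's function: filter, group by FPYear, sum, add a total.
    mask = (
        df['Date'].between(start_dt, end_dt)
        & df['First_Purchase_Date'].between(purch_start, purch_end)
        & (df['Region'] == region)
    )
    out = df[mask].groupby('FPYear')[['New Customer', 'Existing Customer', 'revenue']].sum()
    out['TotalCusts'] = out['New Customer'] + out['Existing Customer']
    return out

# Hypothetical configuration: label -> (start_dt, end_dt, purch_start, purch_end),
# using the same date windows as the calls in the answer.
periods = {
    '2019': ('2015-06-28', '2015-07-04', '2014-01-01', '2015-07-04'),
    '2018': ('2014-07-23', '2014-07-28', '2014-01-01', '2014-07-30'),
}
regions = ['USA', 'Canada']

# One table per (label, region) pair, stored in a dict instead of four separate variables.
tables = {
    (label, region): process(df, *dates, region)
    for label, dates in periods.items()
    for region in regions
}

for key, table in tables.items():
    print(key)
    print(table)
```

A table can then be looked up by key, for example `tables[('2019', 'USA')]`, and adding another year or region only means adding one entry to `periods` or `regions`.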

Regarding Python Pandas - Creating a function to replace repetitive DataFrames, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/59330852/
