python - Performing a Pandas concatenation of dataframes and reading it from a file

Reposted by: 太空宇宙 · Updated: 2023-11-04 04:59:33

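I have a use case where I need to create a Python dictionary keyed by year and month, and then concatenate all of its dataframes into a single dataframe. I have implemented it as follows: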

import pandas as pd

# df is the source dataframe, loaded elsewhere
dict_year_month = {}
temp_dict_1 = {}
temp_dict_2 = {}
for ym in [201104, 201105, ..., 201706]:  # every year-month is listed by hand

    key_name = 'df_' + str(ym) + 'A'
    temp_dict_1[key_name] = df[(df['col1'] <= ym) & (df['col2'] > ym)
                               & (df['col3'] == 1)]

    temp_dict_2[key_name] = df[(df['col1'] <= ym) & (df['col2'] == 0)
                               & (df['col3'] == 1)]

    if not temp_dict_1[key_name].empty:
        dict_year_month[key_name] = temp_dict_1[key_name]
        dict_year_month[key_name].loc[:, 'new_col'] = ym
    elif not temp_dict_2[key_name].empty:
        dict_year_month[key_name] = temp_dict_2[key_name]
        dict_year_month[key_name].loc[:, 'new_col'] = ym

    dict_year_month[key_name] = dict_year_month[key_name].sort_values('col4')
    dict_year_month[key_name] = dict_year_month[key_name].drop_duplicates('col5')
    # ... do some other processing
    # create individual dataframes as df_201104A ... and so on
dict_year_month
# concatenate all the above individual dataframes into a single dataframe:
df1 = pd.concat([
    dict_year_month['df_201104A'], dict_year_month['df_201105A'],
    # ... and so on till dict_year_month['df_201706A']
])

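The challenge is that I have to rerun this code every quarter. Each time, I have to update the script with the new year-month dict keys, and the pd.concat call also has to be updated with the new year-month details. I am looking for some other solution: could I read the keys, and build the list of dataframes to concatenate, from a properties file or a config file?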

Best answer

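You only need to do a couple of things to get there. The first is to enumerate the months between your start and end month, which I do below with rrule, reading the start and end dates from a file. That gives you the keys for the dictionary. Then just call the .values() method on the dictionary to get all the dataframes.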

from datetime import datetime
import pickle

import pandas as pd
from dateutil import rrule

# Get these from wherever: a config file, etc.
params = {
    'start_year': 2011,
    'start_month': 4,
    'end_year': 2017,
    'end_month': 6,
}

# Write the parameters to a file once...
with open("params.pkl", "wb") as f:
    pickle.dump(params, f)

# ...and read them back on each run.
with open("params.pkl", "rb") as f:
    params = pickle.load(f)

start = datetime(year=params['start_year'], month=params['start_month'], day=1)
end = datetime(year=params['end_year'], month=params['end_month'], day=1)

# One YYYYMM integer per month from start to end, inclusive.
keys = [int(dt.strftime("%Y%m")) for dt in rrule.rrule(rrule.MONTHLY, dtstart=start, until=end)]
print(keys)

## Do some things and get a dict (loop over `keys` to build it; two placeholder frames stand in here)
dict_year_month = {'201104': pd.DataFrame([[1, 2, 3]]), '201105': pd.DataFrame([[4, 5, 6]])}  # ... etc.

# Concatenate every frame in the dict without naming the keys one by one.
pd.concat(dict_year_month.values())
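
To show how the pieces could fit together end to end, here is a minimal sketch rather than a definitive implementation. It rests on several assumptions that are not in the original posts: the start and end months come from a small JSON config (inlined here as a string, where a real script would load a file), the month keys are generated with pandas' own period_range instead of rrule, and the helper names year_month_keys and build_frames as well as the sample data are purely illustrative. The filtering logic mirrors the loop from the question, except that months with no matching rows are skipped instead of raising a KeyError.

```python
import json

import pandas as pd


def year_month_keys(start, end):
    """Return every month from start to end (inclusive) as YYYYMM integers."""
    periods = pd.period_range(start=start, end=end, freq="M")
    return [p.year * 100 + p.month for p in periods]


def build_frames(df, keys):
    """Build one filtered dataframe per year-month, keyed like 'df_201104A'."""
    frames = {}
    for ym in keys:
        base = (df["col1"] <= ym) & (df["col3"] == 1)
        first_match = df[base & (df["col2"] > ym)]
        second_match = df[base & (df["col2"] == 0)]
        # Prefer the first filter; fall back to the second, as in the question.
        chosen = first_match if not first_match.empty else second_match
        if chosen.empty:
            continue  # no rows for this month, so there is nothing to store
        # assign() returns a new frame, so the source dataframe is not modified.
        chosen = chosen.assign(new_col=ym)
        chosen = chosen.sort_values("col4").drop_duplicates("col5")
        frames[f"df_{ym}A"] = chosen
    return frames


# Hypothetical config contents; a real script would read these from a file.
config = json.loads('{"start": "2011-04", "end": "2011-06"}')
keys = year_month_keys(config["start"], config["end"])

# Made-up sample data using the column names from the question.
df = pd.DataFrame({
    "col1": [201101, 201105, 201106],
    "col2": [201105, 0, 201112],
    "col3": [1, 1, 1],
    "col4": [3, 1, 2],
    "col5": ["a", "b", "c"],
})

frames = build_frames(df, keys)
# Concatenate whatever ended up in the dict; no key is spelled out by hand.
df1 = pd.concat(frames.values(), ignore_index=True)
print(df1)
```

With this layout, a quarterly rerun would only need the "end" value in the config changed. Note that pd.concat raises a ValueError when given no frames at all, so a script using this approach may want to check that frames is non-empty first.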