
python - Nested dictionary of any depth from a DataFrame

Reposted. Author: 行者123. Updated: 2023-11-28 18:14:18

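I can create a nested dictionary of dictionaries from the categorical columns of a Pandas DataFrame, up to depth 3 (or less); see the code below. But my solution is hard-coded... imagine if I wanted to "split" by 10 categorical columns.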

I am looking for something that can do the following, written here as pseudocode:

d = {'A': ['a1','a1','a2'], 'B': ['b1','b2','b3'], 'C': ['c1','c2','c2'], 'v': [0,5,1]}
df = pd.DataFrame(data=d)

dA = tree(df=df, cols=['A'])
# it gives a dictionary of two dataframes
# "tree" should be some standard implementation
# a1
# a2

dB = tree(df=df, cols=['A', 'B'])
# it gives a dictionary of three dataframes at the lowest level
# a1_b1
# a1_b2
# a2_b3
# "tree" should be ready for any number of cols

# access operations
dA['a1'], dB['a1'], dB['a1']['b1'], ...

# iteration operation (transpose is just an example)
dA = dA.iter.T  # transposes every dataframe
dB = dB.iter.T  # transposes every dataframe on the lowest level, i.e. dB['a1']['b1'].T, dB['a1']['b2'].T, ...

# some operations will require access to the dictionary keys to make sense or to have enough flexibility:
dA.iter.to_csv(str(key) + '.csv')
# produces a1.csv, a2.csv
dB.iter.to_csv(str(key) + '.csv')
# produces a1_b1.csv, a1_b2.csv, a2_b3.csv

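Basically: I want to easily create a nested dictionary of any depth from a dataframe, to write functions that operate on the data at any depth of "key level", and to iterate over the whole dictionary without writing code for each level.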

My code:

import pandas as pd
from collections import defaultdict

# sample dataframe
d = {'A': ['a1','a1','a2'], 'B': ['b1','b2','b3'], 'C': ['c1','c2','c2'], 'v': [0,5,1]}
df = pd.DataFrame(data=d)

# make a dictionary of dataframes based on a categorical column; every category is a key to a dataframe
def dict_dfs_based_on_cat(df, col):
    Cat = df[col].unique()
    dictDFbasedOnCat = {elem: pd.DataFrame for elem in Cat}
    for key in dictDFbasedOnCat.keys():
        dictDFbasedOnCat[key] = df[:][df[col]==key]
    return dictDFbasedOnCat

#1st level
di_A = dict_dfs_based_on_cat(df, 'A')

#2nd level
di_A_B = {}
for a in di_A:
    di_A_B[a] = dict_dfs_based_on_cat(di_A[a], 'B')

#3rd level
di_A_B_C = defaultdict(dict)
for a in di_A:
    for b in di_A_B[a]:
        di_A_B_C[a][b] = dict_dfs_based_on_cat(di_A_B[a][b], 'C')

#operations on 3rd level
def iter_di(msg, func, di):
    print(msg)
    for a in di:
        for b in di[a]:
            for c in di[a][b]:
                func(a, b, c, di)

def save(a, b, c, di):
    di[a][b][c].to_csv(str(a)+'_'+str(b)+'_'+str(c)+'.csv', index=False)

#sample operation
iter_di('saving', save, di_A_B_C)

#a1_b1_c1.csv
#a1_b2_c2.csv
#a2_b3_c2.csv

Best answer

There may be a few problems with the code you posted:

  • d = {'A': ['a1','a1','a2'], 'B': ['b1','b2','b3'], 'C': 'c1','c2','c2'], 'v': [0,5,1]} is missing a bracket (an obvious fix).
  • return dictDFbasedOnCat may be indented incorrectly.

Anyway, after making assumptions about what the code should be and running it, di_A_B_C returns

>>> di_A_B_C
defaultdict(<type 'dict'>, {'a1': {'b1': {'c1':     A   B   C  v
0  a1  b1  c1  0}, 'b2': {'c2':     A   B   C  v
1  a1  b2  c2  5}}, 'a2': {'b3': {'c2':     A   B   C  v
2  a2  b3  c2  1}}})

This result can be matched with a recursive function:

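def update_nested_dict(d, vars, frame):
    # vars holds the key path for one row; it needs at least two entries
    if len(vars) > 2:
        # more than two keys left: descend one level, creating it if missing
        try:
            d[vars[0]] = update_nested_dict(d[vars[0]], vars[1:], frame)
        except KeyError:
            d[vars[0]] = update_nested_dict({}, vars[1:], frame)
    else:
        # last two keys: store the frame at d[vars[0]][vars[1]]
        try:
            d[vars[0]].update({vars[1]: frame})
        except KeyError:
            d[vars[0]] = {vars[1]: frame}
    return d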
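
The same insertion step can also be written without recursion. The following is only a sketch of that idea, not part of the original answer: set_nested is a hypothetical helper name, and plain integers stand in for the dataframes. It walks the key path with dict.setdefault, so it also accepts a path with a single key.

```python
def set_nested(d, keys, value):
    """Store value at d[keys[0]][keys[1]]...[keys[-1]], creating levels as needed."""
    node = d
    for key in keys[:-1]:
        # reuse the existing sub-dictionary, or create an empty one
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return d

# integers stand in for the per-row dataframes (illustrative data only)
tree = {}
set_nested(tree, ['a1', 'b1', 'c1'], 0)
set_nested(tree, ['a1', 'b2', 'c2'], 5)
set_nested(tree, ['a2', 'b3', 'c2'], 1)
print(tree)
# {'a1': {'b1': {'c1': 0}, 'b2': {'c2': 5}}, 'a2': {'b3': {'c2': 1}}}
```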

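You can then define a function that takes a DataFrame object and the columns you want to nest by, in the exact order you want them, and returns a defaultdict object: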

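def dataframe_dict(df, cols=None):
    # relies on pandas (as pd) and collections.defaultdict being imported
    if cols is None:
        cols = df.keys()

    di = {}
    # positions of the requested columns within each row of df.values
    df_col_inds = dict(zip(df.keys(), range(len(df.keys()))))
    df_col_inds = [df_col_inds[c] for c in cols]
    for v in df.values:
        # one single-row DataFrame per row, stored under that row's key path
        _ = update_nested_dict(di, v[df_col_inds], pd.DataFrame(dict(zip(df.keys(), v[:,None]))))

    return defaultdict(dict, di)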

For example, to match your di_A_B_C:

>>> dataframe_dict(df, ['A', 'B', 'C'])
defaultdict(<type 'dict'>, {'a1': {'b1': {'c1':     A   B   C  v
0  a1  b1  c1  0}, 'b2': {'c2':     A   B   C  v
0  a1  b2  c2  5}}, 'a2': {'b3': {'c2':     A   B   C  v
0  a2  b3  c2  1}}})
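
Note that every leaf in this output is a single-row frame whose index is 0, because one frame is built per row; if several rows shared the same key path, a later row would replace an earlier one. As a possible alternative, and only as a sketch, the nesting can be built with DataFrame.groupby so that each leaf keeps all matching rows and their original index. nested_groups is a hypothetical name, not something from the original answer, and the sketch assumes Python 3 and plain (non-categorical dtype) key columns.

```python
import pandas as pd

def nested_groups(df, cols):
    """Recursively split df by each column in cols; leaves are sub-DataFrames."""
    if not cols:
        return df
    first, rest = cols[0], list(cols[1:])
    # grouping by a single column name yields scalar group keys
    return {key: nested_groups(group, rest) for key, group in df.groupby(first)}

# sample data from the question
df = pd.DataFrame({'A': ['a1', 'a1', 'a2'], 'B': ['b1', 'b2', 'b3'],
                   'C': ['c1', 'c2', 'c2'], 'v': [0, 5, 1]})

groups = nested_groups(df, ['A', 'B', 'C'])
print(sorted(groups))                               # ['a1', 'a2']
print(sorted(groups['a1']))                         # ['b1', 'b2']
print(groups['a1']['b2']['c2'].index.tolist())      # [1]
print(groups['a1']['b2']['c2']['v'].tolist())       # [5]
```

Because the recursion stops only when cols is exhausted, the same function covers one column, three columns, or ten.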

Using all columns:

>>> dataframe_dict(df) # Same as dataframe_dict(df, df.keys()) = dataframe_dict(df, ['A', 'B', 'C', 'v'])
defaultdict(<type 'dict'>, {'a1': {'b1': {'c1': {0L:     A   B   C  v
0  a1  b1  c1  0}}, 'b2': {'c2': {5L:     A   B   C  v
0  a1  b2  c2  5}}}, 'a2': {'b3': {'c2': {1L:     A   B   C  v
0  a2  b3  c2  1}}}})

Columns in an arbitrary order:

>>> dataframe_dict(df, ['v', 'C', 'A'])
defaultdict(<type 'dict'>, {0L: {'c1': {'a1':     A   B   C  v
0  a1  b1  c1  0}}, 1L: {'c2': {'a2':     A   B   C  v
0  a2  b3  c2  1}}, 5L: {'c2': {'a1':     A   B   C  v
0  a1  b2  c2  5}}})
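
The question also asks for a way to operate on every leaf without writing one loop per level. The following is a rough sketch of how that could look, assuming Python 3; iter_leaves and map_leaves are hypothetical helper names, and short strings stand in for the dataframes so the example runs without pandas.

```python
def iter_leaves(tree, path=()):
    """Yield (key_path, leaf) for every non-dict value in a nested dict."""
    for key, value in tree.items():
        if isinstance(value, dict):  # defaultdict is a dict subclass, so it is covered
            yield from iter_leaves(value, path + (key,))
        else:
            yield path + (key,), value

def map_leaves(tree, func):
    """Return a new nested dict with func applied to every leaf."""
    return {key: map_leaves(value, func) if isinstance(value, dict) else func(value)
            for key, value in tree.items()}

# strings stand in for dataframes (illustrative data only)
tree = {'a1': {'b1': 'leaf0', 'b2': 'leaf5'}, 'a2': {'b3': 'leaf1'}}

# build one file name per leaf from its key path
names = ['_'.join(map(str, path)) + '.csv' for path, _ in iter_leaves(tree)]
print(names)  # ['a1_b1.csv', 'a1_b2.csv', 'a2_b3.csv']

# apply an operation to every leaf, whatever the depth
upper = map_leaves(tree, str.upper)
print(upper['a1']['b2'])  # LEAF5
```

With DataFrame leaves, the loop body could call leaf.to_csv(name, index=False) for each (path, leaf) pair, and map_leaves(tree, lambda frame: frame.T) would transpose every leaf.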

Regarding "python - Nested dictionary of any depth from a DataFrame", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49474126/
