python - Get a decision tree in a dictionary

Reposted. Author: 行者123. Updated: 2023-12-03 15:35:00

I'm looking for a way to dynamically build a dictionary in Python based on a desired structure.

I have the following data:

{'weather': ['windy', 'calm'], 'season': ['summer', 'winter', 'spring', 'autumn'],  'lateness': ['ontime', 'delayed']} 

I specify the structure I want it to follow:

['weather', 'season', 'lateness']

and ultimately want to get the data in this format:

{'calm': {'autumn': {'delayed': 0, 'ontime': 0},
          'spring': {'delayed': 0, 'ontime': 0},
          'summer': {'delayed': 0, 'ontime': 0},
          'winter': {'delayed': 0, 'ontime': 0}},
 'windy': {'autumn': {'delayed': 0, 'ontime': 0},
           'spring': {'delayed': 0, 'ontime': 0},
           'summer': {'delayed': 0, 'ontime': 0},
           'winter': {'delayed': 0, 'ontime': 0}}}

Here is the manual way I came up with to achieve this:

dtree = {}
for cat1 in category_cases['weather']:
    dtree.setdefault(cat1, {})
    for cat2 in category_cases['season']:
        dtree[cat1].setdefault(cat2, {})
        for cat3 in category_cases['lateness']:
            dtree[cat1][cat2].setdefault(cat3, 0)

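Can you think of a way where I only change the structure I wrote and still get the desired result?
Keep in mind that the structure may not have the same size every time.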

Also, if you can think of a way other than a dictionary to access the results, that works for me too.

Best answer

If you don't mind using external packages, pandas.DataFrame may be a viable candidate, since it looks like you will be working with a table:

import pandas as pd

# d is the dictionary of category values from the question
df = pd.DataFrame(
    index=pd.MultiIndex.from_product([d['weather'], d['season']]),
    columns=d['lateness'], data=0
)

Result:
              ontime  delayed
windy summer       0        0
      winter       0        0
      spring       0        0
      autumn       0        0
calm  summer       0        0
      winter       0        0
      spring       0        0
      autumn       0        0

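You can easily make changes using indexing. Use a single .loc call with a (row tuple, column) pair; chained indexing such as df.loc[...][...] = value may assign to a temporary copy instead of the table:
df.loc[('windy', 'summer'), 'ontime'] = 1
df.loc[('calm', 'autumn'), 'delayed'] = 2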

# Result:
              ontime  delayed
windy summer       1        0
      winter       0        0
      spring       0        0
      autumn       0        0
calm  summer       0        0
      winter       0        0
      spring       0        0
      autumn       0        2
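
As a minimal sketch of how this kind of indexing could be used to tally counts, the snippet below rebuilds the table and increments one cell per observation. The `observations` list is hypothetical and chosen only so that the final table matches the result shown above.

```python
import pandas as pd

# Category values copied from the question.
d = {
    'weather': ['windy', 'calm'],
    'season': ['summer', 'winter', 'spring', 'autumn'],
    'lateness': ['ontime', 'delayed'],
}

# Same zero-filled table as above.
df = pd.DataFrame(
    0,
    index=pd.MultiIndex.from_product([d['weather'], d['season']]),
    columns=d['lateness'],
)

# Hypothetical observations, each a (weather, season, lateness) triple.
observations = [
    ('windy', 'summer', 'ontime'),
    ('calm', 'autumn', 'delayed'),
    ('calm', 'autumn', 'delayed'),
]

for weather, season, lateness in observations:
    # A single .loc call with (row tuple, column) updates the cell in place.
    df.loc[(weather, season), lateness] += 1

print(df)
```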

If you always use the last key for the columns, you can build the table dynamically, assuming your keys are in the desired insertion order:
df = pd.DataFrame(
    index=pd.MultiIndex.from_product(list(d.values())[:-1]),
    columns=list(d.values())[-1], data=0
)
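
The question also provides an explicit structure list, so one possible variation is to drive the table from that list instead of relying on the dictionary's insertion order. The helper name `make_table` is hypothetical, and the sketch assumes the structure has at least two keys (one or more for the rows, the last one for the columns).

```python
import pandas as pd

# Category values and structure copied from the question.
d = {
    'weather': ['windy', 'calm'],
    'season': ['summer', 'winter', 'spring', 'autumn'],
    'lateness': ['ontime', 'delayed'],
}
structure = ['weather', 'season', 'lateness']


def make_table(data, structure):
    """Build a zero-filled table: all keys but the last form the row
    MultiIndex, the last key provides the columns."""
    *row_keys, col_key = structure
    index = pd.MultiIndex.from_product(
        [data[key] for key in row_keys],
        names=row_keys,  # naming the levels allows selecting by level name later
    )
    return pd.DataFrame(0, index=index, columns=data[col_key])


df = make_table(d, structure)

# Reordering the structure changes the layout without touching the data.
by_season = make_table(d, ['season', 'lateness', 'weather'])
```

Naming the index levels is optional, but it lets later lookups refer to 'weather' or 'season' rather than to level positions.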

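Since you are interested in pandas, and given your structure, I would also recommend reading up on MultiIndex and Advanced Indexing to get an idea of how to work with your data. Here are some examples:
# Select the 'delayed' column for all rows under 'calm'
# (the outputs shown assume some counts have already been filled in)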
df.loc['calm', 'delayed']

# summer 5
# winter 0
# spring 0
# autumn 2
# Name: delayed, dtype: int64

# Apply a sum:
df.loc['calm', 'delayed'].sum()

# 7

# Gets the mean of all 'summer' (notice the `slice(None)` is required to return all of the 'calm' and 'windy' group)
df.loc[(slice(None), 'summer'), :].mean()

# ontime 0.5
# delayed 2.5
# dtype: float64
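
For selecting on an inner level, `DataFrame.xs` and `pd.IndexSlice` are two alternatives to writing `slice(None)` by hand. The sketch below is self-contained; the cell values are an assumption, picked so that the numbers agree with the outputs printed above, and the level names are added only for illustration.

```python
import pandas as pd

d = {
    'weather': ['windy', 'calm'],
    'season': ['summer', 'winter', 'spring', 'autumn'],
    'lateness': ['ontime', 'delayed'],
}
df = pd.DataFrame(
    0,
    index=pd.MultiIndex.from_product(
        [d['weather'], d['season']], names=['weather', 'season']
    ),
    columns=d['lateness'],
)

# Assumed sample counts (not part of the original data).
df.loc[('windy', 'summer'), 'ontime'] = 1
df.loc[('calm', 'summer'), 'delayed'] = 5
df.loc[('calm', 'autumn'), 'delayed'] = 2

# Cross-section: every 'summer' row, with the 'season' level dropped.
summer = df.xs('summer', level='season')
summer_means = summer.mean()

# The same selection through pd.IndexSlice, which builds
# the (slice(None), 'summer') tuple for you.
idx = pd.IndexSlice
same_means = df.loc[idx[:, 'summer'], :].mean()
```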

It is definitely handy and versatile, but you will want to read up on it before diving in, because the framework can take some time to get used to.

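Otherwise, if you still prefer a dict, there is nothing wrong with that. Here is a recursive function that generates one based on the given keys (assuming your keys are in the desired insertion order):
def gen_dict(d, level=0):
    # Past the last key: this is a leaf, so start the count at 0
    if level >= len(d):
        return 0
    key = tuple(d.keys())[level]
    # One sub-tree per value of the current key
    return {val: gen_dict(d, level+1) for val in d.get(key)}

gen_dict(d)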

Result:
{'calm': {'autumn': {'delayed': 0, 'ontime': 0},
          'spring': {'delayed': 0, 'ontime': 0},
          'summer': {'delayed': 0, 'ontime': 0},
          'winter': {'delayed': 0, 'ontime': 0}},
 'windy': {'autumn': {'delayed': 0, 'ontime': 0},
           'spring': {'delayed': 0, 'ontime': 0},
           'summer': {'delayed': 0, 'ontime': 0},
           'winter': {'delayed': 0, 'ontime': 0}}}
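
Because the question supplies an explicit structure list, a possible variation is to recurse over that list rather than over the dictionary's key order. The names `build_tree` and `increment` below are hypothetical helpers, not part of the original answer; this is a sketch assuming every key in the structure exists in the data.

```python
def build_tree(data, structure, leaf=0):
    """Nest the values of data[key] for each key in structure, in order.

    `leaf` should be immutable (such as 0), since the same object is
    placed at every leaf.
    """
    if not structure:
        return leaf
    first, *rest = structure
    return {value: build_tree(data, rest, leaf) for value in data[first]}


def increment(tree, path, amount=1):
    """Add `amount` to the leaf reached by following `path` through `tree`."""
    *parents, last = path
    node = tree
    for key in parents:
        node = node[key]
    node[last] += amount


# Data and structure copied from the question.
category_cases = {
    'weather': ['windy', 'calm'],
    'season': ['summer', 'winter', 'spring', 'autumn'],
    'lateness': ['ontime', 'delayed'],
}
structure = ['weather', 'season', 'lateness']

dtree = build_tree(category_cases, structure)
increment(dtree, ('calm', 'autumn', 'delayed'))

# A shorter or reordered structure needs no code changes.
small = build_tree(category_cases, ['lateness', 'weather'])
```

With this shape, changing the nesting order or depth only requires editing the structure list.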

Regarding "python - Get a decision tree in a dictionary", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61831953/
