Hierarchical DataFrame from a flat DataFrame

Reposted. Author: bug小助手. Updated: 2023-10-25 13:37:19



I have a nested JSON object. I am able to parse it and flatten it into a single-level DataFrame whose column names preserve the hierarchy. Now I need to generate a hierarchical DataFrame from that flattened one, and I need some help with that.


Sample Object:


{"rr":{"bp": {
"0": "0 - 10",
"1": "10 - 20",
"2": "20 - 30"
},
"al": {
"0": 11.8,
"1": 77.2,
"2": 98.4
}}
}


Flattened DataFrame:
   rr.bp.0  rr.bp.1  rr.bp.2  rr.al.0  rr.al.1  rr.al.2
0  0 - 10   10 - 20  20 - 30  11.8     77.2     98.4

Expected hierarchical DataFrame:
Header1  Header2  0       1        2        3  4  5  6  7  8
rr       bp       0 - 10  10 - 20  20 - 30
rr       al       11.8    77.2     98.4


I am trying something like this:



``
for key, value in flattened_data.items():
    keys = key.split(".")
    column_headers = keys[:-1]
    index_label = keys[-1]
    columns = pd.MultiIndex.from_tuples([tuple(column_headers)], names=column_headers)
    temp_df = pd.DataFrame([value], columns=columns)
    temp_df.index = [index_label]
    df = pd.concat([df, temp_df])
``

But every value in the result comes out as NaN.
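
One likely reason for the NaN values, assuming `flattened_data` is the flattened DataFrame itself: iterating `.items()` yields each column as a Series labelled `0`, and when that Series is passed together with `columns=...`, pandas aligns the Series labels against the requested columns, which do not match. The following is a minimal sketch of a loop that stores the scalar values instead. The `collected` dictionary is a name introduced here for illustration, and the two header names assume exactly two levels above the leaf keys.

```python
import pandas as pd

# Stand-in for the flattened frame from the question (one row, dotted column names).
flattened_data = pd.DataFrame(
    {"rr.bp.0": ["0 - 10"], "rr.bp.1": ["10 - 20"], "rr.bp.2": ["20 - 30"],
     "rr.al.0": [11.8], "rr.al.1": [77.2], "rr.al.2": [98.4]}
)

collected = {}  # (header1, header2) -> {leaf key: scalar value}
for key, value in flattened_data.items():
    *column_headers, index_label = key.split(".")
    # value is a one-element Series; keep its scalar, not the Series itself
    collected.setdefault(tuple(column_headers), {})[index_label] = value.iloc[0]

# Tuple keys become MultiIndex columns; transpose so each header pair is a row.
df = (
    pd.DataFrame(collected)
    .T.rename_axis(["Header1", "Header2"])
    .reset_index()
)
print(df)
```

Because the values are gathered in plain Python before any DataFrame is built, no label alignment takes place and no repeated `pd.concat` is needed.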


Update
I want the output to be dynamic. You can assume that my key names are in order and that the depth is preserved in the flattened DataFrame.


Thanks @Timeless. Based on your answer I tried the code below, and it works.
df is the single-level flattened DataFrame.
Example: for this JSON {"rr":{"bp": { "0": "0 - 10", "1": "10 - 20", "2": "20 - 30" }, "al": { "0": 11.8, "1": 77.2, "2": 98.4 }} }
df is the flattened DataFrame:
   rr.bp.0  rr.bp.1  rr.bp.2  rr.al.0  rr.al.1  rr.al.2
0  0 - 10   10 - 20  20 - 30  11.8     77.2     98.4


If there are more nested keys, more dots and keys are added to the column names in the same order.


import re
import pandas as pd

# df is the single-level flattened DataFrame described above
flatten_copy = df.copy()
# Depth = number of dots in the deepest column name
max_depth = max(len(re.findall(r"\.", col)) for col in flatten_copy.columns)
df = (
    pd.DataFrame(df)
    .pipe(lambda x: x.set_axis(x.columns.str.split(".", expand=True), axis=1))
    .stack(list(range(max_depth)))  # move every header level into the index
    .droplevel(0)                   # drop the original row label
    .rename_axis([f"Header{i}" for i in range(max_depth)])
    .reset_index()
)
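
The dynamic version above relies on every column name containing the same number of dots. As a rough sketch of how that assumption could be checked before stacking, the hypothetical helper `column_depths` below (not part of pandas or of the original post) counts the separators per column name.

```python
def column_depths(columns, sep="."):
    """Hypothetical helper: number of separators in each flattened column name."""
    return {col: col.count(sep) for col in columns}


columns = ["rr.bp.0", "rr.bp.1", "rr.bp.2", "rr.al.0", "rr.al.1", "rr.al.2"]
depths = column_depths(columns)

# True when every column sits at the same nesting depth
uniform = len(set(depths.values())) == 1
max_depth = max(depths.values())
print(uniform, max_depth)
```

If `uniform` is false, the split column names have different lengths and the shorter ones are padded with missing values, so the stacked result may need extra cleanup.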

Recommended answer

Your expected output is ambiguous, or at least it doesn't match the title of your question.


I suppose that you're expecting a DataFrame like this one:


import pandas as pd

sample_obj = {
    "rr": {
        "bp": {"0": "0 - 10", "1": "10 - 20", "2": "20 - 30"},
        "al": {"0": 11.8, "1": 77.2, "2": 98.4}
    }
}

df = (
    pd.DataFrame(sample_obj).stack()
    .apply(pd.Series)  # with a FutureWarning in 2.1.0
    .swaplevel().rename_axis(["Header1", "Header2"]).reset_index()
)

Output:


print(df)

   Header1  Header2  0       1        2
0  rr       al       11.8    77.2     98.4
1  rr       bp       0 - 10  10 - 20  20 - 30
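
Since the `.apply(pd.Series)` step is noted as raising a FutureWarning, here is a hedged alternative sketch that builds the same kind of table straight from the nested dictionary with an explicit `pd.MultiIndex`. It assumes exactly two levels of keys above the leaf values, as in the sample object; the loop variable names are illustrative only.

```python
import pandas as pd

sample_obj = {
    "rr": {
        "bp": {"0": "0 - 10", "1": "10 - 20", "2": "20 - 30"},
        "al": {"0": 11.8, "1": 77.2, "2": 98.4},
    }
}

keys, rows = [], []
for header1, children in sample_obj.items():
    for header2, leaves in children.items():
        keys.append((header1, header2))  # one (Header1, Header2) pair per row
        rows.append(leaves)              # leaf dict becomes the row's columns

index = pd.MultiIndex.from_tuples(keys, names=["Header1", "Header2"])
df = pd.DataFrame(rows, index=index).reset_index()
print(df)
```

Rows appear in the order the keys occur in the dictionary, so `bp` comes before `al` here.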

UPDATE:


If you start from flattened_data, you can use this:


import pandas as pd

flattened_data = {
    'rr.bp.0': {0: '0 - 10'},
    'rr.bp.1': {0: '10 - 20'},
    'rr.bp.2': {0: '20 - 30'},
    'rr.al.0': {0: 11.8},
    'rr.al.1': {0: 77.2},
    'rr.al.2': {0: 98.4}
}  # which is the result of pd.json_normalize(sample_obj).to_dict()

df = (
    pd.DataFrame(flattened_data)
    # split "rr.bp.0" into ("rr", "bp", "0") so the columns form a MultiIndex
    .pipe(lambda x: x.set_axis(
        x.columns.str.split(".", expand=True), axis=1))
    .stack([0, 1]).droplevel(0)
    .rename_axis(["Header1", "Header2"])
    .reset_index()
)
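
The key step in this pipeline is the column split. As a small illustration on a made-up subset of the column names, `Index.str.split` with `expand=True` turns dotted names into a `MultiIndex` with one level per key, and those levels are what `stack` then moves into the rows.

```python
import pandas as pd

# A subset of the flattened column names, used only for illustration
flat_columns = pd.Index(["rr.bp.0", "rr.bp.1", "rr.al.0"])

# expand=True returns a MultiIndex instead of an Index of lists
multi = flat_columns.str.split(".", expand=True)

print(type(multi).__name__)
print(multi.nlevels)
print(list(multi))
```

Each tuple in the result holds the path of keys from the original nested object, with the leaf key in the last position.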

More replies

Thanks, your solution gives me the required output. But I need it to work from the flattened DataFrame, because I also flatten lists of arrays to a single level during flattening. Is that possible? Thanks again for the solution.

You're welcome! I updated the answer to address your comment. If that doesn't help, you need to provide a clear, explicit input that matches your actual data.

Your code is working really well. I tried to make it dynamic. Can you check it? [ flatten_copy = df.copy() max_depth = 0 max_depth_list = [] for col in flatten_copy.columns: max_depth_list.append(len(re.findall('\.', col))) max_depth = max(max_depth_list) df = ( pd.DataFrame(df).pipe(lambda x: x.set_axis(x.columns.str.split(".", expand=True), axis=1)) .stack(list(range(max_depth))).droplevel(0).rename_axis([f"Header{i}" for i in range(max_depth)]).reset_index() ) ] Thanks for the help.

As I said before, you need to provide a clear input and include it in your question.

Sorry for the inconvenience. I really appreciate your support. Thanks, I will update the question properly. For now it is working.
