
pandas pos_explode - unnest a column of arrays but keep the index

Reposted. Author: bug小助手. Updated: 2023-10-25 13:38:52



I want something similar to pos_explode in pandas, i.e. an explode that also keeps each element's position in the original array.



import numpy as np
import pandas as pd

df = pd.DataFrame({'metric': {24: 53, 68: 93, 86: 38},
                   'label': {24: 1, 68: 1, 86: 1},
                   'group_1': {24: 1, 68: 1, 86: 1},
                   'group_2': {24: 1, 68: 1, 86: 1},
                   'metric_group_0': {24: np.array([72, 41, 96]),
                                      68: np.array([85, 56, 33]),
                                      86: np.array([26, 85, 26])}})
# Replace the original labels (24, 68, 86) with a 0-based index named 'index'
df = df.reset_index(drop=True)
df = df.reset_index(drop=False)
df = df.set_index(['index'])
print(df)
# Flatten the arrays and repeat each index label once per array element
s = pd.DataFrame({'metric_group_0': np.concatenate(df.metric_group_0.values)},
                 index=df.index.repeat(df.metric_group_0.str.len()))
print(s)
# Join the scalar columns back on the repeated index
s.join(df.drop(columns='metric_group_0'), how='left')


This explodes the data but loses each element's position within its array. How can I keep that position as an additional column?
In this example it would be [1, 2, 3] for each value of the pandas index.



       metric  label  group_1  group_2 metric_group_0
index
0          53      1        1        1   [72, 41, 96]
1          93      1        1        1   [85, 56, 33]
2          38      1        1        1   [26, 85, 26]


This is currently converted to:



       metric_group_0  metric  label  group_1  group_2
index
0                  72      53      1        1        1
0                  41      53      1        1        1
0                  96      53      1        1        1
1                  85      93      1        1        1
1                  56      93      1        1        1
1                  33      93      1        1        1
2                  26      38      1        1        1
2                  85      38      1        1        1
2                  26      38      1        1        1


This is missing each element's position in its original array.
The desired output would look like:



       metric_group_0  metric  label  group_1  group_2  pos_in_array
index
0                  72      53      1        1        1             1
0                  41      53      1        1        1             2
0                  96      53      1        1        1             3
1                  85      93      1        1        1             1
1                  56      93      1        1        1             2
1                  33      93      1        1        1             3
2                  26      38      1        1        1             1
2                  85      38      1        1        1             2
2                  26      38      1        1        1             3

Recommended answers

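You can create this column with groupby.cumcount, using the index as the grouping key. After the explode, every element of one array shares the same index label, so a cumulative count within each label gives the element's position: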



df['pos_in_array'] = df.groupby(df.index).cumcount()+1





print(df)
       metric_group_0  metric  label  group_1  group_2  pos_in_array
index
0                  72      53      1        1        1             1
0                  41      53      1        1        1             2
0                  96      53      1        1        1             3
1                  85      93      1        1        1             1
1                  56      93      1        1        1             2
1                  33      93      1        1        1             3
2                  26      38      1        1        1             1
2                  85      38      1        1        1             2
2                  26      38      1        1        1             3
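
For readers who want to try the cumcount idea in isolation, here is a minimal, self-contained sketch. It assumes a reduced version of the question's frame (only two of the scalar columns) and uses DataFrame.explode instead of the manual concatenate-and-join. The variable name exploded is illustrative only.

```python
import numpy as np
import pandas as pd

# Reduced version of the question's data (assumption: fewer scalar columns).
df = pd.DataFrame({
    "metric": [53, 93, 38],
    "label": [1, 1, 1],
    "metric_group_0": [np.array([72, 41, 96]),
                       np.array([85, 56, 33]),
                       np.array([26, 85, 26])],
})

# explode repeats the index label once for every element of the array ...
exploded = df.explode("metric_group_0")

# ... so a running count within each index label is the 1-based position.
exploded["pos_in_array"] = exploded.groupby(level=0).cumcount() + 1

print(exploded)
```

This relies on the index labels being unique before the explode. If the original index contains duplicates, rows from different arrays would be counted in the same group, so resetting the index first (as the question does) is the safer choice.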


So your whole code would look like the following. Note that the result of the join has to be assigned to a variable, which your code did not do yet:



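df = df.reset_index(drop=True)
df = df.reset_index(drop=False)
df = df.set_index(['index'])

s = pd.DataFrame({'metric_group_0': np.concatenate(df.metric_group_0.values)},
                 index=df.index.repeat(df.metric_group_0.str.len()))

# Pass the column by keyword; the positional axis argument was removed in pandas 2.0
df = s.join(df.drop(columns='metric_group_0'), how='left')

# Position of each element within its original array, starting at 1
df['pos_in_array'] = df.groupby(df.index).cumcount() + 1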


Another way to do this is to build the list of positions for each row first, and then explode both columns together.


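# One list of 1-based positions per row, as long as that row's array
df['pos_in_array'] = df['metric_group_0'].apply(lambda x: list(range(1, len(x) + 1)))
# Explode both columns together (multi-column explode needs pandas >= 1.3)
df = df.explode(['metric_group_0', 'pos_in_array'])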



print(df)
       metric_group_0  metric  label  group_1  group_2  pos_in_array
index
0                  72      53      1        1        1             1
0                  41      53      1        1        1             2
0                  96      53      1        1        1             3
1                  85      93      1        1        1             1
1                  56      93      1        1        1             2
1                  33      93      1        1        1             3
2                  26      38      1        1        1             1
2                  85      38      1        1        1             2
2                  26      38      1        1        1             3
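
The same idea can be wrapped in a small reusable helper. The sketch below is only an illustration: the function name pos_explode and its parameters pos_name and start are hypothetical, not part of pandas. It also assumes pandas 1.3 or newer, where DataFrame.explode accepts a list of columns.

```python
import numpy as np
import pandas as pd


def pos_explode(frame, column, pos_name="pos_in_array", start=1):
    """Hypothetical helper: explode `column` and record each element's position."""
    out = frame.copy()
    # One list of positions per row, with the same length as that row's array.
    out[pos_name] = out[column].apply(
        lambda values: list(range(start, start + len(values)))
    )
    # Multi-column explode requires matching lengths per row,
    # which holds here by construction.
    return out.explode([column, pos_name])


# Arrays of different lengths, to show that positions restart for every row.
df = pd.DataFrame({
    "metric": [53, 93],
    "metric_group_0": [np.array([72, 41, 96]), np.array([85, 56])],
})
result = pos_explode(df, "metric_group_0")
print(result)
```

Both exploded columns come back with object dtype, so a cast such as result.astype({"pos_in_array": int}) may be wanted before doing arithmetic on them. Passing start=0 would give 0-based positions instead.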
