I want something like pos_explode in pandas, i.e. an explode that also keeps each element's position in the original array.
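import numpy as np
import pandas as pd

df = pd.DataFrame({'metric': {24: 53, 68: 93, 86: 38},
                   'label': {24: 1, 68: 1, 86: 1},
                   'group_1': {24: 1, 68: 1, 86: 1},
                   'group_2': {24: 1, 68: 1, 86: 1},
                   'metric_group_0': {24: np.array([72, 41, 96]),
                                      68: np.array([85, 56, 33]),
                                      86: np.array([26, 85, 26])}})
# Replace the original labels (24, 68, 86) with 0..2 and name the index 'index'.
df = df.reset_index(drop=True)
df = df.reset_index(drop=False)
df = df.set_index(['index'])
display(df)  # display() comes from Jupyter/IPython; use print(df) in a plain script

# Flatten all arrays into one column, repeating each index label once per array element.
s = pd.DataFrame({'metric_group_0': np.concatenate(df.metric_group_0.values)},
                 index=df.index.repeat(df.metric_group_0.str.len()))
display(s)
# Attach the remaining columns to every exploded row.
s.join(df.drop(columns='metric_group_0'), how='left')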
This explodes the data but loses each element's position within its array. How can I keep that position as an additional column? In this example it would be [1, 2, 3] for each value of the pandas.Index.
       metric  label  group_1  group_2  metric_group_0
index
0          53      1        1        1    [72, 41, 96]
1          93      1        1        1    [85, 56, 33]
2          38      1        1        1    [26, 85, 26]
is currently converted to:
       metric_group_0  metric  label  group_1  group_2
index
0                  72      53      1        1        1
0                  41      53      1        1        1
0                  96      53      1        1        1
1                  85      93      1        1        1
1                  56      93      1        1        1
1                  33      93      1        1        1
2                  26      38      1        1        1
2                  85      38      1        1        1
2                  26      38      1        1        1
but is missing each element's position in the original array.
The desired output would look like:
       metric_group_0  metric  label  group_1  group_2  pos_in_array
index
0                  72      53      1        1        1             1
0                  41      53      1        1        1             2
0                  96      53      1        1        1             3
1                  85      93      1        1        1             1
1                  56      93      1        1        1             2
1                  33      93      1        1        1             3
2                  26      38      1        1        1             1
2                  85      38      1        1        1             2
2                  26      38      1        1        1             3
Recommended answers:
You can create this column with groupby.cumcount, using the index as the groups. cumcount numbers the rows within each group starting from 0, hence the +1:
df['pos_in_array'] = df.groupby(df.index).cumcount()+1
print(df)
       metric_group_0  metric  label  group_1  group_2  pos_in_array
index
0                  72      53      1        1        1             1
0                  41      53      1        1        1             2
0                  96      53      1        1        1             3
1                  85      93      1        1        1             1
1                  56      93      1        1        1             2
1                  33      93      1        1        1             3
2                  26      38      1        1        1             1
2                  85      38      1        1        1             2
2                  26      38      1        1        1             3
Since you have not yet assigned the newly created DataFrame to a variable, your whole code would look like this:
df = df.reset_index(drop=True)
df = df.reset_index(drop=False)
df = df.set_index(['index'])
s = pd.DataFrame({'metric_group_0': np.concatenate(df.metric_group_0.values)},
                 index=df.index.repeat(df.metric_group_0.str.len()))
# Assign the joined result back to df so the new column can be added to it.
df = s.join(df.drop(columns='metric_group_0'), how='left')
# cumcount() numbers the rows within each index label from 0, so add 1.
df['pos_in_array'] = df.groupby(df.index).cumcount() + 1
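The same idea can be wrapped in a small self-contained helper. This is only a sketch. The function name pos_explode and its parameters are made up for illustration. It uses DataFrame.explode (pandas 0.25 or newer) in place of the manual np.concatenate and join above, and it assumes the input frame has a unique index.

```python
import numpy as np
import pandas as pd


def pos_explode(frame, column, pos_name="pos_in_array"):
    """Explode `column` and add a 1-based position column (hypothetical helper)."""
    out = frame.explode(column)
    # Rows that came from the same original row share an index label,
    # so counting within each label gives the position inside the array.
    out[pos_name] = out.groupby(level=0).cumcount() + 1
    return out


# Reduced version of the example frame from the question.
df = pd.DataFrame({
    "metric": [53, 93, 38],
    "metric_group_0": [np.array([72, 41, 96]),
                       np.array([85, 56, 33]),
                       np.array([26, 85, 26])],
})
result = pos_explode(df, "metric_group_0")
print(result)
```

Because the positions are derived by grouping on the index, duplicate index labels in the input would be counted as one group, which is why the sketch assumes a unique index.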
Another way is to create the list of positions up front, before calling explode.
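# Build a list of 1-based positions for each array before exploding.
df['pos_in_array'] = df['metric_group_0'].apply(lambda x: list(range(1, len(x) + 1)))
# Explode both columns together (a list of columns requires pandas >= 1.3) and keep the result.
df = df.explode(['metric_group_0', 'pos_in_array'])
print(df)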
       metric  label  group_1  group_2  metric_group_0  pos_in_array
index
0          53      1        1        1              72             1
0          53      1        1        1              41             2
0          53      1        1        1              96             3
1          93      1        1        1              85             1
1          93      1        1        1              56             2
1          93      1        1        1              33             3
2          38      1        1        1              26             1
2          38      1        1        1              85             2
2          38      1        1        1              26             3
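A self-contained version of this second approach might look like the sketch below. It assumes pandas 1.3 or newer, where DataFrame.explode accepts a list of columns. The reduced frame with arrays of different lengths is made up for illustration, to show that positions restart at 1 for every row.

```python
import numpy as np
import pandas as pd

# Illustrative data (not from the question): arrays of different lengths.
df = pd.DataFrame({
    "metric": [53, 93, 38],
    "metric_group_0": [np.array([72, 41, 96]),
                       np.array([85, 56]),
                       np.array([26])],
})

# Build the 1-based positions row by row, before exploding.
df["pos_in_array"] = df["metric_group_0"].apply(
    lambda arr: list(range(1, len(arr) + 1))
)

# Exploding both columns together keeps each value paired with its position.
exploded = df.explode(["metric_group_0", "pos_in_array"])
print(exploded)
```

Here the positions are computed per row before exploding, so nothing has to be grouped by the index afterwards. The exploded columns come back with object dtype, so a cast such as astype(int) may be wanted before doing arithmetic on them.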