python - Pandas: enumerating duplicates in an index

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:27:09

Suppose I have a list of events that occur on different keys.

import pandas

data = [
{"key": "A", "event": "created"},
{"key": "A", "event": "updated"},
{"key": "A", "event": "updated"},
{"key": "A", "event": "updated"},
{"key": "B", "event": "created"},
{"key": "B", "event": "updated"},
{"key": "B", "event": "updated"},
{"key": "C", "event": "created"},
{"key": "C", "event": "updated"},
{"key": "C", "event": "updated"},
{"key": "C", "event": "updated"},
{"key": "C", "event": "updated"},
{"key": "C", "event": "updated"},
]

df = pandas.DataFrame(data)

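I want to index my DataFrame first by key, and then by an enumeration within each key. It looks like a simple unstack operation, but I can't find the right way to do it.

The best I could come up with is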

df.set_index("key", append=True).swaplevel(0, 1)

          event
key
A   0   created
    1   updated
    2   updated
    3   updated
B   4   created
    5   updated
    6   updated
C   7   created
    8   updated
    9   updated
    10  updated
    11  updated
    12  updated

But what I expect is

         event
key
A   0  created
    1  updated
    2  updated
    3  updated
B   0  created
    1  updated
    2  updated
C   0  created
    1  updated
    2  updated
    3  updated
    4  updated
    5  updated

I also tried something like

df.groupby("key")["key"].count().apply(range).apply(pandas.Series).stack()

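But this does not preserve the order, so I can't use the result as an index. It also feels like overkill for what looks like a fairly standard operation...

Any ideas?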

Best answer

groupby + cumcount

There are a couple of ways to do this:

# Option 1: newer version, thanks to @ScottBoston
df = df.set_index(['key', df.groupby('key').cumcount()])\
       .rename_axis(['key', 'count'])

# Option 2: original version (use instead of option 1, on the original df)
df = df.assign(count=df.groupby('key').cumcount())\
       .set_index(['key', 'count'])

print(df)

            event
key count
A   0     created
    1     updated
    2     updated
    3     updated
B   0     created
    1     updated
    2     updated
C   0     created
    1     updated
    2     updated
    3     updated
    4     updated
    5     updated
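
For illustration, here is a minimal self-contained sketch of the same groupby + cumcount idea. The sample data and the names `events`, `position`, and `indexed` are assumptions made up for this example, not part of the original question. The keys are deliberately interleaved to suggest that cumcount numbers rows within each key without reordering them.

```python
import pandas as pd

# Hypothetical sample data (not from the question): keys are interleaved
# on purpose, so the per-key numbering cannot rely on the rows being sorted.
events = pd.DataFrame(
    {
        "key": ["A", "B", "A", "C", "B", "A"],
        "event": ["created", "created", "updated",
                  "created", "updated", "updated"],
    }
)

# For each row, cumcount() gives the number of earlier rows with the same key.
position = events.groupby("key").cumcount()
print(position.tolist())  # [0, 0, 1, 0, 1, 2]

# Use the key column and the per-key position as a two-level index.
indexed = events.set_index(["key", position]).rename_axis(["key", "count"])

# Select all events of one key; the inner level restarts at 0 for each key.
print(indexed.xs("A", level="key")["event"].tolist())
```

In this sketch the rows of `indexed` stay in their original order, and each (key, count) pair should be unique, which is what makes the pair usable as an index.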

Regarding "python - Pandas: enumerating duplicates in an index", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53328489/
