
python - Counting value pairs across different columns of a DataFrame with Pandas

Reposted. Author: 行者123. Updated: 2023-12-05 09:32:10

I have a df like this:

import pandas as pd

df = pd.DataFrame([["coffee","soda","coffee","water","soda","soda"],["paper","glass","glass","paper","paper","glass"], list('smlssm')]).T
df.columns = ['item','cup','size']

df:

     item    cup size
0  coffee  paper    s
1    soda  glass    m
2  coffee  glass    l
3   water  paper    s
4    soda  paper    s
5    soda  glass    m

I want to convert it into a df that looks like this:

      item    cup size  freq
0   coffee  paper    s     1
1   coffee  paper    m     0
2   coffee  paper    l     0
3   coffee  glass    s     0
4   coffee  glass    m     0
5   coffee  glass    l     1
6     soda  paper    s     1
7     soda  paper    m     0
8     soda  paper    l     0
9     soda  glass    s     0
10    soda  glass    m     2
11    soda  glass    l     0
..     ...    ...  ...   ...

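So for each item I want one row for every possible combination of cup and size, plus an extra column holding the frequency of that combination.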

What is the right way to do this with pandas?

Best answer

Try:

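# Give every row a count of 1 so the pivot can sum them.
df["freq"] = 1
x = df.pivot_table(
    index="item",
    columns=["cup", "size"],
    values="freq",
    aggfunc="sum",
    fill_value=0,
)
# Build every (cup, size) pair, including pairs that never occur in the data.
full_cols = pd.MultiIndex.from_product(
    [
        x.columns.get_level_values(0).unique(),
        x.columns.get_level_values(1).unique(),
    ],
    names=x.columns.names,
)
# Add the missing (cup, size) columns, filled with 0.
x = x.reindex(full_cols, fill_value=0, axis=1)
# Move both column levels back into rows to get the long format.
print(x.stack([0, 1]).reset_index(name="freq"))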

Prints:

      item    cup size  freq
0   coffee  glass    l     1
1   coffee  glass    m     0
2   coffee  glass    s     0
3   coffee  paper    l     0
4   coffee  paper    m     0
5   coffee  paper    s     1
6     soda  glass    l     0
7     soda  glass    m     2
8     soda  glass    s     0
9     soda  paper    l     0
10    soda  paper    m     0
11    soda  paper    s     1
12   water  glass    l     0
13   water  glass    m     0
14   water  glass    s     0
15   water  paper    l     0
16   water  paper    m     0
17   water  paper    s     1

DataFrame used:

     item    cup size
0  coffee  paper    s
1    soda  glass    m
2  coffee  glass    l
3   water  paper    s
4    soda  paper    s
5    soda  glass    m
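
As a possible alternative, the same counts can probably be obtained without pivoting, by counting the observed (item, cup, size) triples with groupby and then reindexing against the full Cartesian product of the three columns. This is only a sketch and not part of the original answer; the names counts, full_index and result are illustrative. Because unique() keeps values in order of first appearance, the rows here should follow the order in which values appear in the data rather than alphabetical order.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "item": ["coffee", "soda", "coffee", "water", "soda", "soda"],
        "cup": ["paper", "glass", "glass", "paper", "paper", "glass"],
        "size": list("smlssm"),
    }
)

# Count the (item, cup, size) triples that actually occur in the data.
counts = df.groupby(["item", "cup", "size"]).size()

# Every possible combination of the values observed in each column.
full_index = pd.MultiIndex.from_product(
    [df["item"].unique(), df["cup"].unique(), df["size"].unique()],
    names=["item", "cup", "size"],
)

# Combinations that never occur get a frequency of 0.
result = counts.reindex(full_index, fill_value=0).reset_index(name="freq")
print(result)
```

This variant leaves df unmodified (no helper freq column is added), which may be preferable when the original frame is reused later.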

Regarding python - Counting value pairs across different columns of a DataFrame with Pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/68385904/
